Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?

Chan, Ricky KW; Wang, Bruce X

Publisher Website: 10.1016/j.forsciint.2024.112199
WOS: WOS:001301775800001

Citations:
- Web of Science: 0
Appears in Collections:
- English: Journal/Magazine Articles

Title	Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?
Authors	Chan, Ricky KW; Wang, Bruce X
Issue Date	24-Aug-2024
Publisher	Elsevier
Citation	Forensic Science International, 2024, v. 363. DOI: http://dx.doi.org/10.1016/j.forsciint.2024.112199
Abstract	A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic-phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCC-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.
Persistent Identifier	http://hdl.handle.net/10722/345971
ISSN	0379-0738 (2023 Impact Factor: 2.2; 2023 SCImago Journal Rankings: 0.750)
ISI Accession Number ID	WOS:001301775800001

DC Field	Value	Language
dc.contributor.author	Chan, Ricky KW	-
dc.contributor.author	Wang, Bruce X	-
dc.date.accessioned	2024-09-04T07:06:50Z	-
dc.date.available	2024-09-04T07:06:50Z	-
dc.date.issued	2024-08-24	-
dc.identifier.citation	Forensic Science International, 2024, v. 363	-
dc.identifier.issn	0379-0738	-
dc.identifier.uri	http://hdl.handle.net/10722/345971	-
dc.description.abstract	A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic-phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCC-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.	-
dc.language	eng	-
dc.publisher	Elsevier	-
dc.relation.ispartof	Forensic Science International	-
dc.title	Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?	-
dc.type	Article	-
dc.identifier.doi	10.1016/j.forsciint.2024.112199	-
dc.identifier.volume	363	-
dc.identifier.eissn	1872-6283	-
dc.identifier.isi	WOS:001301775800001	-
dc.identifier.issnl	0379-0738	-
