Appears in Collections: Conference Paper
| Title | Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input |
|---|---|
| Authors | Do, Youngah; Tan, Lihui |
| Issue Date | 29-Jun-2024 |
| Abstract | Infants develop phonemic awareness by 6 to 8 months and phonotactic knowledge by 8 to 10 months. They have statistical learning capabilities and prefer sequences with higher transitional probabilities. However, it is unclear how these abilities operate in early phonological acquisition. This study investigates the ability of a neural network model to acquire phonotactic knowledge from a raw audio corpus. The model is designed without prior knowledge of phonemes or rules and relies solely on raw audio sequences as input. The study focuses on the aspiration alternation in English voiceless stop consonants occurring after the sibilant fricative /s/. A subset of the LibriSpeech corpus is used, containing word-initial voiceless stops and /s/-stop sequences. The data are transformed into Mel-spectrograms, and an autoencoder model is trained to compress and decode the input. Ten models are trained and evaluated, and attention matrices are analyzed to measure the model's focus on different segments. The study finds that the model exhibits sensitivity to contrast points and allocates more attention to the /s/ segment when reconstructing the following plosive. The model also differentiates between stops that follow an /s/ and those that do not. Overall, the study demonstrates how an autoencoder model implicitly learns phonotactic knowledge from raw audio data, resembling early stages of language acquisition. |
| Persistent Identifier | http://hdl.handle.net/10722/342847 |
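The abstract describes analyzing attention matrices to measure how much of the decoder's focus falls on the /s/ segment while the following plosive is reconstructed. The sketch below illustrates one way such a measurement could be computed; it is not the authors' code, and the frame indices, function name, and toy attention matrix are all hypothetical, assuming only a row-normalized decoder-to-encoder attention matrix.

```python
import numpy as np

def segment_attention(attn, decode_frames, source_frames):
    """Mean attention mass that the given decoder steps place on the
    given encoder (source) frames of the Mel-spectrogram."""
    # Select the sub-block of attention weights: rows are decoder steps,
    # columns are encoder frames.
    block = attn[np.ix_(decode_frames, source_frames)]
    # Sum over source frames per decoder step, then average across steps.
    return block.sum(axis=1).mean()

# Toy attention matrix: 8 decoder steps x 8 encoder frames, rows sum to 1
# (as softmax-normalized attention weights would).
rng = np.random.default_rng(0)
attn = rng.random((8, 8))
attn /= attn.sum(axis=1, keepdims=True)

s_frames = [0, 1, 2]        # frames assumed to cover /s/ (illustrative)
plosive_frames = [3, 4, 5]  # frames assumed to cover the following stop

score = segment_attention(attn, plosive_frames, s_frames)
print(f"attention on /s/ while decoding the stop: {score:.3f}")
```

A higher score for /s/-stop tokens than for stop-initial tokens would correspond to the pattern the abstract reports, where more attention is allocated to the /s/ segment during reconstruction of the following plosive.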
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Do, Youngah | - |
| dc.contributor.author | Tan, Lihui | - |
| dc.date.accessioned | 2024-05-02T03:06:19Z | - |
| dc.date.available | 2024-05-02T03:06:19Z | - |
| dc.date.issued | 2024-06-29 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/342847 | - |
| dc.description.abstract | Infants develop phonemic awareness by 6 to 8 months and phonotactic knowledge by 8 to 10 months. They have statistical learning capabilities and prefer sequences with higher transitional probabilities. However, it is unclear how these abilities operate in early phonological acquisition. This study investigates the ability of a neural network model to acquire phonotactic knowledge from a raw audio corpus. The model is designed without prior knowledge of phonemes or rules and relies solely on raw audio sequences as input. The study focuses on the aspiration alternation in English voiceless stop consonants occurring after the sibilant fricative /s/. A subset of the LibriSpeech corpus is used, containing word-initial voiceless stops and /s/-stop sequences. The data are transformed into Mel-spectrograms, and an autoencoder model is trained to compress and decode the input. Ten models are trained and evaluated, and attention matrices are analyzed to measure the model's focus on different segments. The study finds that the model exhibits sensitivity to contrast points and allocates more attention to the /s/ segment when reconstructing the following plosive. The model also differentiates between stops that follow an /s/ and those that do not. Overall, the study demonstrates how an autoencoder model implicitly learns phonotactic knowledge from raw audio data, resembling early stages of language acquisition. | - |
| dc.language | eng | - |
| dc.relation.ispartof | Laboratory Phonology 19 (26/06/2024-29/06/2024, Seoul) | - |
| dc.title | Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input | - |
| dc.type | Conference_Paper | - |