
Conference Paper: Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input

Title: Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input
Authors: Do, Youngah; Tan, Lihui
Issue Date: 29-Jun-2024
Abstract

Infants develop phonemic awareness by 6 to 8 months and phonotactic knowledge by 8 to 10 months. They possess statistical learning capabilities and prefer sequences with higher transitional probabilities. However, it is unclear how these abilities manifest in early phonological acquisition. This study investigates whether a neural network model can acquire phonotactic knowledge from a raw audio corpus. The model is designed without prior knowledge of phonemes or rules and relies solely on raw audio sequences as input. The study focuses on the aspiration alternation in English voiceless stop consonants, which are aspirated word-initially but unaspirated after the sibilant fricative /s/. A subset of the LibriSpeech corpus containing word-initial voiceless stops and /s/-stop sequences is used. The data are transformed into Mel-spectrograms, and an autoencoder model is trained to compress and decode the input. Ten models are trained and evaluated, and their attention matrices are analyzed to measure how much attention the model allocates to different segments. The model exhibits sensitivity to contrast points and allocates more attention to the /s/ segment when reconstructing the following plosive; it also differentiates between stops that follow an /s/ and those that do not. Overall, the study demonstrates how an autoencoder model implicitly learns phonotactic knowledge from raw audio data, resembling early stages of language acquisition.
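The analysis described in the abstract rests on inspecting an attention matrix: each row shows how much a decoder step attends to each encoder frame of the Mel-spectrogram, so attention mass on /s/ frames while reconstructing the following stop can be read off directly. The sketch below is illustrative only, not the authors' implementation: the dot-product attention scoring, the hidden size, the frame counts, and the assumption that the first five frames cover /s/ are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(encoder_states, decoder_states):
    """Dot-product attention: row t gives the weights decoder step t
    places over all encoder frames; each row sums to 1."""
    scores = decoder_states @ encoder_states.T
    return softmax(scores, axis=1)

# Stand-ins for LSTM hidden states over a 20-frame token
# (hypothetical sizes; a real model would produce these from audio).
rng = np.random.default_rng(0)
T, H = 20, 16
enc = rng.standard_normal((T, H))  # encoder states, one per Mel frame
dec = rng.standard_normal((T, H))  # decoder states, one per output frame

A = attention_matrix(enc, dec)     # shape (T, T)

# Attention mass each decoder step places on the first 5 encoder frames,
# e.g. frames assumed to cover /s/ in an /s/-stop token.
s_mass = A[:, :5].sum(axis=1)
```

Comparing `s_mass` during the stop-reconstruction steps between /s/-stop tokens and plain word-initial stop tokens is one way the reported /s/-attention effect could be quantified.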


Persistent Identifier: http://hdl.handle.net/10722/342847

 

DC Field | Value | Language
dc.contributor.author | Do, Youngah | -
dc.contributor.author | Tan, Lihui | -
dc.date.accessioned | 2024-05-02T03:06:19Z | -
dc.date.available | 2024-05-02T03:06:19Z | -
dc.date.issued | 2024-06-29 | -
dc.identifier.uri | http://hdl.handle.net/10722/342847 | -
dc.description.abstract | (see Abstract above) | -
dc.language | eng | -
dc.relation.ispartof | Laboratory Phonology 19 (26/06/2024-29/06/2024, Seoul) | -
dc.title | Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input | -
dc.type | Conference_Paper | -
