
Postgraduate thesis: A longitudinal corpus study on the emergence of multiword constructions in Cantonese using usage-based model and information theory

Title: A longitudinal corpus study on the emergence of multiword constructions in Cantonese using usage-based model and information theory
Authors: Lau, Suk Han (劉淑嫺)
Advisor(s): Lee, T; Wong, AMY
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Lau, S. H. [劉淑嫺]. (2024). A longitudinal corpus study on the emergence of multiword constructions in Cantonese using usage-based model and information theory. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This study aimed to examine the emergence of multiword constructions in a typically developing Cantonese-speaking child from 22 to 34 months of age, drawing on usage-based theories of language acquisition and on information theory. Data came from one of the eight children in CANCORP, hosted on the publicly accessible CHILDES database. The child’s 10,921 utterances were divided into four quarterly corpus parts (CPs) for a two-phase analysis. In phase one, a qualitative stratified approach based on Construction Grammar and Systemic Functional Grammar was developed to analyse changes in the contextual, semantic, and lexico-grammatical features of multiword constructions from CP1 to CP4. In phase two, two measures from information theory, entropy and mutual information, were used to quantify developmental changes in multiword constructions across the four CPs. First, entropy, derived from the probability distribution of contrastive options in each language feature, was used to quantify changes in the productivity, complexity, and diversity of multiword constructions. Then, mutual information, a statistical association measure derived from the joint probability distribution of a pair of collocated multiword constructions, was used to quantify the mutual dependency between the entrenched multiword constructions in CP3 and the newly emerged ones in CP4.

The phase one findings were consistent with the three-stage developmental process postulated by Construction Grammar. Item-Based Constructions were observed in CP1. These were simple clauses built around an event word that combined with only a limited range of content or function words in obligatory contexts to represent daily experiential scenes (e.g., reading scenes: this – object name). Next, Abstract Constructions were frequently observed in CP2. These were simple clauses composed of two or more groups representing the participants in daily events and their semantic relations (e.g., [Verbal Process: verbal group] – [Patient: nominal group], and [Carrier: nominal group] – [Attribute: adjectival group]). Lastly, expanded Abstract Constructions were frequently used in CP3 and CP4. These were simple, hypotactic, and paratactic clauses composed of a diverse range of expanded groups and clause complexes (e.g., [Carrier: expanded nominal group] – [Attribute: hypotactic clause], and [dependent clause] – [main clause]). Some of the constituents of these constructions were high-frequency groups and clauses acquired at a preceding stage.

The phase two findings revealed that the entropies of the information structures, clause structures, and group structures of the multiword constructions showed an overall increasing trend across the study period, although individual clause and group structures showed a lower rate of increase, or a decrease, at particular stages. In addition, the mutual information measures demonstrated that multiword constructions acquired at a preceding stage co-occurred with multiword constructions that emerged at a later stage significantly more often than expected by chance. The higher the frequency of a multiword construction, the greater the variability of its collocation patterns at a later stage.

The application of a qualitative stratified approach combined with quantitative analyses based on information theory in research on language development is discussed, as are the implications of theory-based and data-driven analyses for clinical assessment and intervention in language disorders.
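The two information-theoretic measures named in the abstract can be illustrated with a minimal Python sketch. It assumes Shannon entropy in bits (base-2 logarithm) over the observed options of one language feature within a corpus part, and it assumes the mutual information for a collocated pair is the pointwise form common in corpus collocation work, log2 of P(x, y) / (P(x) P(y)). The abstract does not state the logarithm base or whether pointwise or average mutual information was used, and the function names and construction labels below are hypothetical illustrations, not code or data from the thesis.

```python
import math
from collections import Counter


def shannon_entropy(options):
    """Shannon entropy in bits of the distribution of observed options.

    `options` is a list of labels, e.g. the clause-structure type assigned
    to each multiword construction in one corpus part (hypothetical labels).
    """
    counts = Counter(options)
    total = sum(counts.values())
    # H = -sum(p * log2 p), written as a sum of -p*log2(p) terms so that a
    # single repeated option yields 0.0 rather than -0.0.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def pointwise_mutual_information(pairs, x, y):
    """Pointwise mutual information in bits of x (first slot) and y (second slot).

    Assumption: each pair is one observed collocation of an earlier-acquired
    construction (first slot) with a later-emerging construction (second slot).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one collocated pair")
    p_xy = sum(1 for a, b in pairs if a == x and b == y) / n
    p_x = sum(1 for a, _ in pairs if a == x) / n
    p_y = sum(1 for _, b in pairs if b == y) / n
    if p_xy == 0:
        # x and y never co-occur: log2(0) is undefined, conventionally -inf.
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Hypothetical labels for an earlier and a later corpus part.
    cp_early = ["item-based"] * 4
    cp_late = ["process+patient", "carrier+attribute",
               "process+patient", "carrier+attribute"]
    print("entropy, early CP:", shannon_entropy(cp_early))
    print("entropy, late CP:", shannon_entropy(cp_late))

    # Hypothetical (earlier construction, later construction) collocations.
    pairs = [("A", "B"), ("A", "B"), ("C", "D"), ("C", "E")]
    print("PMI(A, B):", pointwise_mutual_information(pairs, "A", "B"))
```

In this sketch, a rise in entropy from one corpus part to the next reflects a more even spread over more options, and a positive PMI means a pair co-occurs more often than independence would predict. Deciding whether that excess is statistically significant, as the abstract reports, needs a separate test that the sketch does not implement.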
Degree: Doctor of Philosophy
Subject: Cantonese dialects - Study and teaching (Early childhood)
Dept/Program: Education
Persistent Identifier: http://hdl.handle.net/10722/344393

 

DC Field: Value
dc.contributor.advisor: Lee, T
dc.contributor.advisor: Wong, AMY
dc.contributor.author: Lau, Suk Han
dc.contributor.author: 劉淑嫺
dc.date.accessioned: 2024-07-30T05:00:33Z
dc.date.available: 2024-07-30T05:00:33Z
dc.date.issued: 2024
dc.identifier.citation: Lau, S. H. [劉淑嫺]. (2024). A longitudinal corpus study on the emergence of multiword constructions in Cantonese using usage-based model and information theory. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/344393
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Cantonese dialects - Study and teaching (Early childhood)
dc.title: A longitudinal corpus study on the emergence of multiword constructions in Cantonese using usage-based model and information theory
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Education
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044836037903414
