Article: Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction
Title | Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction |
---|---|
Authors | Zhang, L; Chan, KP |
Keywords | Bayesian HMM; Chinese language model; Dirichlet distribution; Part-of-speech induction; Variational inference |
Issue Date | 2012 |
Publisher | Association for Computing Machinery, Inc. The Journal's web site is located at http://talip.acm.org |
Citation | ACM Transactions on Asian Language Information Processing, 2012, v. 11 n. 3, article no. 9 |
Abstract | We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The proposed model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, our algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, a Chinese unknown-word processing module measures similarity using both morphological properties and context distributions. Experimental results show that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracy on the Chinese Treebank. © 2012 ACM. |
Persistent Identifier | http://hdl.handle.net/10722/165866 |
ISSN | 1530-0226 |

DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, L | en_US |
dc.contributor.author | Chan, KP | en_US |
dc.date.accessioned | 2012-09-20T08:24:38Z | - |
dc.date.available | 2012-09-20T08:24:38Z | - |
dc.date.issued | 2012 | en_US |
dc.identifier.citation | ACM Transactions on Asian Language Information Processing, 2012, v. 11 n. 3, article no. 9 | en_US |
dc.identifier.issn | 1530-0226 | - |
dc.identifier.uri | http://hdl.handle.net/10722/165866 | - |
dc.description.abstract | We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The proposed model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, our algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, a Chinese unknown-word processing module measures similarity using both morphological properties and context distributions. Experimental results show that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracy on the Chinese Treebank. © 2012 ACM. | - |
dc.language | eng | en_US |
dc.publisher | Association for Computing Machinery, Inc. The Journal's web site is located at http://talip.acm.org | - |
dc.relation.ispartof | ACM Transactions on Asian Language Information Processing | en_US |
dc.rights | ACM Transactions on Asian Language Information Processing. Copyright © Association for Computing Machinery, Inc. | - |
dc.subject | Bayesian HMM | - |
dc.subject | Chinese language model | - |
dc.subject | Dirichlet distribution | - |
dc.subject | Part-of-speech induction | - |
dc.subject | Variational inference | - |
dc.title | Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction | en_US |
dc.type | Article | en_US |
dc.identifier.email | Zhang, L: lzhang@cs.hku.hk | en_US |
dc.identifier.email | Chan, KP: kpchan@cs.hku.hk | - |
dc.identifier.authority | Chan, KP=rp00092 | en_US |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.doi | 10.1145/2334801.2334803 | - |
dc.identifier.scopus | eid_2-s2.0-84866491555 | - |
dc.identifier.hkuros | 210965 | en_US |
dc.identifier.volume | 11 | en_US |
dc.identifier.issue | 3 | - |
dc.identifier.eissn | 1558-3430 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 1530-0226 | - |