Article: Chinese named entity recognition using lexicalized HMMs
Title | Chinese named entity recognition using lexicalized HMMs |
---|---|
Authors | Fu, G; Luke, KK |
Keywords | Chinese named entity recognition; Character tagging; Known word tagging; Lexicalized hidden Markov models |
Issue Date | 2005 |
Publisher | Association for Computing Machinery, Inc. |
Citation | SIGKDD Explorations, 2005, v. 7 n. 1, p. 19-25 |
Abstract | This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied. |
Persistent Identifier | http://hdl.handle.net/10722/54298 |
ISSN | 1931-0145 |
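
The abstract describes assigning each known word a hybrid tag that encodes both its pattern in forming an entity and the category of the formed entity. The following is a minimal sketch of how such tags could be turned back into entities once a tagger has produced them. The tag format (`B`/`M`/`E`/`S` position prefixes joined to a category such as `PER`, `LOC`, or `ORG`, with `O` for non-entity words) and the function name `decode_entities` are assumptions made for illustration; the record does not state the paper's actual tag inventory, and this sketch does not implement the lexicalized HMM itself.

```python
from typing import List, Optional, Tuple


def decode_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge tagged known words into (entity_text, category) pairs.

    Assumed (hypothetical) tag format: "<pattern>-<category>", where pattern is
    B (begin), M (middle), E (end) or S (single-word entity); "O" marks a word
    outside any entity.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    buffer: List[str] = []          # words of the entity being built
    current: Optional[str] = None   # category of the entity being built

    for word, tag in zip(words, tags):
        if tag == "O":
            buffer, current = [], None
            continue
        pattern, category = tag.split("-", 1)
        if pattern == "S":
            # A single known word forms a complete entity.
            entities.append((word, category))
            buffer, current = [], None
        elif pattern == "B":
            buffer, current = [word], category
        elif pattern in ("M", "E") and current == category:
            buffer.append(word)
            if pattern == "E":
                entities.append(("".join(buffer), category))
                buffer, current = [], None
        else:
            # Ill-formed sequence (e.g. M or E without a matching B): drop it.
            buffer, current = [], None
    return entities
```

For example, with the words `["香港", "大学", "位于", "香港"]` and the tags `["B-ORG", "E-ORG", "O", "S-LOC"]`, the function would merge the first two known words into one ORG entity and return the last word as a single-word LOC entity. Dropping ill-formed sequences is one possible design choice; a real system might instead try to repair them.
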
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Fu, G | en_HK |
dc.contributor.author | Luke, KK | en_HK |
dc.date.accessioned | 2009-04-03T07:42:32Z | - |
dc.date.available | 2009-04-03T07:42:32Z | - |
dc.date.issued | 2005 | en_HK |
dc.identifier.citation | SIGKDD Explorations, 2005, v. 7 n. 1, p. 19-25 | en_HK |
dc.identifier.issn | 1931-0145 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/54298 | - |
dc.description.abstract | This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied. | en_HK |
dc.language | eng | en_HK |
dc.publisher | Association for Computing Machinery, Inc. | en_HK |
dc.rights | © ACM, 2005. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. | en_HK |
dc.subject | Chinese named entity recognition | en_HK |
dc.subject | Character tagging | en_HK |
dc.subject | Known word tagging | en_HK |
dc.subject | Lexicalized hidden Markov models | en_HK |
dc.title | Chinese named entity recognition using lexicalized HMMs | en_HK |
dc.type | Article | en_HK |
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1931-0145&volume=7&issue=1&spage=19&epage=25&date=2005&atitle=Chinese+named+entity+recognition+using+lexicalized+HMMs | en_HK |
dc.identifier.email | Fu, G: ghfu@hkucc.hku.hk | en_HK |
dc.identifier.email | Luke, KK: kkluke@hkusua.hku.hk | en_HK |
dc.description.nature | postprint | en_HK |
dc.identifier.doi | 10.1145/1089815.1089819 | en_HK |
dc.identifier.hkuros | 103510 | - |
dc.identifier.citeulike | 2090725 | - |
dc.identifier.issnl | 1931-0145 | - |