
Article: Chinese named entity recognition using lexicalized HMMs

Title: Chinese named entity recognition using lexicalized HMMs
Authors: Fu, G; Luke, KK
Keywords: Chinese named entity recognition; Character tagging; Known word tagging; Lexicalized hidden Markov models
Issue Date: 2005
Publisher: Association for Computing Machinery, Inc.
Citation: SIGKDD Explorations, 2005, v. 7 n. 1, p. 19-25
Abstract: This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied.
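To make the idea of tagging known words with hybrid tags under a lexicalized HMM more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes one common first-order lexicalized formulation in which the transition to tag t_i is conditioned on the previous word and tag, P(t_i | w_{i-1}, t_{i-1}), interpolated with a plain tag-bigram back-off, and it decodes with the Viterbi algorithm. The class name `LexicalizedHMM`, the hybrid tag names (`B-PER`, `E-PER`, `O`, `S-LOC`), the toy corpus, and the constants `alpha` and `lam` are all hypothetical. The known-word bigram segmentation step is assumed to have already produced the input word list; passing single characters instead would correspond to character-based tagging.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo word and pseudo tag for the sentence start


class LexicalizedHMM:
    """Toy first-order lexicalized HMM over hybrid NER tags (hypothetical sketch).

    Transition: P(t_i | w_{i-1}, t_{i-1}), interpolated with a tag-bigram
    back-off P(t_i | t_{i-1}). Emission: add-alpha smoothed P(w_i | t_i).
    """

    def __init__(self, alpha=0.1, lam=0.7):
        self.alpha = alpha  # add-alpha smoothing constant (assumed value)
        self.lam = lam      # weight on the lexicalized estimate (assumed value)
        self.lex_trans = defaultdict(Counter)  # (w_prev, t_prev) -> Counter of t
        self.tag_trans = defaultdict(Counter)  # t_prev -> Counter of t
        self.emit = defaultdict(Counter)       # t -> Counter of w
        self.tags, self.vocab = [], set()

    def train(self, tagged_sentences):
        """Count events from sentences given as lists of (word, tag) pairs."""
        for sent in tagged_sentences:
            prev_w, prev_t = START, START
            for w, t in sent:
                self.lex_trans[(prev_w, prev_t)][t] += 1
                self.tag_trans[prev_t][t] += 1
                self.emit[t][w] += 1
                self.vocab.add(w)
                prev_w, prev_t = w, t
        self.tags = sorted(self.emit)
        return self

    def _trans(self, prev_w, prev_t, t):
        """log P(t | prev_w, prev_t), backing off to the smoothed tag bigram."""
        tc = self.tag_trans.get(prev_t, Counter())
        p_tag = (tc[t] + self.alpha) / (sum(tc.values()) + self.alpha * len(self.tags))
        lc = self.lex_trans.get((prev_w, prev_t))
        if lc:  # lexicalized context was seen in training: interpolate
            p_lex = lc[t] / sum(lc.values())
            return math.log(self.lam * p_lex + (1 - self.lam) * p_tag)
        return math.log(p_tag)

    def _emit(self, t, w):
        """log P(w | t); the +1 in the vocabulary size reserves mass for unknown words."""
        ec = self.emit[t]
        return math.log((ec[w] + self.alpha) / (sum(ec.values()) + self.alpha * (len(self.vocab) + 1)))

    def decode(self, words):
        """Viterbi search for the best hybrid-tag sequence of a non-empty word list."""
        best = [{t: (self._trans(START, START, t) + self._emit(t, words[0]), None)
                 for t in self.tags}]
        for i in range(1, len(words)):
            column = {}
            for t in self.tags:
                score, prev = max((best[-1][p][0] + self._trans(words[i - 1], p, t), p)
                                  for p in self.tags)
                column[t] = (score + self._emit(t, words[i]), prev)
            best.append(column)
        tag = max(self.tags, key=lambda t: best[-1][t][0])
        path = [tag]
        for column in reversed(best[1:]):  # follow back-pointers to position 0
            tag = column[tag][1]
            path.append(tag)
        return path[::-1]


# Hypothetical hybrid tags: B-/E-/S- entity-formation pattern plus category; O = outside.
TOY_CORPUS = [
    [("张", "B-PER"), ("明", "E-PER"), ("来到", "O"), ("北京", "S-LOC")],
    [("李", "B-PER"), ("华", "E-PER"), ("住在", "O"), ("上海", "S-LOC")],
]

if __name__ == "__main__":
    model = LexicalizedHMM().train(TOY_CORPUS)
    print(model.decode(["张", "华", "来到", "上海"]))  # ['B-PER', 'E-PER', 'O', 'S-LOC']
```

Conditioning the transition on the previous word lets a specific word (here a surname character) directly raise the probability of the tag that follows it. When that (word, tag) context never occurred in training, as with an unseen surname such as 王, the sketch falls back to the tag-bigram estimate, so `decode(["王", "明"])` still yields `["B-PER", "E-PER"]`.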
Persistent Identifier: http://hdl.handle.net/10722/54298
ISSN: 1931-0145

 

DC Field | Value | Language
dc.contributor.author | Fu, G | en_HK
dc.contributor.author | Luke, KK | en_HK
dc.date.accessioned | 2009-04-03T07:42:32Z | -
dc.date.available | 2009-04-03T07:42:32Z | -
dc.date.issued | 2005 | en_HK
dc.identifier.citation | SIGKDD Explorations, 2005, v. 7 n. 1, p. 19-25 | en_HK
dc.identifier.issn | 1931-0145 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/54298 | -
dc.description.abstract | (same text as the Abstract above) | en_HK
dc.language | eng | en_HK
dc.publisher | Association for Computing Machinery, Inc. | en_HK
dc.rights | © ACM, 2005. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. | en_HK
dc.subject | Chinese named entity recognition | en_HK
dc.subject | Character tagging | en_HK
dc.subject | Known word tagging | en_HK
dc.subject | Lexicalized hidden Markov models | en_HK
dc.title | Chinese named entity recognition using lexicalized HMMs | en_HK
dc.type | Article | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1931-0145&volume=7&issue=1&spage=19&epage=25&date=2005&atitle=Chinese+named+entity+recognition+using+lexicalized+HMMs | en_HK
dc.identifier.email | Fu, G: ghfu@hkucc.hku.hk | en_HK
dc.identifier.email | Luke, KK: kkluke@hkusua.hku.hk | en_HK
dc.description.nature | postprint | en_HK
dc.identifier.doi | 10.1145/1089815.1089819 | en_HK
dc.identifier.hkuros | 103510 | -
dc.identifier.citeulike | 2090725 | -
dc.identifier.issnl | 1931-0145 | -
