Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1007/978-3-642-45068-6_8
- Scopus: eid_2-s2.0-84893247675
Citations:
- Scopus: 0
- Appears in Collections:
Conference Paper: Extracting categorical topics from tweets using topic model
Title | Extracting categorical topics from tweets using topic model |
---|---|
Authors | Zheng, Lei; Han, Kai |
Keywords | Gibbs Sampling; Topic Model |
Issue Date | 2013 |
Publisher | Springer. |
Citation | 9th Asia Information Retrieval Societies Conference (AIRS 2013), Singapore, 9-11 December 2013. In Banchs, RE, Silvestri, F, Liu, T, et al. (Eds.), Information Retrieval Technology: 9th Asia Information Retrieval Societies Conference, AIRS 2013, Singapore, December 9-11, 2013. Proceedings, p. 86-96. Berlin: Springer, 2013 |
Abstract | Over the past few years, microblogging websites such as Twitter have grown increasingly popular. Unlike traditional media, tweets are structured data and contain many noisy words. Topic modeling algorithms for traditional media have been studied well, but our understanding of Twitter remains limited, and few algorithms are specially designed to mine Twitter data according to its own characteristics. Previous studies usually employ only one type of topic to analyze hot topics of the Twitter community and are greatly affected by the large number of noisy words in tweets. We have observed that users in the Twitter community tend to discuss two types of topics: one mainly focuses on their personal lives, and the other on hot issues in society. These two types of topics usually yield different distributions. In this paper, we introduce the Categorical Topic Model. This model incorporates the features of Twitter data to divide topics into two semantic types and introduces a word distribution for background words to filter out noisy words. Our model is able to discover different types of topics efficiently, indicate which topics a user is interested in, and find hot issues of the Twitter community. Employing Gibbs sampling, we compare our model with Latent Dirichlet Allocation and the Author Topic Model on the TREC2011 data set; examples of discovered public topics and personal topics are also discussed. © 2013 Springer-Verlag. |
Persistent Identifier | http://hdl.handle.net/10722/311383 |
ISBN | 9783642450679 |
ISSN | 0302-9743 (2023 SCImago Journal Rankings: 0.606) |
Series/Report no. | Lecture Notes in Computer Science ; 8281 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zheng, Lei | - |
dc.contributor.author | Han, Kai | - |
dc.date.accessioned | 2022-03-22T11:53:48Z | - |
dc.date.available | 2022-03-22T11:53:48Z | - |
dc.date.issued | 2013 | - |
dc.identifier.citation | 9th Asia Information Retrieval Societies Conference (AIRS 2013), Singapore, 9-11 December 2013. In Banchs, RE, Silvestri, F, Liu, T, et al. (Eds.), Information Retrieval Technology: 9th Asia Information Retrieval Societies Conference, AIRS 2013, Singapore, December 9-11, 2013. Proceedings, p. 86-96. Berlin: Springer, 2013 | - |
dc.identifier.isbn | 9783642450679 | - |
dc.identifier.issn | 0302-9743 | - |
dc.identifier.uri | http://hdl.handle.net/10722/311383 | - |
dc.description.abstract | Over the past few years, microblogging websites such as Twitter have grown increasingly popular. Unlike traditional media, tweets are structured data and contain many noisy words. Topic modeling algorithms for traditional media have been studied well, but our understanding of Twitter remains limited, and few algorithms are specially designed to mine Twitter data according to its own characteristics. Previous studies usually employ only one type of topic to analyze hot topics of the Twitter community and are greatly affected by the large number of noisy words in tweets. We have observed that users in the Twitter community tend to discuss two types of topics: one mainly focuses on their personal lives, and the other on hot issues in society. These two types of topics usually yield different distributions. In this paper, we introduce the Categorical Topic Model. This model incorporates the features of Twitter data to divide topics into two semantic types and introduces a word distribution for background words to filter out noisy words. Our model is able to discover different types of topics efficiently, indicate which topics a user is interested in, and find hot issues of the Twitter community. Employing Gibbs sampling, we compare our model with Latent Dirichlet Allocation and the Author Topic Model on the TREC2011 data set; examples of discovered public topics and personal topics are also discussed. © 2013 Springer-Verlag. | - |
dc.language | eng | - |
dc.publisher | Springer. | - |
dc.relation.ispartof | Information Retrieval Technology: 9th Asia Information Retrieval Societies Conference, AIRS 2013, Singapore, December 9-11, 2013. Proceedings | - |
dc.relation.ispartofseries | Lecture Notes in Computer Science ; 8281 | - |
dc.subject | Gibbs Sampling | - |
dc.subject | Topic Model | - |
dc.title | Extracting categorical topics from tweets using topic model | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-642-45068-6_8 | - |
dc.identifier.scopus | eid_2-s2.0-84893247675 | - |
dc.identifier.spage | 86 | - |
dc.identifier.epage | 96 | - |
dc.identifier.eissn | 1611-3349 | - |
dc.publisher.place | Berlin | - |