Extracting categorical topics from tweets using topic model

Zheng, Lei; Han, Kai

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/978-3-642-45068-6_8
Scopus: eid_2-s2.0-84893247675

Citations:
- Scopus: 0
Appears in Collections:
- Statistics & Actuarial Science: Conference papers

Conference Paper: Extracting categorical topics from tweets using topic model

Title	Extracting categorical topics from tweets using topic model
Authors	Zheng, Lei; Han, Kai
Keywords	Gibbs Sampling; Topic Model; Twitter
Issue Date	2013
Publisher	Springer.
Citation	9th Asia Information Retrieval Societies Conference (AIRS 2013), Singapore, 9-11 December 2013. In Banchs, RE, Silvestri, F, Liu, T, et al. (Eds.), Information Retrieval Technology: 9th Asia Information Retrieval Societies Conference, AIRS 2013, Singapore, December 9-11, 2013. Proceedings, p. 86-96. Berlin: Springer, 2013. DOI: http://dx.doi.org/10.1007/978-3-642-45068-6_8
Abstract	Over the past few years, microblogging websites such as Twitter have grown increasingly popular. Unlike traditional media, tweets are structured data and contain many noisy words. Topic modeling algorithms for traditional media have been studied well, but our understanding of Twitter remains limited, and few algorithms are specifically designed to mine Twitter data according to its own characteristics. Previous studies usually employ only one type of topic to analyze hot topics in the Twitter community and are greatly affected by the large number of noisy words in tweets. We have observed that users in the Twitter community tend to discuss two types of topics: one mainly concerns their personal lives, and the other concerns hot issues in society. These two types of topics usually yield different distributions. In this paper, we introduce the Categorical Topic Model. This model incorporates the features of Twitter data to divide topics semantically into two types, and it introduces a word distribution for background words to filter out noisy words. Our model is able to discover different types of topics efficiently, indicate which topics interest a user, and find hot issues in the Twitter community. Employing Gibbs sampling, we compare our model with Latent Dirichlet Allocation and the Author Topic Model on the TREC2011 data set, and we discuss examples of discovered public and personal topics. © 2013 Springer-Verlag.
Persistent Identifier	http://hdl.handle.net/10722/311383
ISBN	9783642450679
ISSN	0302-9743 (2023 SCImago Journal Rankings: 0.606)
Series/Report no.	Lecture Notes in Computer Science ; 8281

DC Field	Value	Language
dc.contributor.author	Zheng, Lei	-
dc.contributor.author	Han, Kai	-
dc.date.accessioned	2022-03-22T11:53:48Z	-
dc.date.available	2022-03-22T11:53:48Z	-
dc.date.issued	2013	-
dc.identifier.citation	9th Asia Information Retrieval Societies Conference (AIRS 2013), Singapore, 9-11 December 2013. In Banchs, RE, Silvestri, F, Liu, T, et al. (Eds.), Information Retrieval Technology: 9th Asia Information Retrieval Societies Conference, AIRS 2013, Singapore, December 9-11, 2013. Proceedings, p. 86-96. Berlin: Springer, 2013	-
dc.identifier.isbn	9783642450679	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://hdl.handle.net/10722/311383	-
dc.description.abstract	Over the past few years, microblogging websites such as Twitter have grown increasingly popular. Unlike traditional media, tweets are structured data and contain many noisy words. Topic modeling algorithms for traditional media have been studied well, but our understanding of Twitter remains limited, and few algorithms are specifically designed to mine Twitter data according to its own characteristics. Previous studies usually employ only one type of topic to analyze hot topics in the Twitter community and are greatly affected by the large number of noisy words in tweets. We have observed that users in the Twitter community tend to discuss two types of topics: one mainly concerns their personal lives, and the other concerns hot issues in society. These two types of topics usually yield different distributions. In this paper, we introduce the Categorical Topic Model. This model incorporates the features of Twitter data to divide topics semantically into two types, and it introduces a word distribution for background words to filter out noisy words. Our model is able to discover different types of topics efficiently, indicate which topics interest a user, and find hot issues in the Twitter community. Employing Gibbs sampling, we compare our model with Latent Dirichlet Allocation and the Author Topic Model on the TREC2011 data set, and we discuss examples of discovered public and personal topics. © 2013 Springer-Verlag.	-
dc.language	eng	-
dc.publisher	Springer.	-
dc.relation.ispartof	Information Retrieval Technology: 9th Asia Information Retrieval Societies Conference, AIRS 2013, Singapore, December 9-11, 2013. Proceedings	-
dc.relation.ispartofseries	Lecture Notes in Computer Science ; 8281	-
dc.subject	Gibbs Sampling	-
dc.subject	Topic Model	-
dc.subject	Twitter	-
dc.title	Extracting categorical topics from tweets using topic model	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/978-3-642-45068-6_8	-
dc.identifier.scopus	eid_2-s2.0-84893247675	-
dc.identifier.spage	86	-
dc.identifier.epage	96	-
dc.identifier.eissn	1611-3349	-
dc.publisher.place	Berlin	-
