Latent topic mining in social media with probabilistic graphical model

Liu, Yu; 刘宇

DOI: 10.5353/th_991043979535803414

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Latent topic mining in social media with probabilistic graphical model

Title	Latent topic mining in social media with probabilistic graphical model
Authors	Liu, Yu; 刘宇
Advisors	Cheung, DWL; Mamoulis, N
Issue Date	2017
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Liu, Y. [刘宇]. (2017). Latent topic mining in social media with probabilistic graphical model. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	The popularity of the Internet has led to an ever-increasing amount of data. These data are rich not only in volume but also in the variety of their sources and content, e.g., textual data, check-in data from location-based social networks (LBSNs), and rating data. Such data usually contain implicit knowledge, which is expressed as a hierarchy of latent topics: each topic comprises the subset of the data related to it, together with a summary extracted from that subset. Many practical applications, touching many aspects of everyday life, show widespread interest in mining latent topics from various data objects. In this thesis, three challenging latent topic mining problems are investigated in depth, each targeting a novel data type and scenario: (i) mining check-in topics in LBSNs, (ii) mining latent textual topics for spatial event detection, and (iii) mining latent topics among rating records for rating prediction systems. First, check-in topics are studied in LBSN data. Twitter, together with other online social networks, has begun to collect hundreds of millions of check-ins. Check-in data capture the spatial and temporal information of user movements and interests. To model and analyze the spatio-temporal aspects of check-in data and to discover temporal topics and regions, we propose several spatio-temporal topic models. In our quantitative analysis, we evaluate the models in terms of perplexity, accuracy of point-of-interest (POI) recommendation, and accuracy of user and time prediction, and the results demonstrate a substantial improvement. Next, mining latent textual topics is investigated for spatial event detection. Unlike traditional event detection, which focuses on when an event occurs, spatial event detection aims to identify the spatial regions where events occur. In this thesis, we address spatial event detection using textual information in social media. We observe that, when a spatial event occurs, the topics relevant to the event are often discussed more coherently in cities near the event location than in cities far away. In other words, the latent topics mined in event regions differ from those mined in normal regions. To capture this pattern, we propose Graph-TSS, which efficiently discovers these event regions. As case studies, we consider three applications: civil unrest in Argentina, an earthquake in Chile, and an influenza outbreak in the US. Finally, mining latent topics among rating records is studied for rating prediction systems. Specifically, consider a set of users and a set of items, where each user can rate any item by giving it a score, either explicitly or implicitly. Given a target user, the system predicts a rating for each item that the user has not yet rated, based on the existing ratings of other users and the learned rating topics. While most previous work focuses on a one-level structure, many real recommender systems contain multi-level structures. To exploit the hierarchical information present in such systems and improve recommendation quality, we propose a new hierarchical matrix factorization method that makes use of implicit hierarchical structures for rating prediction.
Degree	Doctor of Philosophy
Subject	Data mining; Social media
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/250737
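
The abstract reports perplexity as one of the measures used to evaluate the spatio-temporal topic models. As a hedged illustration only, the Python sketch below computes held-out perplexity for a plain LDA-style topic mixture, where each token's probability is p(w | d) = Σ_k θ_dk φ_kw. It does not model the spatial and temporal variables of the thesis's models, and the function name, argument layout, and toy numbers are hypothetical.

```python
import math
from typing import Sequence

def perplexity(docs: Sequence[Sequence[int]],
               theta: Sequence[Sequence[float]],
               phi: Sequence[Sequence[float]]) -> float:
    """Held-out perplexity of a plain topic mixture model (hypothetical sketch).

    docs[d]  -- token ids of document d
    theta[d] -- topic proportions of document d (sums to 1)
    phi[k]   -- word distribution of topic k (sums to 1)
    Note: math.log raises ValueError if a token gets zero probability;
    real topic models smooth phi so this cannot happen.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal word probability: p(w | d) = sum_k theta[d][k] * phi[k][w]
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("no tokens to evaluate")
    # Perplexity = exp(-(1/N) * total log-likelihood); lower means a better fit.
    return math.exp(-log_lik / n_tokens)


# Two topics that each spread mass uniformly over a 4-word vocabulary.
uniform_phi = [[0.25] * 4, [0.25] * 4]
print(perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], uniform_phi))  # ≈ 4.0

# Two sharper topics whose words match the documents' topic assignments.
sharp_phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(perplexity([[0, 1], [2, 3]], [[1.0, 0.0], [0.0, 1.0]], sharp_phi))  # ≈ 2.0
```

The first call is a sanity check: a model that is uniform over V words has perplexity V. The second shows that topics which fit the documents more tightly yield a lower perplexity.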

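The rating prediction task in the abstract (predicting a target user's rating for an unrated item from other users' existing ratings) is commonly approached with matrix factorization. The Python sketch below is a plain, single-level matrix factorization fitted by stochastic gradient descent. It is not the thesis's hierarchical method, and the function names, hyperparameters (`k`, `lr`, `reg`, `epochs`), and toy ratings are hypothetical.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted rating of user u for item i: dot product of latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Single-level matrix factorization fitted by SGD (hypothetical sketch).

    ratings -- list of observed (user, item, score) triples
    Returns user factors P (n_users x k) and item factors Q (n_items x k).
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    """Root-mean-square error of the model on the given ratings."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Toy data: 3 users x 3 items with three unobserved cells.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
print("training RMSE:", rmse(P, Q, ratings))
print("predicted rating of user 0 for item 2:", predict(P, Q, 0, 2))
```

In this tiny example the unobserved cells are under-determined, so their predicted values depend on initialization and regularization. The thesis's contribution differs from this one-level setting in that it additionally exploits implicit multi-level structure, which is not reproduced here.
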
DC Field	Value	Language
dc.contributor.advisor	Cheung, DWL	-
dc.contributor.advisor	Mamoulis, N	-
dc.contributor.author	Liu, Yu	-
dc.contributor.author	刘宇	-
dc.date.accessioned	2018-01-26T01:59:25Z	-
dc.date.available	2018-01-26T01:59:25Z	-
dc.date.issued	2017	-
dc.identifier.citation	Liu, Y. [刘宇]. (2017). Latent topic mining in social media with probabilistic graphical model. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/250737	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Data mining	-
dc.subject.lcsh	Social media	-
dc.title	Latent topic mining in social media with probabilistic graphical model	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_991043979535803414	-
dc.date.hkucongregation	2017	-
dc.identifier.mmsid	991043979535803414	-
