postgraduate thesis: Latent topic mining in social media with probabilistic graphical model
Title | Latent topic mining in social media with probabilistic graphical model |
---|---|
Authors | Liu, Yu (刘宇) |
Advisors | Cheung, DWL; Mamoulis, N |
Issue Date | 2017 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, Y. [刘宇]. (2017). Latent topic mining in social media with probabilistic graphical model. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | The popularity of the Internet has led to a rapidly growing amount of data. These data are rich not only in volume but also in the variety of their sources and content, e.g. textual data, check-in data from location-based social networks (LBSN), and rating data. Such data usually contain implicit knowledge, which can be expressed as a hierarchy of latent topics: each topic comprises the subset of the data related to it, together with a summary extracted from that subset. Many practical applications, touching many aspects of everyday life, call for mining latent topics in various kinds of data objects. In this thesis, three challenging latent topic mining problems are investigated in depth, targeting three novel data types and scenarios: (i) mining check-in topics in LBSN, (ii) mining latent textual topics for spatial event detection, and (iii) mining latent topics among rating records for rating prediction systems.
First, check-in topics are studied in LBSN data. Twitter, together with other online social networks, has begun to collect hundreds of millions of check-ins. Check-in data capture the spatial and temporal information of user movements and interests. To model and analyze the spatio-temporal aspects of check-in data and to discover temporal topics and regions, we propose several spatio-temporal topic models. In our quantitative analysis, we evaluate the effectiveness of these models in terms of perplexity, accuracy of POI (point-of-interest) recommendation, and accuracy of user and time prediction; the results demonstrate a substantial improvement.
Next, mining latent textual topics is investigated for spatial event detection. Unlike traditional event detection, which focuses on the timing of events, spatial event detection aims to identify the spatial regions where events occur. In this thesis, we focus on spatial event detection using textual information in social media. We observe that, when a spatial event occurs, the topics relevant to the event are often discussed more coherently in cities near the event location than in cities far away; that is, the latent topics mined in event regions differ from those mined in normal regions. To capture this pattern, we propose Graph-TSS to efficiently discover these spatial regions. As case studies, we consider three applications: civil unrest in Argentina, the Chile earthquake, and an influenza outbreak in the US.
Finally, mining latent topics among rating records is studied for rating prediction systems. Specifically, consider a set of users and a set of items, where each user can rate any item by giving it a score, either explicitly or implicitly. Given a target user, for each item that the user has not rated, the system predicts the rating based on the existing ratings of other users and the learned rating topics. While most previous work focuses on a one-level (flat) structure, many real recommender systems contain multi-level structures. To exploit the hierarchical information present in many recommender systems and improve recommendation quality, we propose a new hierarchical matrix factorization method that makes use of implicit hierarchical structures for rating prediction. |
Degree | Doctor of Philosophy |
Subject | Data mining; Social media |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/250737 |
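
The abstract lists perplexity as one of the evaluation measures for the check-in topic models. As a rough illustration only, the sketch below computes the standard per-token perplexity of an LDA-style mixture model, exp(-Σ log p(w|d) / N) with p(w|d) = Σ_k θ_dk φ_kw. It assumes that each check-in is reduced to a POI id treated as a token. The function name and the `theta`/`phi` inputs are hypothetical. The thesis's spatio-temporal models also involve location and time variables, and this sketch ignores them.

```python
import math
from typing import List, Sequence


def perplexity(
    docs: Sequence[Sequence[int]],
    theta: List[List[float]],
    phi: List[List[float]],
) -> float:
    """Per-token perplexity of a topic mixture model (hypothetical helper).

    docs[d]     -- token ids (e.g. POI ids) in document d
    theta[d][k] -- P(topic k | document d)
    phi[k][w]   -- P(token w | topic k)
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # marginal probability of token w in document d, summed over topics
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# A uniform model over 4 tokens has perplexity (approximately) 4.
uniform_phi = [[0.25] * 4, [0.25] * 4]
print(perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], uniform_phi))
```

A uniform model over V tokens yields perplexity V. Lower values on held-out data indicate a better fit.
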
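
The spatial event detection problem in the abstract amounts to finding a region of nearby cities where event-related topics are unusually prominent. The abstract does not specify Graph-TSS itself, so what follows is a generic stand-in and not the thesis's algorithm. It performs a greedy search for a connected set of cities on a city adjacency graph, maximizing the expectation-based Poisson scan statistic F = C log(C/B) + B − C (zero when C ≤ B). The per-city `counts` (e.g. posts assigned to an event topic) and `baselines` (their expected values under normal conditions) are hypothetical inputs.

```python
import math
from typing import Dict, Set, Tuple


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio; 0 unless count > baseline."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def greedy_region(
    adj: Dict[str, Set[str]],
    counts: Dict[str, float],
    baselines: Dict[str, float],
) -> Tuple[Set[str], float]:
    """Grow a connected region of cities, adding the neighbor that most raises the score."""
    # seed with the single highest-scoring city
    seed = max(counts, key=lambda c: ebp_score(counts[c], baselines[c]))
    region = {seed}
    c_sum, b_sum = counts[seed], baselines[seed]
    best = ebp_score(c_sum, b_sum)
    while True:
        frontier = {n for v in region for n in adj[v]} - region
        best_next = None
        for n in frontier:
            score = ebp_score(c_sum + counts[n], b_sum + baselines[n])
            if score > best:
                best, best_next = score, n
        if best_next is None:  # no neighbor improves the statistic
            return region, best
        region.add(best_next)
        c_sum += counts[best_next]
        b_sum += baselines[best_next]


# Hypothetical chain of four cities A-B-C-D with event-topic post counts.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
counts = {"A": 30, "B": 25, "C": 5, "D": 4}
baselines = {"A": 10, "B": 10, "C": 10, "D": 10}
region, score = greedy_region(adj, counts, baselines)
print(sorted(region))  # ['A', 'B']
```

Greedy growth is only a heuristic. It keeps the region connected and runs fast, but it can miss the highest-scoring connected subgraph, which exact graph scan methods find at a higher computational cost.
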
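
For the rating prediction problem, one simple way to inject a hierarchy into matrix factorization is to model each item's latent vector as the sum of two parts: a vector shared by its parent group and an item-specific offset. The model is then trained by stochastic gradient descent on squared error. This is an illustrative assumption and not the thesis's method, which exploits implicit hierarchies. Here the `item_parent` mapping is given explicitly, and all names and hyperparameters are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Rating = Tuple[int, int, float]  # (user id, item id, rating)


def train_hier_mf(
    ratings: List[Rating],
    item_parent: Dict[int, int],
    n_users: int,
    n_items: int,
    n_groups: int,
    k: int = 4,
    lr: float = 0.02,
    reg: float = 0.05,
    epochs: int = 500,
    seed: int = 0,
) -> Callable[[int, int], float]:
    """Two-level MF sketch: item vector = group vector G[g] + item offset Q[i]."""
    rng = random.Random(seed)

    def init() -> List[float]:
        return [rng.gauss(0.0, 0.1) for _ in range(k)]

    P = [init() for _ in range(n_users)]   # user factors
    G = [init() for _ in range(n_groups)]  # group factors shared by sibling items
    Q = [init() for _ in range(n_items)]   # item-specific offsets
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

    for _ in range(epochs):
        for u, i, r in ratings:
            g = item_parent[i]
            q = [G[g][f] + Q[i][f] for f in range(k)]
            err = r - (mu + sum(P[u][f] * q[f] for f in range(k)))
            for f in range(k):
                pu = P[u][f]
                # SGD steps on squared error with L2 regularization
                P[u][f] += lr * (err * q[f] - reg * pu)
                G[g][f] += lr * (err * pu - reg * G[g][f])
                Q[i][f] += lr * (err * pu - reg * Q[i][f])

    def predict(u: int, i: int) -> float:
        g = item_parent[i]
        return mu + sum(P[u][f] * (G[g][f] + Q[i][f]) for f in range(k))

    return predict


# Toy data: items 0-1 belong to group 0, items 2-3 to group 1.
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (0, 3, 1),
           (1, 0, 1), (1, 1, 1), (1, 2, 5), (1, 3, 5),
           (2, 0, 5), (2, 2, 1)]
parent = {0: 0, 1: 0, 2: 1, 3: 1}
predict = train_hier_mf(ratings, parent, n_users=3, n_items=4, n_groups=2)
print(round(predict(2, 1), 2), round(predict(2, 3), 2))
```

Items in the same group share `G[g]`, so an item with few ratings still inherits a sensible vector from its siblings.
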
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Cheung, DWL | - |
dc.contributor.advisor | Mamoulis, N | - |
dc.contributor.author | Liu, Yu | - |
dc.contributor.author | 刘宇 | - |
dc.date.accessioned | 2018-01-26T01:59:25Z | - |
dc.date.available | 2018-01-26T01:59:25Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Liu, Y. [刘宇]. (2017). Latent topic mining in social media with probabilistic graphical model. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/250737 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Data mining | - |
dc.subject.lcsh | Social media | - |
dc.title | Latent topic mining in social media with probabilistic graphical model | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991043979535803414 | - |
dc.date.hkucongregation | 2017 | - |
dc.identifier.mmsid | 991043979535803414 | - |