postgraduate thesis: Advanced rank-aware queries and recommendation with novel types of data
Title | Advanced rank-aware queries and recommendation with novel types of data |
---|---|
Authors | Wang, Hao (王皓) |
Advisors | Mamoulis, N; Cheung, DWL |
Issue Date | 2014 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wang, H. [王皓]. (2014). Advanced rank-aware queries and recommendation with novel types of data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5270554 |
Abstract | We live in an era of rich data, rich not only in volume but also in the variety of its sources and content. Efficient search, management, and exploitation of data have been a major direction of database research for decades. This thesis proposes and studies three challenging problems, targeting (i) time series data, (ii) user preference data, and (iii) location-based social network data, respectively, and provides efficient solutions for the corresponding real-life applications.
First, durability queries, which identify objects that have durable quality over time, are studied in historical time series databases. For example, a sociologist may be interested in the top 10 web search terms during the period of some historical event; the police may look for vehicles that move close to a suspect 70% of the time during a certain period. Such durable top-k (DTop-k) and durable k-nearest neighbor (DkNN) queries can be viewed as natural extensions of the standard snapshot top-k and NN queries to timestamped sequences of values or locations. Although their snapshot counterparts have been studied extensively, little prior work addresses this new class of durability queries. Efficient and scalable algorithms are proposed based on novel indexing techniques.
Next, an efficient solution to k-nearest neighbor search over top-m lists is investigated. A top-m list is a ranking of m items, typically representing some user’s preference over these items. For example, a user may have a list of her 10 favourite books; the result from a search engine is typically a list of webpages ranked by their relevance to some keywords. The search problem aims to extract the k top-m lists from the database that are “closest” to a query list, where closeness is evaluated using common measures such as Fagin’s intersection metric, Spearman’s footrule, and Kendall’s tau. Despite the importance of such queries, little prior work suggests an efficient solution. In this thesis, a unified framework is proposed to answer such queries efficiently.
Finally, the problem of top-N venue recommendation in location-based social networks (LBSNs), which recommends new venues to users, is studied. As an increasing number of users take part in LBSNs, the recommendation problem in this setting has attracted significant attention in research and in practical applications. The detailed information about past user behavior that is tracked by the LBSN differentiates the problem significantly from its traditional settings. The spatial nature of past user behavior, together with information about users’ social interactions, provides a richer background for building a more accurate and expressive recommendation model. Although there have been extensive studies on recommender systems working with user-item ratings, GPS trajectories, and other types of data, very few approaches exploit the unique properties of LBSN user check-in data. In this thesis, effective and efficient algorithms that create recommendations are proposed based on such properties. |
Degree | Doctor of Philosophy |
Subject | Data mining; Time-series analysis - Computer programs; Social networks - Data processing |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/206672 |
HKU Library Item ID | b5270554 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Mamoulis, N | - |
dc.contributor.advisor | Cheung, DWL | - |
dc.contributor.author | Wang, Hao | - |
dc.contributor.author | 王皓 | - |
dc.date.accessioned | 2014-11-25T03:53:15Z | - |
dc.date.available | 2014-11-25T03:53:15Z | - |
dc.date.issued | 2014 | - |
dc.identifier.citation | Wang, H. [王皓]. (2014). Advanced rank-aware queries and recommendation with novel types of data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5270554 | - |
dc.identifier.uri | http://hdl.handle.net/10722/206672 | - |
dc.description.abstract | We live in an era of rich data, rich not only in volume but also in the variety of its sources and content. Efficient search, management, and exploitation of data have been a major direction of database research for decades. This thesis proposes and studies three challenging problems, targeting (i) time series data, (ii) user preference data, and (iii) location-based social network data, respectively, and provides efficient solutions for the corresponding real-life applications. First, durability queries, which identify objects that have durable quality over time, are studied in historical time series databases. For example, a sociologist may be interested in the top 10 web search terms during the period of some historical event; the police may look for vehicles that move close to a suspect 70% of the time during a certain period. Such durable top-k (DTop-k) and durable k-nearest neighbor (DkNN) queries can be viewed as natural extensions of the standard snapshot top-k and NN queries to timestamped sequences of values or locations. Although their snapshot counterparts have been studied extensively, little prior work addresses this new class of durability queries. Efficient and scalable algorithms are proposed based on novel indexing techniques. Next, an efficient solution to k-nearest neighbor search over top-m lists is investigated. A top-m list is a ranking of m items, typically representing some user’s preference over these items. For example, a user may have a list of her 10 favourite books; the result from a search engine is typically a list of webpages ranked by their relevance to some keywords. The search problem aims to extract the k top-m lists from the database that are “closest” to a query list, where closeness is evaluated using common measures such as Fagin’s intersection metric, Spearman’s footrule, and Kendall’s tau. Despite the importance of such queries, little prior work suggests an efficient solution. In this thesis, a unified framework is proposed to answer such queries efficiently. Finally, the problem of top-N venue recommendation in location-based social networks (LBSNs), which recommends new venues to users, is studied. As an increasing number of users take part in LBSNs, the recommendation problem in this setting has attracted significant attention in research and in practical applications. The detailed information about past user behavior that is tracked by the LBSN differentiates the problem significantly from its traditional settings. The spatial nature of past user behavior, together with information about users’ social interactions, provides a richer background for building a more accurate and expressive recommendation model. Although there have been extensive studies on recommender systems working with user-item ratings, GPS trajectories, and other types of data, very few approaches exploit the unique properties of LBSN user check-in data. In this thesis, effective and efficient algorithms that create recommendations are proposed based on such properties. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.subject.lcsh | Data mining | - |
dc.subject.lcsh | Time-series analysis - Computer programs | - |
dc.subject.lcsh | Social networks - Data processing | - |
dc.title | Advanced rank-aware queries and recommendation with novel types of data | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5270554 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5270554 | - |
dc.identifier.mmsid | 991038814969703414 | - |