postgraduate thesis: Advanced analysis and join queries in multidimensional spaces

Title: Advanced analysis and join queries in multidimensional spaces
Authors: Ge, Shen (葛屾)
Advisors: Mamoulis, N; Cheung, DWL
Issue Date: 2012
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Ge, S. [葛屾]. (2012). Advanced analysis and join queries in multidimensional spaces. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4979933
Abstract: Multidimensional data are ubiquitous, and their efficient management and analysis is a core database research problem. Much previous work has focused on indexing, analyzing and querying multidimensional data. In this dissertation, three challenging advanced analysis and join problems in multidimensional spaces are proposed and studied, and efficient solutions are provided for their related applications.

First, the problem of the generalized budget constrained optimization query (Gen-BOQ) is studied. In real life, constraints often make it difficult for manufacturers to create new products that dominate their competitors. These constraints can be modeled by constraint functions, and the problem is then to decide the best possible regions in multidimensional space where the features of new products could be placed. The profitability of these regions is evaluated using the number of dominating and dominated objects, and the best regions are returned. Although Gen-BOQ computation is challenging due to its high complexity, an efficient divide-and-conquer based framework is offered for this problem. In addition, an approximation method is proposed, which trades off result quality against query cost.

Next, the efficient evaluation of all top-k queries (ATOPk) in multidimensional spaces is investigated. Such a query computes the top-ranked objects for a group of preference functions simultaneously. As an application, consider an online store that needs to provide recommendations for a large number of users simultaneously. This problem has been somewhat overlooked by past research; in this thesis, batch algorithms are proposed instead of naïvely evaluating top-k queries individually. Similar preferences are grouped together, and two algorithms are proposed, one using block indexed nested loops and the other a view-based thresholding strategy. The optimized view-based threshold algorithm is demonstrated to be consistently the best. Moreover, an all top-k query helps to evaluate other queries that rely on the results of multiple top-k queries, such as the reverse top-k queries and top-m influential queries proposed in previous works. It is shown that applying the view-based approach to these queries can improve the performance of the current state of the art by orders of magnitude.

Finally, the problem of spatio-textual similarity joins (ST-SJOIN) on multidimensional data is considered. Given both spatial and textual information, ST-SJOIN retrieves pairs of objects that are both spatially close and textually similar. One possible application of this query is friendship recommendation, by matching people who not only live nearby but also share common interests. By combining state-of-the-art strategies for spatial distance joins and set similarity joins, efficient query processing algorithms are proposed that take both spatial and textual constraints into account. A batch processing strategy is also introduced to boost performance; it is also effective for the original textual-only joins. Using synthetic and real datasets, it is shown that the proposed techniques outperform the baseline solutions.
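To make the ST-SJOIN definition in the abstract concrete, the following is a minimal sketch of a naive quadratic baseline, not the algorithms proposed in the thesis. It assumes Euclidean distance for spatial closeness and Jaccard similarity over token sets for textual similarity; the abstract names neither measure, and the function names, the thresholds `eps` and `theta`, and the sample data are all hypothetical.

```python
from itertools import combinations
from math import dist


def jaccard(a, b):
    """Jaccard similarity of two sets; treated as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def naive_st_sjoin(objects, eps, theta):
    """Return id pairs that are spatially close and textually similar.

    objects: list of (id, (x, y), set_of_tokens) tuples (hypothetical layout).
    eps:     maximum Euclidean distance (assumed spatial predicate).
    theta:   minimum Jaccard similarity (assumed textual predicate).
    Every pair is compared once, so the cost is quadratic in len(objects).
    """
    result = []
    for (id1, loc1, tok1), (id2, loc2, tok2) in combinations(objects, 2):
        if dist(loc1, loc2) <= eps and jaccard(tok1, tok2) >= theta:
            result.append((id1, id2))
    return result


# Hypothetical sample data: people with a home location and interest tags.
people = [
    ("ann", (0.0, 0.0), {"hiking", "jazz", "chess"}),
    ("bob", (0.5, 0.5), {"hiking", "jazz", "film"}),
    ("cat", (9.0, 9.0), {"hiking", "jazz", "chess"}),
    ("dan", (0.2, 0.1), {"golf"}),
]
pairs = naive_st_sjoin(people, eps=1.0, theta=0.5)
print(pairs)
```

In this sample only "ann" and "bob" satisfy both predicates: "cat" shares all of ann's interests but lives too far away, and "dan" lives nearby but shares none. A baseline of this kind is what index-based and batch join strategies aim to improve on by avoiding the comparison of every pair.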
Degree: Doctor of Philosophy
Subject: Multidimensional databases; Database management
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/181500
HKU Library Item ID: b4979933

 

DC Field | Value | Language
dc.contributor.advisor | Mamoulis, N | -
dc.contributor.advisor | Cheung, DWL | -
dc.contributor.author | Ge, Shen. | -
dc.contributor.author | 葛屾. | -
dc.date.accessioned | 2013-03-03T03:20:15Z | -
dc.date.available | 2013-03-03T03:20:15Z | -
dc.date.issued | 2012 | -
dc.identifier.citation | Ge, S. [葛屾]. (2012). Advanced analysis and join queries in multidimensional spaces. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4979933 | -
dc.identifier.uri | http://hdl.handle.net/10722/181500 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.source.uri | http://hub.hku.hk/bib/B49799332 | -
dc.subject.lcsh | Multidimensional databases. | -
dc.subject.lcsh | Database management. | -
dc.title | Advanced analysis and join queries in multidimensional spaces | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b4979933 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b4979933 | -
dc.date.hkucongregation | 2013 | -
dc.identifier.mmsid | 991034240649703414 | -
