Ranking and similarity queries on complex data types

Cai, Yilun; 蔡奕倫

File Download

FullText.pdf

Links for fulltext

(May Require Subscription)

DOI: 10.5353/th_b5435640

Supplementary

Citations:
Appears in Collections:
- Computer Science: Theses
- HKU Theses Online

postgraduate thesis: Ranking and similarity queries on complex data types

Title	Ranking and similarity queries on complex data types
Authors	Cai, Yilun 蔡奕倫
Issue Date	2014
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Cai, Y. [蔡奕倫]. (2014). Ranking and similarity queries on complex data types. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5435640
Abstract	Ranking queries and similarity queries are elementary operations with many important applications. There are lots of research works investigating efficient evaluation of various ranking and similarity queries in databases over the past few decades. In this thesis, ranking and similarity queries on three interesting complex data types are studied, namely, multidimensional cube, object summary and tree. Efficient and effective solutions are proposed to solve their related applications. First, the evaluation of ranking queries on multidimensional cubes is studied. In exploratory data analysis, a relation can be considered as a multidimensional cube to investigate the relationship among its attributes. Given a relation with records that can be ranked, an interesting problem is to identify selection conditions for the relation, which result in sub-relations qualified by an input record and render the ranking of the input record as high as possible among the qualifying tuples. The ranking of the input record in a sub-relation measures the quality of the corresponding multidimensional cube of this sub-relation. In this thesis, a standing maximization problem, which aims to identify a multidimensional cube of high quality, is extensively studied. As an immediate consequence of its NP-hardness, three greedy methods are proposed to explore the search space only partially, while striving to identify sub-optimal solutions of high quality. Next, the efficient evaluation of ranking queries on object summaries is investigated. An object summary is a tree structure of tuples that summarizes the context of a particular data subject tuple. The object summary has been used as a model of keyword search in relational databases; where given a set of keywords, the objective is to identify the data subject tuples relevant to the keywords and their corresponding object summaries. However, a keyword search result may return a large number of object summaries, which brings in the issue of effectively and efficiently ranking them in order to present only the most important ones to the user. In this thesis, a model that ranks object summaries according to their relevance to a set of input thematic keywords is introduced. Efficient algorithms are proposed to answer the proposed thematic ranking query. Finally, the similarity join query on tree-structured data is studied. Treestructured data are ubiquitous nowadays and a number of applications require efficient management of such data. Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this thesis, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph.
Degree	Doctor of Philosophy
Subject	Database management
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/209507
HKU Library Item ID	b5435640

DC Field	Value	Language
dc.contributor.author	Cai, Yilun	-
dc.contributor.author	蔡奕倫	-
dc.date.accessioned	2015-04-23T23:10:55Z	-
dc.date.available	2015-04-23T23:10:55Z	-
dc.date.issued	2014	-
dc.identifier.citation	Cai, Y. [蔡奕倫]. (2014). Ranking and similarity queries on complex data types. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5435640	-
dc.identifier.uri	http://hdl.handle.net/10722/209507	-
dc.description.abstract	Ranking queries and similarity queries are elementary operations with many important applications. There are lots of research works investigating efficient evaluation of various ranking and similarity queries in databases over the past few decades. In this thesis, ranking and similarity queries on three interesting complex data types are studied, namely, multidimensional cube, object summary and tree. Efficient and effective solutions are proposed to solve their related applications. First, the evaluation of ranking queries on multidimensional cubes is studied. In exploratory data analysis, a relation can be considered as a multidimensional cube to investigate the relationship among its attributes. Given a relation with records that can be ranked, an interesting problem is to identify selection conditions for the relation, which result in sub-relations qualified by an input record and render the ranking of the input record as high as possible among the qualifying tuples. The ranking of the input record in a sub-relation measures the quality of the corresponding multidimensional cube of this sub-relation. In this thesis, a standing maximization problem, which aims to identify a multidimensional cube of high quality, is extensively studied. As an immediate consequence of its NP-hardness, three greedy methods are proposed to explore the search space only partially, while striving to identify sub-optimal solutions of high quality. Next, the efficient evaluation of ranking queries on object summaries is investigated. An object summary is a tree structure of tuples that summarizes the context of a particular data subject tuple. The object summary has been used as a model of keyword search in relational databases; where given a set of keywords, the objective is to identify the data subject tuples relevant to the keywords and their corresponding object summaries. However, a keyword search result may return a large number of object summaries, which brings in the issue of effectively and efficiently ranking them in order to present only the most important ones to the user. In this thesis, a model that ranks object summaries according to their relevance to a set of input thematic keywords is introduced. Efficient algorithms are proposed to answer the proposed thematic ranking query. Finally, the similarity join query on tree-structured data is studied. Treestructured data are ubiquitous nowadays and a number of applications require efficient management of such data. Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this thesis, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.subject.lcsh	Database management	-
dc.title	Ranking and similarity queries on complex data types	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5435640	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b5435640	-
dc.identifier.mmsid	991003165859703414	-

File Download

Links for fulltext

(May Require Subscription)

Supplementary

postgraduate thesis: Ranking and similarity queries on complex data types

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats