File Download
Supplementary

postgraduate thesis: Evaluating probabilistic and higher-order queries on big graphs

TitleEvaluating probabilistic and higher-order queries on big graphs
Authors
Advisors
Advisor(s):Cheng, CKR
Issue Date2020
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Li, X. [李晓東]. (2020). Evaluating probabilistic and higher-order queries on big graphs. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
AbstractLarge graphs are prevalent in many areas, such as social networks, traffic networks, and biological networks. Although the topic has attracted a lot of interest from both research and industry communities, the efficiency and effectiveness issues of probabilistic and higher-order graph queries have not been well studied, especially for graphs of large sizes. In this thesis, we address four challenging problems regarding the probabilistic and higher-order queries on large graphs, i.e., (i) k-nearest neighbor (k-NN) query on uncertain graphs; (ii) k-aggregate nearest neighbor (k-ANN) query on uncertain graphs; (iii) motif path query on large graphs; (iv) graph query language (GQL) for motif-related queries on knowledge graphs. First, graphs are often inexact. For example, in a protein-protein interaction network, an edge between two nodes u and v indicates that proteins u and v have a close interaction. This edge may only exist with a probability. To model such information, the uncertain graph model has been proposed, in which each edge e is augmented with a probability that indicates the chance e exists. Given a query node in the uncertain graph G, we study the k-NN query, which looks for k nodes in G whose distances from the query node are the shortest. To evaluate this query efficiently on uncertain graphs, we develop U-tree, a tree-based structure, which produces a compact representation of G. We also study a variant of the query, the k-ANN query, where multiple query nodes are inputted, e.g., the top-k nearest neighbors for a set of query nodes. To achieve good scalability, we design advanced query algorithms and a distributed solution based on the U-tree. The experimental studies show that our proposed methods provide both more accurate results and higher speedup compared to existing solutions. Second, motif-based analysis has attracted a lot of attention. A motif, or a small graph with a few nodes, is often considered as a fundamental unit of a graph. Motif-based analysis captures higher-order structure between nodes, and performs better than traditional ``edge-based'' solutions. For example, path-based solutions have been shown to be useful for various graph analysis tasks, such as link prediction and graph clustering. However, they are no longer adequate for handling complex and gigantic graphs; hence we study motif-path, which is conceptually a concatenation of one or more motif instances. We examine how motif-paths can be used in three path-based mining tasks, namely link prediction, local graph clustering and node ranking. We further address the situation when two graph nodes are not connected through a motif-path, and develop a novel defragmentation method to enhance it. Experimental results on real graph datasets demonstrate that the use of motif-paths and defragmentation techniques improves graph analysis effectiveness. Finally, we propose a GQL framework for mining knowledge graphs with motifs, called M-Cypher. It supports motif-related graph queries in an effective, efficient and user-friendly manner. We demonstrate the usage of the system on COVID-19 knowledge graph analysis.
DegreeDoctor of Philosophy
SubjectGraph algorithms
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/310210

 

DC FieldValueLanguage
dc.contributor.advisorCheng, CKR-
dc.contributor.authorLi, Xiaodong-
dc.contributor.author李晓東-
dc.date.accessioned2022-01-25T01:20:36Z-
dc.date.available2022-01-25T01:20:36Z-
dc.date.issued2020-
dc.identifier.citationLi, X. [李晓東]. (2020). Evaluating probabilistic and higher-order queries on big graphs. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.-
dc.identifier.urihttp://hdl.handle.net/10722/310210-
dc.description.abstractLarge graphs are prevalent in many areas, such as social networks, traffic networks, and biological networks. Although the topic has attracted a lot of interest from both research and industry communities, the efficiency and effectiveness issues of probabilistic and higher-order graph queries have not been well studied, especially for graphs of large sizes. In this thesis, we address four challenging problems regarding the probabilistic and higher-order queries on large graphs, i.e., (i) k-nearest neighbor (k-NN) query on uncertain graphs; (ii) k-aggregate nearest neighbor (k-ANN) query on uncertain graphs; (iii) motif path query on large graphs; (iv) graph query language (GQL) for motif-related queries on knowledge graphs. First, graphs are often inexact. For example, in a protein-protein interaction network, an edge between two nodes u and v indicates that proteins u and v have a close interaction. This edge may only exist with a probability. To model such information, the uncertain graph model has been proposed, in which each edge e is augmented with a probability that indicates the chance e exists. Given a query node in the uncertain graph G, we study the k-NN query, which looks for k nodes in G whose distances from the query node are the shortest. To evaluate this query efficiently on uncertain graphs, we develop U-tree, a tree-based structure, which produces a compact representation of G. We also study a variant of the query, the k-ANN query, where multiple query nodes are inputted, e.g., the top-k nearest neighbors for a set of query nodes. To achieve good scalability, we design advanced query algorithms and a distributed solution based on the U-tree. The experimental studies show that our proposed methods provide both more accurate results and higher speedup compared to existing solutions. Second, motif-based analysis has attracted a lot of attention. A motif, or a small graph with a few nodes, is often considered as a fundamental unit of a graph. Motif-based analysis captures higher-order structure between nodes, and performs better than traditional ``edge-based'' solutions. For example, path-based solutions have been shown to be useful for various graph analysis tasks, such as link prediction and graph clustering. However, they are no longer adequate for handling complex and gigantic graphs; hence we study motif-path, which is conceptually a concatenation of one or more motif instances. We examine how motif-paths can be used in three path-based mining tasks, namely link prediction, local graph clustering and node ranking. We further address the situation when two graph nodes are not connected through a motif-path, and develop a novel defragmentation method to enhance it. Experimental results on real graph datasets demonstrate that the use of motif-paths and defragmentation techniques improves graph analysis effectiveness. Finally, we propose a GQL framework for mining knowledge graphs with motifs, called M-Cypher. It supports motif-related graph queries in an effective, efficient and user-friendly manner. We demonstrate the usage of the system on COVID-19 knowledge graph analysis.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.subject.lcshGraph algorithms-
dc.titleEvaluating probabilistic and higher-order queries on big graphs-
dc.typePG_Thesis-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.date.hkucongregation2021-
dc.identifier.mmsid991044351382603414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats