File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Efficient and effective subgraph discovery
Title | Efficient and effective subgraph discovery |
---|---|
Authors | |
Advisors | Advisor(s):Cheng, CKR |
Issue Date | 2021 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ma, C. [馬晨昊]. (2021). Efficient and effective subgraph discovery. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Graphs are widely used in different domains, e.g., biology and social science, to model the complex relationship among objects. Hence, graph related topics have attracted a lot of interest from both research and industry communities. Among those topics and applications, subgraph discovery is a key problem in terms of its theoretical and practical importance. In this thesis, we study two fundamental subgraph discovery problems over two different graph models: uncertain graphs, and directed graphs.
First, we focus on counting motifs (i.e., small subgraphs) in uncertain graphs. In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif counting algorithms. Most of these solutions assume that graphs are deterministic, i.e., the graph edges are certain to exist. However, due to measurement and statistical prediction errors, this assumption may not hold, and hence the analysis quality can be affected. To address this issue, we examine how to count motifs on uncertain graphs, whose edges only exist probabilistically. Particularly, we propose a solution framework that can be used by existing deterministic motif counting algorithms. We further propose an approximation algorithm. Extensive experiments on real datasets show that our algorithms are more effective and efficient than existing solutions.
Next, we study the densest subgraph discovery problem on directed graphs. Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a three-thousand-edge graph, it takes three days for one of the best exact algorithms to complete. In this thesis, we develop an efficient and scalable DDS solution. We introduce the notion of [x, y]-core, which is a dense subgraph for G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on fifteen real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art. |
Degree | Doctor of Philosophy |
Subject | Uncertainty (Information theory) - Graphic methods Directed graphs |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/306986 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Cheng, CKR | - |
dc.contributor.author | Ma, Chenhao | - |
dc.contributor.author | 馬晨昊 | - |
dc.date.accessioned | 2021-11-03T04:36:38Z | - |
dc.date.available | 2021-11-03T04:36:38Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Ma, C. [馬晨昊]. (2021). Efficient and effective subgraph discovery. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/306986 | - |
dc.description.abstract | Graphs are widely used in different domains, e.g., biology and social science, to model the complex relationship among objects. Hence, graph related topics have attracted a lot of interest from both research and industry communities. Among those topics and applications, subgraph discovery is a key problem in terms of its theoretical and practical importance. In this thesis, we study two fundamental subgraph discovery problems over two different graph models: uncertain graphs, and directed graphs. First, we focus on counting motifs (i.e., small subgraphs) in uncertain graphs. In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif counting algorithms. Most of these solutions assume that graphs are deterministic, i.e., the graph edges are certain to exist. However, due to measurement and statistical prediction errors, this assumption may not hold, and hence the analysis quality can be affected. To address this issue, we examine how to count motifs on uncertain graphs, whose edges only exist probabilistically. Particularly, we propose a solution framework that can be used by existing deterministic motif counting algorithms. We further propose an approximation algorithm. Extensive experiments on real datasets show that our algorithms are more effective and efficient than existing solutions. Next, we study the densest subgraph discovery problem on directed graphs. Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a three-thousand-edge graph, it takes three days for one of the best exact algorithms to complete. In this thesis, we develop an efficient and scalable DDS solution. We introduce the notion of [x, y]-core, which is a dense subgraph for G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on fifteen real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Uncertainty (Information theory) - Graphic methods | - |
dc.subject.lcsh | Directed graphs | - |
dc.title | Efficient and effective subgraph discovery | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2021 | - |
dc.identifier.mmsid | 991044437613103414 | - |