File Download
Supplementary

postgraduate thesis: Efficient and effective subgraph discovery

TitleEfficient and effective subgraph discovery
Authors
Advisors
Advisor(s):Cheng, CKR
Issue Date2021
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Ma, C. [馬晨昊]. (2021). Efficient and effective subgraph discovery. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
AbstractGraphs are widely used in different domains, e.g., biology and social science, to model the complex relationship among objects. Hence, graph related topics have attracted a lot of interest from both research and industry communities. Among those topics and applications, subgraph discovery is a key problem in terms of its theoretical and practical importance. In this thesis, we study two fundamental subgraph discovery problems over two different graph models: uncertain graphs, and directed graphs. First, we focus on counting motifs (i.e., small subgraphs) in uncertain graphs. In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif counting algorithms. Most of these solutions assume that graphs are deterministic, i.e., the graph edges are certain to exist. However, due to measurement and statistical prediction errors, this assumption may not hold, and hence the analysis quality can be affected. To address this issue, we examine how to count motifs on uncertain graphs, whose edges only exist probabilistically. Particularly, we propose a solution framework that can be used by existing deterministic motif counting algorithms. We further propose an approximation algorithm. Extensive experiments on real datasets show that our algorithms are more effective and efficient than existing solutions. Next, we study the densest subgraph discovery problem on directed graphs. Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a three-thousand-edge graph, it takes three days for one of the best exact algorithms to complete. In this thesis, we develop an efficient and scalable DDS solution. We introduce the notion of [x, y]-core, which is a dense subgraph for G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on fifteen real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.
DegreeDoctor of Philosophy
SubjectUncertainty (Information theory) - Graphic methods
Directed graphs
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/306986

 

DC FieldValueLanguage
dc.contributor.advisorCheng, CKR-
dc.contributor.authorMa, Chenhao-
dc.contributor.author馬晨昊-
dc.date.accessioned2021-11-03T04:36:38Z-
dc.date.available2021-11-03T04:36:38Z-
dc.date.issued2021-
dc.identifier.citationMa, C. [馬晨昊]. (2021). Efficient and effective subgraph discovery. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.-
dc.identifier.urihttp://hdl.handle.net/10722/306986-
dc.description.abstractGraphs are widely used in different domains, e.g., biology and social science, to model the complex relationship among objects. Hence, graph related topics have attracted a lot of interest from both research and industry communities. Among those topics and applications, subgraph discovery is a key problem in terms of its theoretical and practical importance. In this thesis, we study two fundamental subgraph discovery problems over two different graph models: uncertain graphs, and directed graphs. First, we focus on counting motifs (i.e., small subgraphs) in uncertain graphs. In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif counting algorithms. Most of these solutions assume that graphs are deterministic, i.e., the graph edges are certain to exist. However, due to measurement and statistical prediction errors, this assumption may not hold, and hence the analysis quality can be affected. To address this issue, we examine how to count motifs on uncertain graphs, whose edges only exist probabilistically. Particularly, we propose a solution framework that can be used by existing deterministic motif counting algorithms. We further propose an approximation algorithm. Extensive experiments on real datasets show that our algorithms are more effective and efficient than existing solutions. Next, we study the densest subgraph discovery problem on directed graphs. Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a three-thousand-edge graph, it takes three days for one of the best exact algorithms to complete. In this thesis, we develop an efficient and scalable DDS solution. We introduce the notion of [x, y]-core, which is a dense subgraph for G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on fifteen real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.subject.lcshUncertainty (Information theory) - Graphic methods-
dc.subject.lcshDirected graphs-
dc.titleEfficient and effective subgraph discovery-
dc.typePG_Thesis-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.date.hkucongregation2021-
dc.identifier.mmsid991044437613103414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats