File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Network motif discovery : efficiency, scalability, and benchmarking
| Title | Network motif discovery : efficiency, scalability, and benchmarking |
|---|---|
| Authors | |
| Advisors | Advisor(s):Cheng, CKR |
| Issue Date | 2024 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Najafi, M. M. [馬國成]. (2024). Network motif discovery : efficiency, scalability, and benchmarking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Network Motif Discovery (NMD) is a fundamental problem in network science, aimed at identifying small, recurring patterns of interconnections—motifs—within larger networks. These motifs are key to understanding the structural and functional properties of various complex systems, from biological networks to social and technological graphs. However, traditional approaches to motif discovery face significant computational challenges, particularly when applied to large-scale networks. In this thesis, we introduce MOSER, a novel framework for NMD that fundamentally redefines how motifs are discovered by leveraging a new significance testing procedure based on Markov Chains theory.
At the core of MOSER is a mathematically rigorous significance test for determining whether a subgraph occurs more frequently than expected in a random graph. Unlike traditional methods that rely on heuristic or computationally expensive random graph generation, MOSER’s significance test is grounded in Markov Chain Monte Carlo (MCMC) techniques, offering both theoretical soundness and practical efficiency. This framework not only provides more accurate motif discovery but also enables scalable optimizations in subgraph counting.
Building on this framework, we developed two dynamic counting algorithms: Track and Count (TAC) and Accelerated Track and Count (ATAC). These algorithms leverage incremental counting techniques to efficiently update subgraph counts during edge switches, significantly reducing computational overhead. TAC optimizes counting by limiting operations to a smaller portion of the graph, while ATAC accelerates the process further by directly tracking changes in subgraph frequencies, minimizing redundant calculations. These algorithms are designed to take full advantage of MOSER's theoretical foundation, ensuring that the counting process is both efficient and scalable.
Extensive experiments were conducted on both biological and data mining community datasets, demonstrating that MOSER and its optimized versions (MOSER$^{+}$ and MOSER$^{++}$) outperform existing state-of-the-art solutions such as Kavosh, G-Tries, QuateXelero (QX), and ACC-Motif. Notably, MOSER$^{++}$ achieves speedups of up to five orders of magnitude on large graphs, making it the fastest NMD solution available for large-scale networks.
Additionally, we showcase the practical utility of MOSER through a case study on Motif-aware Link Prediction in protein-protein interaction networks. The motifs discovered using MOSER significantly improve the accuracy of link prediction compared to those found by traditional subgraph counting methods, highlighting the framework’s real-world applicability.
To complement MOSER, we introduce XMAS, a benchmark designed to evaluate and compare subgraph counting methods. XMAS provides a standardized dataset with ground truth subgraph counts, a containerized framework for reproducibility, and a public leaderboard for performance comparison. It serves as a valuable tool for testing machine learning-based and traditional subgraph counting methods, facilitating fair and transparent evaluation.
In conclusion, MOSER introduces a novel, theoretically grounded framework for motif discovery, offering both improved accuracy and scalability. This framework opens the door to further optimizations in counting algorithms and presents new opportunities for applications in fields such as systems biology, social network analysis, and beyond. The introduction of the XMAS benchmark further ensures that subgraph counting methods can be evaluated consistently and transparently.
|
| Degree | Doctor of Philosophy |
| Subject | Graph algorithms |
| Dept/Program | Computer Science |
| Persistent Identifier | http://hdl.handle.net/10722/354715 |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Cheng, CKR | - |
| dc.contributor.author | Najafi, Mohammad Matin | - |
| dc.contributor.author | 馬國成 | - |
| dc.date.accessioned | 2025-03-04T09:30:50Z | - |
| dc.date.available | 2025-03-04T09:30:50Z | - |
| dc.date.issued | 2024 | - |
| dc.identifier.citation | Najafi, M. M. [馬國成]. (2024). Network motif discovery : efficiency, scalability, and benchmarking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/354715 | - |
| dc.description.abstract | Network Motif Discovery (NMD) is a fundamental problem in network science, aimed at identifying small, recurring patterns of interconnections—motifs—within larger networks. These motifs are key to understanding the structural and functional properties of various complex systems, from biological networks to social and technological graphs. However, traditional approaches to motif discovery face significant computational challenges, particularly when applied to large-scale networks. In this thesis, we introduce MOSER, a novel framework for NMD that fundamentally redefines how motifs are discovered by leveraging a new significance testing procedure based on Markov Chains theory. At the core of MOSER is a mathematically rigorous significance test for determining whether a subgraph occurs more frequently than expected in a random graph. Unlike traditional methods that rely on heuristic or computationally expensive random graph generation, MOSER’s significance test is grounded in Markov Chain Monte Carlo (MCMC) techniques, offering both theoretical soundness and practical efficiency. This framework not only provides more accurate motif discovery but also enables scalable optimizations in subgraph counting. Building on this framework, we developed two dynamic counting algorithms: Track and Count (TAC) and Accelerated Track and Count (ATAC). These algorithms leverage incremental counting techniques to efficiently update subgraph counts during edge switches, significantly reducing computational overhead. TAC optimizes counting by limiting operations to a smaller portion of the graph, while ATAC accelerates the process further by directly tracking changes in subgraph frequencies, minimizing redundant calculations. These algorithms are designed to take full advantage of MOSER's theoretical foundation, ensuring that the counting process is both efficient and scalable. Extensive experiments were conducted on both biological and data mining community datasets, demonstrating that MOSER and its optimized versions (MOSER$^{+}$ and MOSER$^{++}$) outperform existing state-of-the-art solutions such as Kavosh, G-Tries, QuateXelero (QX), and ACC-Motif. Notably, MOSER$^{++}$ achieves speedups of up to five orders of magnitude on large graphs, making it the fastest NMD solution available for large-scale networks. Additionally, we showcase the practical utility of MOSER through a case study on Motif-aware Link Prediction in protein-protein interaction networks. The motifs discovered using MOSER significantly improve the accuracy of link prediction compared to those found by traditional subgraph counting methods, highlighting the framework’s real-world applicability. To complement MOSER, we introduce XMAS, a benchmark designed to evaluate and compare subgraph counting methods. XMAS provides a standardized dataset with ground truth subgraph counts, a containerized framework for reproducibility, and a public leaderboard for performance comparison. It serves as a valuable tool for testing machine learning-based and traditional subgraph counting methods, facilitating fair and transparent evaluation. In conclusion, MOSER introduces a novel, theoretically grounded framework for motif discovery, offering both improved accuracy and scalability. This framework opens the door to further optimizations in counting algorithms and presents new opportunities for applications in fields such as systems biology, social network analysis, and beyond. The introduction of the XMAS benchmark further ensures that subgraph counting methods can be evaluated consistently and transparently. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Graph algorithms | - |
| dc.title | Network motif discovery : efficiency, scalability, and benchmarking | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Computer Science | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991044911104603414 | - |
