Network motif discovery : efficiency, scalability, and benchmarking

Najafi, Mohammad Matin; 馬國成

File Download

FullText.pdf

Supplementary

Citations:
Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Network motif discovery : efficiency, scalability, and benchmarking

Title	Network motif discovery : efficiency, scalability, and benchmarking
Authors	Najafi, Mohammad Matin 馬國成
Advisors	Advisor(s):Cheng, CKR
Issue Date	2024
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Najafi, M. M. [馬國成]. (2024). Network motif discovery : efficiency, scalability, and benchmarking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Network Motif Discovery (NMD) is a fundamental problem in network science, aimed at identifying small, recurring patterns of interconnections—motifs—within larger networks. These motifs are key to understanding the structural and functional properties of various complex systems, from biological networks to social and technological graphs. However, traditional approaches to motif discovery face significant computational challenges, particularly when applied to large-scale networks. In this thesis, we introduce MOSER, a novel framework for NMD that fundamentally redefines how motifs are discovered by leveraging a new significance testing procedure based on Markov Chains theory. At the core of MOSER is a mathematically rigorous significance test for determining whether a subgraph occurs more frequently than expected in a random graph. Unlike traditional methods that rely on heuristic or computationally expensive random graph generation, MOSER’s significance test is grounded in Markov Chain Monte Carlo (MCMC) techniques, offering both theoretical soundness and practical efficiency. This framework not only provides more accurate motif discovery but also enables scalable optimizations in subgraph counting. Building on this framework, we developed two dynamic counting algorithms: Track and Count (TAC) and Accelerated Track and Count (ATAC). These algorithms leverage incremental counting techniques to efficiently update subgraph counts during edge switches, significantly reducing computational overhead. TAC optimizes counting by limiting operations to a smaller portion of the graph, while ATAC accelerates the process further by directly tracking changes in subgraph frequencies, minimizing redundant calculations. These algorithms are designed to take full advantage of MOSER's theoretical foundation, ensuring that the counting process is both efficient and scalable. Extensive experiments were conducted on both biological and data mining community datasets, demonstrating that MOSER and its optimized versions (MOSER$^{+}$ and MOSER$^{++}$) outperform existing state-of-the-art solutions such as Kavosh, G-Tries, QuateXelero (QX), and ACC-Motif. Notably, MOSER$^{++}$ achieves speedups of up to five orders of magnitude on large graphs, making it the fastest NMD solution available for large-scale networks. Additionally, we showcase the practical utility of MOSER through a case study on Motif-aware Link Prediction in protein-protein interaction networks. The motifs discovered using MOSER significantly improve the accuracy of link prediction compared to those found by traditional subgraph counting methods, highlighting the framework’s real-world applicability. To complement MOSER, we introduce XMAS, a benchmark designed to evaluate and compare subgraph counting methods. XMAS provides a standardized dataset with ground truth subgraph counts, a containerized framework for reproducibility, and a public leaderboard for performance comparison. It serves as a valuable tool for testing machine learning-based and traditional subgraph counting methods, facilitating fair and transparent evaluation. In conclusion, MOSER introduces a novel, theoretically grounded framework for motif discovery, offering both improved accuracy and scalability. This framework opens the door to further optimizations in counting algorithms and presents new opportunities for applications in fields such as systems biology, social network analysis, and beyond. The introduction of the XMAS benchmark further ensures that subgraph counting methods can be evaluated consistently and transparently.
Degree	Doctor of Philosophy
Subject	Graph algorithms
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/354715

DC Field	Value	Language
dc.contributor.advisor	Cheng, CKR	-
dc.contributor.author	Najafi, Mohammad Matin	-
dc.contributor.author	馬國成	-
dc.date.accessioned	2025-03-04T09:30:50Z	-
dc.date.available	2025-03-04T09:30:50Z	-
dc.date.issued	2024	-
dc.identifier.citation	Najafi, M. M. [馬國成]. (2024). Network motif discovery : efficiency, scalability, and benchmarking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/354715	-
dc.description.abstract	Network Motif Discovery (NMD) is a fundamental problem in network science, aimed at identifying small, recurring patterns of interconnections—motifs—within larger networks. These motifs are key to understanding the structural and functional properties of various complex systems, from biological networks to social and technological graphs. However, traditional approaches to motif discovery face significant computational challenges, particularly when applied to large-scale networks. In this thesis, we introduce MOSER, a novel framework for NMD that fundamentally redefines how motifs are discovered by leveraging a new significance testing procedure based on Markov Chains theory. At the core of MOSER is a mathematically rigorous significance test for determining whether a subgraph occurs more frequently than expected in a random graph. Unlike traditional methods that rely on heuristic or computationally expensive random graph generation, MOSER’s significance test is grounded in Markov Chain Monte Carlo (MCMC) techniques, offering both theoretical soundness and practical efficiency. This framework not only provides more accurate motif discovery but also enables scalable optimizations in subgraph counting. Building on this framework, we developed two dynamic counting algorithms: Track and Count (TAC) and Accelerated Track and Count (ATAC). These algorithms leverage incremental counting techniques to efficiently update subgraph counts during edge switches, significantly reducing computational overhead. TAC optimizes counting by limiting operations to a smaller portion of the graph, while ATAC accelerates the process further by directly tracking changes in subgraph frequencies, minimizing redundant calculations. These algorithms are designed to take full advantage of MOSER's theoretical foundation, ensuring that the counting process is both efficient and scalable. Extensive experiments were conducted on both biological and data mining community datasets, demonstrating that MOSER and its optimized versions (MOSER$^{+}$ and MOSER$^{++}$) outperform existing state-of-the-art solutions such as Kavosh, G-Tries, QuateXelero (QX), and ACC-Motif. Notably, MOSER$^{++}$ achieves speedups of up to five orders of magnitude on large graphs, making it the fastest NMD solution available for large-scale networks. Additionally, we showcase the practical utility of MOSER through a case study on Motif-aware Link Prediction in protein-protein interaction networks. The motifs discovered using MOSER significantly improve the accuracy of link prediction compared to those found by traditional subgraph counting methods, highlighting the framework’s real-world applicability. To complement MOSER, we introduce XMAS, a benchmark designed to evaluate and compare subgraph counting methods. XMAS provides a standardized dataset with ground truth subgraph counts, a containerized framework for reproducibility, and a public leaderboard for performance comparison. It serves as a valuable tool for testing machine learning-based and traditional subgraph counting methods, facilitating fair and transparent evaluation. In conclusion, MOSER introduces a novel, theoretically grounded framework for motif discovery, offering both improved accuracy and scalability. This framework opens the door to further optimizations in counting algorithms and presents new opportunities for applications in fields such as systems biology, social network analysis, and beyond. The introduction of the XMAS benchmark further ensures that subgraph counting methods can be evaluated consistently and transparently.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Graph algorithms	-
dc.title	Network motif discovery : efficiency, scalability, and benchmarking	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2025	-
dc.identifier.mmsid	991044911104603414	-

File Download

Supplementary

postgraduate thesis: Network motif discovery : efficiency, scalability, and benchmarking

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats