File Download
Supplementary

postgraduate thesis: Network motif discovery : efficiency, scalability, and benchmarking

TitleNetwork motif discovery : efficiency, scalability, and benchmarking
Authors
Advisors
Advisor(s):Cheng, CKR
Issue Date2024
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Najafi, M. M. [馬國成]. (2024). Network motif discovery : efficiency, scalability, and benchmarking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
AbstractNetwork Motif Discovery (NMD) is a fundamental problem in network science, aimed at identifying small, recurring patterns of interconnections—motifs—within larger networks. These motifs are key to understanding the structural and functional properties of various complex systems, from biological networks to social and technological graphs. However, traditional approaches to motif discovery face significant computational challenges, particularly when applied to large-scale networks. In this thesis, we introduce MOSER, a novel framework for NMD that fundamentally redefines how motifs are discovered by leveraging a new significance testing procedure based on Markov Chains theory. At the core of MOSER is a mathematically rigorous significance test for determining whether a subgraph occurs more frequently than expected in a random graph. Unlike traditional methods that rely on heuristic or computationally expensive random graph generation, MOSER’s significance test is grounded in Markov Chain Monte Carlo (MCMC) techniques, offering both theoretical soundness and practical efficiency. This framework not only provides more accurate motif discovery but also enables scalable optimizations in subgraph counting. Building on this framework, we developed two dynamic counting algorithms: Track and Count (TAC) and Accelerated Track and Count (ATAC). These algorithms leverage incremental counting techniques to efficiently update subgraph counts during edge switches, significantly reducing computational overhead. TAC optimizes counting by limiting operations to a smaller portion of the graph, while ATAC accelerates the process further by directly tracking changes in subgraph frequencies, minimizing redundant calculations. These algorithms are designed to take full advantage of MOSER's theoretical foundation, ensuring that the counting process is both efficient and scalable. Extensive experiments were conducted on both biological and data mining community datasets, demonstrating that MOSER and its optimized versions (MOSER$^{+}$ and MOSER$^{++}$) outperform existing state-of-the-art solutions such as Kavosh, G-Tries, QuateXelero (QX), and ACC-Motif. Notably, MOSER$^{++}$ achieves speedups of up to five orders of magnitude on large graphs, making it the fastest NMD solution available for large-scale networks. Additionally, we showcase the practical utility of MOSER through a case study on Motif-aware Link Prediction in protein-protein interaction networks. The motifs discovered using MOSER significantly improve the accuracy of link prediction compared to those found by traditional subgraph counting methods, highlighting the framework’s real-world applicability. To complement MOSER, we introduce XMAS, a benchmark designed to evaluate and compare subgraph counting methods. XMAS provides a standardized dataset with ground truth subgraph counts, a containerized framework for reproducibility, and a public leaderboard for performance comparison. It serves as a valuable tool for testing machine learning-based and traditional subgraph counting methods, facilitating fair and transparent evaluation. In conclusion, MOSER introduces a novel, theoretically grounded framework for motif discovery, offering both improved accuracy and scalability. This framework opens the door to further optimizations in counting algorithms and presents new opportunities for applications in fields such as systems biology, social network analysis, and beyond. The introduction of the XMAS benchmark further ensures that subgraph counting methods can be evaluated consistently and transparently.
DegreeDoctor of Philosophy
SubjectGraph algorithms
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/354715

 

DC FieldValueLanguage
dc.contributor.advisorCheng, CKR-
dc.contributor.authorNajafi, Mohammad Matin-
dc.contributor.author馬國成-
dc.date.accessioned2025-03-04T09:30:50Z-
dc.date.available2025-03-04T09:30:50Z-
dc.date.issued2024-
dc.identifier.citationNajafi, M. M. [馬國成]. (2024). Network motif discovery : efficiency, scalability, and benchmarking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.-
dc.identifier.urihttp://hdl.handle.net/10722/354715-
dc.description.abstractNetwork Motif Discovery (NMD) is a fundamental problem in network science, aimed at identifying small, recurring patterns of interconnections—motifs—within larger networks. These motifs are key to understanding the structural and functional properties of various complex systems, from biological networks to social and technological graphs. However, traditional approaches to motif discovery face significant computational challenges, particularly when applied to large-scale networks. In this thesis, we introduce MOSER, a novel framework for NMD that fundamentally redefines how motifs are discovered by leveraging a new significance testing procedure based on Markov Chains theory. At the core of MOSER is a mathematically rigorous significance test for determining whether a subgraph occurs more frequently than expected in a random graph. Unlike traditional methods that rely on heuristic or computationally expensive random graph generation, MOSER’s significance test is grounded in Markov Chain Monte Carlo (MCMC) techniques, offering both theoretical soundness and practical efficiency. This framework not only provides more accurate motif discovery but also enables scalable optimizations in subgraph counting. Building on this framework, we developed two dynamic counting algorithms: Track and Count (TAC) and Accelerated Track and Count (ATAC). These algorithms leverage incremental counting techniques to efficiently update subgraph counts during edge switches, significantly reducing computational overhead. TAC optimizes counting by limiting operations to a smaller portion of the graph, while ATAC accelerates the process further by directly tracking changes in subgraph frequencies, minimizing redundant calculations. These algorithms are designed to take full advantage of MOSER's theoretical foundation, ensuring that the counting process is both efficient and scalable. Extensive experiments were conducted on both biological and data mining community datasets, demonstrating that MOSER and its optimized versions (MOSER$^{+}$ and MOSER$^{++}$) outperform existing state-of-the-art solutions such as Kavosh, G-Tries, QuateXelero (QX), and ACC-Motif. Notably, MOSER$^{++}$ achieves speedups of up to five orders of magnitude on large graphs, making it the fastest NMD solution available for large-scale networks. Additionally, we showcase the practical utility of MOSER through a case study on Motif-aware Link Prediction in protein-protein interaction networks. The motifs discovered using MOSER significantly improve the accuracy of link prediction compared to those found by traditional subgraph counting methods, highlighting the framework’s real-world applicability. To complement MOSER, we introduce XMAS, a benchmark designed to evaluate and compare subgraph counting methods. XMAS provides a standardized dataset with ground truth subgraph counts, a containerized framework for reproducibility, and a public leaderboard for performance comparison. It serves as a valuable tool for testing machine learning-based and traditional subgraph counting methods, facilitating fair and transparent evaluation. In conclusion, MOSER introduces a novel, theoretically grounded framework for motif discovery, offering both improved accuracy and scalability. This framework opens the door to further optimizations in counting algorithms and presents new opportunities for applications in fields such as systems biology, social network analysis, and beyond. The introduction of the XMAS benchmark further ensures that subgraph counting methods can be evaluated consistently and transparently. -
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.subject.lcshGraph algorithms-
dc.titleNetwork motif discovery : efficiency, scalability, and benchmarking-
dc.typePG_Thesis-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.date.hkucongregation2025-
dc.identifier.mmsid991044911104603414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats