File Download
Supplementary

postgraduate thesis: Hypergraph semi-supervised learning and efficient dynamic clustering

TitleHypergraph semi-supervised learning and efficient dynamic clustering
Authors
Advisors
Advisor(s):Chan, HTH
Issue Date2020
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Hu, S. [胡曙光]. (2020). Hypergraph semi-supervised learning and efficient dynamic clustering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
AbstractGiven a dataset, similarity relationship between examples can be represented by a graph in which each example is represented by a vertex. While pairwise relationship between two vertices can be represented by an edge in a normal graph, a higher order relationship involving multiple vertices can be captured by a hyperedge in hypergraph. In this thesis, we study three learning and clustering problems. First, we revisit semi-supervised learning on hypergraphs. Same as previous approaches, our method uses a convex program whose objective function is not everywhere differentiable. We exploit the non-uniqueness of the optimal solutions, and consider confidence intervals which give the exact ranges that unlabeled vertices take in any optimal solution. Moreover, we give a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets confirm that our confidence interval approach on hypergraphs outperforms existing methods, and our sub-gradient method gives faster running times when the number of vertices is much larger than the number of edges. Next, we consider the densest subgraph problem on hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time $r$-approximation algorithm for the problem, where $r$ is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized $poly( \frac{r}{\epsilon} \log n)$ update time, for any $\epsilon > 0$. For the case when there are only insertions, the approximation ratio we maintain is $r(1+\epsilon)$, while for the fully dynamic case, the ratio is $r^2 (1+ \epsilon )$. Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms. Finally, we develop a $(4 + \epsilon)$-approximation algorithm for the $k$-center clustering problem using linear space under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized time cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while $k$ and $\epsilon$ are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic datasets demonstrating the effectiveness of our approach.
DegreeDoctor of Philosophy
SubjectCluster analysis - Data processing
Data structures (Computer science)
Database management
Hypergraphs
Supervised learning (Machine learning)
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/286011

 

DC FieldValueLanguage
dc.contributor.advisorChan, HTH-
dc.contributor.authorHu, Shuguang-
dc.contributor.author胡曙光-
dc.date.accessioned2020-08-25T08:43:54Z-
dc.date.available2020-08-25T08:43:54Z-
dc.date.issued2020-
dc.identifier.citationHu, S. [胡曙光]. (2020). Hypergraph semi-supervised learning and efficient dynamic clustering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.-
dc.identifier.urihttp://hdl.handle.net/10722/286011-
dc.description.abstractGiven a dataset, similarity relationship between examples can be represented by a graph in which each example is represented by a vertex. While pairwise relationship between two vertices can be represented by an edge in a normal graph, a higher order relationship involving multiple vertices can be captured by a hyperedge in hypergraph. In this thesis, we study three learning and clustering problems. First, we revisit semi-supervised learning on hypergraphs. Same as previous approaches, our method uses a convex program whose objective function is not everywhere differentiable. We exploit the non-uniqueness of the optimal solutions, and consider confidence intervals which give the exact ranges that unlabeled vertices take in any optimal solution. Moreover, we give a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets confirm that our confidence interval approach on hypergraphs outperforms existing methods, and our sub-gradient method gives faster running times when the number of vertices is much larger than the number of edges. Next, we consider the densest subgraph problem on hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time $r$-approximation algorithm for the problem, where $r$ is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized $poly( \frac{r}{\epsilon} \log n)$ update time, for any $\epsilon > 0$. For the case when there are only insertions, the approximation ratio we maintain is $r(1+\epsilon)$, while for the fully dynamic case, the ratio is $r^2 (1+ \epsilon )$. Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms. Finally, we develop a $(4 + \epsilon)$-approximation algorithm for the $k$-center clustering problem using linear space under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized time cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while $k$ and $\epsilon$ are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic datasets demonstrating the effectiveness of our approach. -
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.subject.lcshCluster analysis - Data processing-
dc.subject.lcshData structures (Computer science)-
dc.subject.lcshDatabase management-
dc.subject.lcshHypergraphs-
dc.subject.lcshSupervised learning (Machine learning)-
dc.titleHypergraph semi-supervised learning and efficient dynamic clustering-
dc.typePG_Thesis-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.date.hkucongregation2020-
dc.identifier.mmsid991044264457403414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats