Hypergraph semi-supervised learning and efficient dynamic clustering

Hu, Shuguang; 胡曙光

File Download

FullText.pdf

Supplementary

Citations:
Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Hypergraph semi-supervised learning and efficient dynamic clustering

Title	Hypergraph semi-supervised learning and efficient dynamic clustering
Authors	Hu, Shuguang 胡曙光
Advisors	Advisor(s):Chan, HTH
Issue Date	2020
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Hu, S. [胡曙光]. (2020). Hypergraph semi-supervised learning and efficient dynamic clustering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Given a dataset, similarity relationship between examples can be represented by a graph in which each example is represented by a vertex. While pairwise relationship between two vertices can be represented by an edge in a normal graph, a higher order relationship involving multiple vertices can be captured by a hyperedge in hypergraph. In this thesis, we study three learning and clustering problems. First, we revisit semi-supervised learning on hypergraphs. Same as previous approaches, our method uses a convex program whose objective function is not everywhere differentiable. We exploit the non-uniqueness of the optimal solutions, and consider confidence intervals which give the exact ranges that unlabeled vertices take in any optimal solution. Moreover, we give a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets confirm that our confidence interval approach on hypergraphs outperforms existing methods, and our sub-gradient method gives faster running times when the number of vertices is much larger than the number of edges. Next, we consider the densest subgraph problem on hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time $r$-approximation algorithm for the problem, where $r$ is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized $poly( \frac{r}{\epsilon} \log n)$ update time, for any $\epsilon > 0$. For the case when there are only insertions, the approximation ratio we maintain is $r(1+\epsilon)$, while for the fully dynamic case, the ratio is $r^2 (1+ \epsilon )$. Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms. Finally, we develop a $(4 + \epsilon)$-approximation algorithm for the $k$-center clustering problem using linear space under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized time cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while $k$ and $\epsilon$ are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic datasets demonstrating the effectiveness of our approach.
Degree	Doctor of Philosophy
Subject	Cluster analysis - Data processing Data structures (Computer science) Database management Hypergraphs Supervised learning (Machine learning)
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/286011

DC Field	Value	Language
dc.contributor.advisor	Chan, HTH	-
dc.contributor.author	Hu, Shuguang	-
dc.contributor.author	胡曙光	-
dc.date.accessioned	2020-08-25T08:43:54Z	-
dc.date.available	2020-08-25T08:43:54Z	-
dc.date.issued	2020	-
dc.identifier.citation	Hu, S. [胡曙光]. (2020). Hypergraph semi-supervised learning and efficient dynamic clustering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/286011	-
dc.description.abstract	Given a dataset, similarity relationship between examples can be represented by a graph in which each example is represented by a vertex. While pairwise relationship between two vertices can be represented by an edge in a normal graph, a higher order relationship involving multiple vertices can be captured by a hyperedge in hypergraph. In this thesis, we study three learning and clustering problems. First, we revisit semi-supervised learning on hypergraphs. Same as previous approaches, our method uses a convex program whose objective function is not everywhere differentiable. We exploit the non-uniqueness of the optimal solutions, and consider confidence intervals which give the exact ranges that unlabeled vertices take in any optimal solution. Moreover, we give a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets confirm that our confidence interval approach on hypergraphs outperforms existing methods, and our sub-gradient method gives faster running times when the number of vertices is much larger than the number of edges. Next, we consider the densest subgraph problem on hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time $r$-approximation algorithm for the problem, where $r$ is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized $poly( \frac{r}{\epsilon} \log n)$ update time, for any $\epsilon > 0$. For the case when there are only insertions, the approximation ratio we maintain is $r(1+\epsilon)$, while for the fully dynamic case, the ratio is $r^2 (1+ \epsilon )$. Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms. Finally, we develop a $(4 + \epsilon)$-approximation algorithm for the $k$-center clustering problem using linear space under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized time cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while $k$ and $\epsilon$ are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic datasets demonstrating the effectiveness of our approach.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Cluster analysis - Data processing	-
dc.subject.lcsh	Data structures (Computer science)	-
dc.subject.lcsh	Database management	-
dc.subject.lcsh	Hypergraphs	-
dc.subject.lcsh	Supervised learning (Machine learning)	-
dc.title	Hypergraph semi-supervised learning and efficient dynamic clustering	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2020	-
dc.identifier.mmsid	991044264457403414	-

File Download

Supplementary

postgraduate thesis: Hypergraph semi-supervised learning and efficient dynamic clustering

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats