File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Hypergraph semi-supervised learning and efficient dynamic clustering
Title | Hypergraph semi-supervised learning and efficient dynamic clustering |
---|---|
Authors | |
Advisors | Advisor(s):Chan, HTH |
Issue Date | 2020 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Hu, S. [胡曙光]. (2020). Hypergraph semi-supervised learning and efficient dynamic clustering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Given a dataset, similarity relationship between
examples can be represented by a graph in which each example
is represented by a vertex.
While pairwise relationship between two vertices can be
represented by an edge in a normal graph,
a higher order relationship involving multiple vertices
can be captured by a hyperedge in hypergraph.
In this thesis, we study three learning and clustering problems.
First, we revisit semi-supervised learning on hypergraphs. Same as previous approaches, our
method uses a convex program whose objective
function is not everywhere differentiable. We
exploit the non-uniqueness of the optimal solutions, and consider confidence intervals which
give the exact ranges that unlabeled vertices take
in any optimal solution. Moreover, we give
a much simpler approach for solving the convex program based on the subgradient method.
Our experiments on real-world datasets confirm
that our confidence interval approach on hypergraphs outperforms existing methods, and our
sub-gradient method gives faster running times
when the number of vertices is much larger than
the number of edges.
Next, we consider the densest subgraph problem on hypergraphs, which generalizes the problem to a wider class of networks
in which edges might have different cardinalities and contain more
than 2 nodes. We present two exact algorithms and a near-linear
time $r$-approximation algorithm for the problem, where $r$ is the
maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary
can insert or delete an edge from the hypergraph in each round and
the goal is to maintain efficiently an approximation of the densest
subgraph. We present two dynamic approximation algorithms in
this paper with amortized $poly( \frac{r}{\epsilon}
\log n)$ update time, for any $\epsilon > 0$.
For the case when there are only insertions, the approximation ratio
we maintain is $r(1+\epsilon)$, while for the fully dynamic case, the ratio is
$r^2 (1+ \epsilon )$. Extensive experiments are performed on large real datasets
to validate the effectiveness and efficiency of our algorithms.
Finally, we develop a $(4 + \epsilon)$-approximation algorithm for the $k$-center clustering problem using linear space under the fully dynamic
adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to
the random choices of our algorithm. The amortized time cost of our algorithm is poly-logarithmic when the ratio between the maximum and
minimum distance between any two points in input is bounded by a polynomial, while $k$ and $\epsilon$ are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic datasets demonstrating the effectiveness of our approach.
|
Degree | Doctor of Philosophy |
Subject | Cluster analysis - Data processing Data structures (Computer science) Database management Hypergraphs Supervised learning (Machine learning) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/286011 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Chan, HTH | - |
dc.contributor.author | Hu, Shuguang | - |
dc.contributor.author | 胡曙光 | - |
dc.date.accessioned | 2020-08-25T08:43:54Z | - |
dc.date.available | 2020-08-25T08:43:54Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Hu, S. [胡曙光]. (2020). Hypergraph semi-supervised learning and efficient dynamic clustering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/286011 | - |
dc.description.abstract | Given a dataset, similarity relationship between examples can be represented by a graph in which each example is represented by a vertex. While pairwise relationship between two vertices can be represented by an edge in a normal graph, a higher order relationship involving multiple vertices can be captured by a hyperedge in hypergraph. In this thesis, we study three learning and clustering problems. First, we revisit semi-supervised learning on hypergraphs. Same as previous approaches, our method uses a convex program whose objective function is not everywhere differentiable. We exploit the non-uniqueness of the optimal solutions, and consider confidence intervals which give the exact ranges that unlabeled vertices take in any optimal solution. Moreover, we give a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets confirm that our confidence interval approach on hypergraphs outperforms existing methods, and our sub-gradient method gives faster running times when the number of vertices is much larger than the number of edges. Next, we consider the densest subgraph problem on hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time $r$-approximation algorithm for the problem, where $r$ is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized $poly( \frac{r}{\epsilon} \log n)$ update time, for any $\epsilon > 0$. For the case when there are only insertions, the approximation ratio we maintain is $r(1+\epsilon)$, while for the fully dynamic case, the ratio is $r^2 (1+ \epsilon )$. Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms. Finally, we develop a $(4 + \epsilon)$-approximation algorithm for the $k$-center clustering problem using linear space under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized time cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while $k$ and $\epsilon$ are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic datasets demonstrating the effectiveness of our approach. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Cluster analysis - Data processing | - |
dc.subject.lcsh | Data structures (Computer science) | - |
dc.subject.lcsh | Database management | - |
dc.subject.lcsh | Hypergraphs | - |
dc.subject.lcsh | Supervised learning (Machine learning) | - |
dc.title | Hypergraph semi-supervised learning and efficient dynamic clustering | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2020 | - |
dc.identifier.mmsid | 991044264457403414 | - |