postgraduate thesis: Classification and clustering in heterogeneous information networks
Title | Classification and clustering in heterogeneous information networks |
---|---|
Authors | Li, Xiang (李翔) |
Advisors | Kao, CM |
Issue Date | 2018 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Li, X. [李翔]. (2018). Classification and clustering in heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | A heterogeneous information network (HIN) is one whose nodes model objects of different types and whose links model objects’ relationships. To enrich its information content, objects (as represented by nodes) in an HIN are typically associated with additional attributes. We call such an HIN an Attributed HIN or AHIN.
Classification and clustering are fundamental tasks in data analytics, which have found interesting applications in HINs. In this thesis, we study the problems of classification and clustering in HINs.
First, we study transductive classification in HINs. We identify two fundamental properties, namely, cohesiveness and connectedness, of an HIN that greatly influence the effectiveness of transductive classifiers. We define metrics that measure the two properties. Through experiments, we show that the two properties serve as very effective indicators that predict the accuracy of transductive classifiers. Based on cohesiveness and connectedness we derive (1) a black-box tester that evaluates whether transductive classifiers should be applied for a given classification task and (2) an active learning algorithm that identifies the objects in an HIN whose labels should be sought in order to improve classification accuracy.
Second, we study how spectral clustering can be effectively applied to HINs. In particular, we focus on how meta-path relations are used to construct an effective similarity matrix based on which spectral clustering is done. We formulate the similarity matrix construction as an optimization problem and propose the SClump algorithm for solving the problem. We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms on HINs. Our results show that SClump outperforms the competitors over a range of datasets w.r.t. different clustering quality measures.
Third, we study the problem of clustering objects in an AHIN, taking into account objects’ similarities with respect to both object attribute values and their structural connectedness in the network. We show how a supervision signal, expressed in the form of a must-link set and a cannot-link set, can be leveraged to improve clustering results. We put forward the SCHAIN algorithm to solve the clustering problem. We conduct extensive experiments comparing SCHAIN with other state-of-the-art clustering algorithms and show that SCHAIN outperforms the others in clustering quality. |
Degree | Doctor of Philosophy |
Subject | Heterogeneous distributed computing systems |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/263171 |
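
The abstract mentions building a similarity matrix from meta-path relations before clustering. As a rough illustration only, the sketch below computes PathSim, a well-known meta-path similarity measure from the HIN literature, for a symmetric meta-path such as author-paper-author. It is not the SClump or SCHAIN algorithm from the thesis; the function name `pathsim_matrix` and the toy `author_paper` data are assumptions made for this example.

```python
def pathsim_matrix(adj):
    """Return PathSim similarities for the symmetric meta-path X-Y-X.

    adj[i][k] is 1 (or a link count) when object i of type X is linked
    to object k of type Y. Hypothetical helper, not taken from the thesis.
    """
    n = len(adj)
    # Commuting matrix M = adj * adj^T: M[i][j] counts X-Y-X path instances.
    commuting = [
        [sum(a * b for a, b in zip(adj[i], adj[j])) for j in range(n)]
        for i in range(n)
    ]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = commuting[i][i] + commuting[j][j]
            # PathSim: 2 * paths(i, j) / (paths(i, i) + paths(j, j)).
            sim[i][j] = 2.0 * commuting[i][j] / denom if denom else 0.0
    return sim


# Toy data (assumed): 4 authors (rows) and 4 papers (columns).
author_paper = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
similarity = pathsim_matrix(author_paper)
```

A matrix like `similarity` could then be passed to any clustering method that accepts precomputed affinities, such as spectral clustering. How several meta-paths are weighted and combined is the part the thesis formulates as an optimization problem, and this sketch does not attempt it.
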
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kao, CM | - |
dc.contributor.author | Li, Xiang | - |
dc.contributor.author | 李翔 | - |
dc.date.accessioned | 2018-10-16T07:34:51Z | - |
dc.date.available | 2018-10-16T07:34:51Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Li, X. [李翔]. (2018). Classification and clustering in heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/263171 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Heterogeneous distributed computing systems | - |
dc.title | Classification and clustering in heterogeneous information networks | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044046695703414 | - |
dc.date.hkucongregation | 2018 | - |
dc.identifier.mmsid | 991044046695703414 | - |