postgraduate thesis: Classification and clustering in heterogeneous information networks

Title: Classification and clustering in heterogeneous information networks
Authors: Li, Xiang (李翔)
Advisor(s): Kao, CM
Issue Date: 2018
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Li, X. [李翔]. (2018). Classification and clustering in heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: A heterogeneous information network (HIN) is one whose nodes model objects of different types and whose links model objects’ relationships. To enrich its information content, objects (as represented by nodes) in an HIN are typically associated with additional attributes. We call such an HIN an Attributed HIN or AHIN. Classification and clustering are fundamental tasks in data analytics, which have found interesting applications in HINs. In this thesis, we study the problems of classification and clustering in HINs.
First, we study transductive classification in HINs. We identify two fundamental properties, namely, cohesiveness and connectedness, of an HIN that greatly influence the effectiveness of transductive classifiers. We define metrics that measure the two properties. Through experiments, we show that the two properties serve as very effective indicators that predict the accuracy of transductive classifiers. Based on cohesiveness and connectedness we derive (1) a black-box tester that evaluates whether transductive classifiers should be applied for a given classification task and (2) an active learning algorithm that identifies the objects in an HIN whose labels should be sought in order to improve classification accuracy.
Second, we study how spectral clustering can be effectively applied to HINs. In particular, we focus on how meta-path relations are used to construct an effective similarity matrix based on which spectral clustering is done. We formulate the similarity matrix construction as an optimization problem and propose the SClump algorithm for solving the problem. We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms on HINs. Our results show that SClump outperforms the competitors over a range of datasets w.r.t. different clustering quality measures.
Third, we study the problem of clustering objects in an AHIN, taking into account objects’ similarities with respect to both object attribute values and their structural connectedness in the network. We show how supervision signal, expressed in the form of a must-link set and a cannot-link set, can be leveraged to improve clustering results. We put forward the SCHAIN algorithm to solve the clustering problem. We conduct extensive experiments comparing SCHAIN with other state-of-the-art clustering algorithms and show that SCHAIN outperforms the others in clustering quality.
Degree: Doctor of Philosophy
Subject: Heterogeneous distributed computing systems
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/263171

 

DC Field | Value | Language
dc.contributor.advisor | Kao, CM | -
dc.contributor.author | Li, Xiang | -
dc.contributor.author | 李翔 | -
dc.date.accessioned | 2018-10-16T07:34:51Z | -
dc.date.available | 2018-10-16T07:34:51Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | Li, X. [李翔]. (2018). Classification and clustering in heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/263171 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Heterogeneous distributed computing systems | -
dc.title | Classification and clustering in heterogeneous information networks | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_991044046695703414 | -
dc.date.hkucongregation | 2018 | -
dc.identifier.mmsid | 991044046695703414 | -
