
Postgraduate thesis: Mining heterogeneous information networks

Title: Mining heterogeneous information networks
Authors: Huang, Zhipeng (黄智鹏)
Advisors: Kao, CM; Cheng, CK; Mamoulis, N
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Huang, Z. [黄智鹏]. (2019). Mining heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Heterogeneous information networks (HINs), such as DBLP, YAGO, DBpedia and Freebase, have recently received considerable attention. These graph data sources contain a vast number of inter-related facts and can be used to discover interesting knowledge. In this thesis, we address three challenging problems in mining HINs: (i) relevance search, (ii) entity embedding, and (iii) query recommendation with HINs. First, we study relevance search on large-scale HINs. We propose a model named meta structure, which extends the meta path, to capture the relationship between two entities in an HIN. For example, a researcher may want to find pairs of authors who have published papers in the same venue and have also mentioned the same topic. The researcher can express this query as a meta structure and efficiently retrieve such entity pairs from a large HIN. We also propose a data structure named i-LTable to speed up query evaluation. Our experiments on real HINs show that meta structure is more effective than meta path in various tasks, such as classification, clustering and kNN search. Next, we study entity embedding on HINs. Our goal is to represent each entity of an HIN as a vector such that the proximity in the original HIN is preserved. Specifically, we propose an objective function that minimizes the distance between two probability distributions: one modeling the meta path-based proximities and the other modeling the proximities in the embedded vector space. We also investigate the use of negative sampling to accelerate the optimization process. Our extensive experimental evaluation shows that our method produces high-quality embeddings and outperforms state-of-the-art network embedding methods in several data mining tasks.
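As a rough illustration of the example query above, the following Python sketch counts instances of the meta path A-P-V-P-A (the two authors' papers share a venue) and of the meta structure A-P-(V,T)-P-A (the papers share a venue and also a term) on a tiny bibliographic HIN. The toy data, the function names (`meta_path_count`, `meta_structure_count`, `rank`) and the use of raw instance counts as relevance scores are hypothetical choices for illustration; the thesis's own relevance measures and its i-LTable index are not reproduced here.

```python
# Hypothetical toy HIN (not data from the thesis):
# author -> papers, paper -> venues, paper -> terms (topics).
WRITES = {"a1": {"p1"}, "a2": {"p2"}, "a3": {"p3", "p4"}}
VENUE = {"p1": {"KDD"}, "p2": {"KDD"}, "p3": {"KDD"}, "p4": {"KDD"}}
TERMS = {"p1": {"graph", "embedding"}, "p2": {"graph"},
         "p3": {"database"}, "p4": {"database"}}


def meta_path_count(a, b):
    """Instances of the meta path A-P-V-P-A between authors a and b:
    one instance per (paper of a, shared venue, paper of b)."""
    return sum(
        len(VENUE[p] & VENUE[q])
        for p in WRITES[a]
        for q in WRITES[b]
    )


def meta_structure_count(a, b):
    """Instances of the meta structure A-P-(V,T)-P-A: the two papers must
    share a venue AND a term, so every instance binds one venue and one term."""
    return sum(
        len(VENUE[p] & VENUE[q]) * len(TERMS[p] & TERMS[q])
        for p in WRITES[a]
        for q in WRITES[b]
    )


def rank(query, score):
    """Rank the other authors by descending score (ties keep insertion order)."""
    others = [a for a in WRITES if a != query]
    return sorted(others, key=lambda a: score(query, a), reverse=True)


print(rank("a1", meta_path_count))       # ['a3', 'a2']
print(rank("a1", meta_structure_count))  # ['a2', 'a3']
```

Under the meta path alone, `a3` ranks first because two of its papers share a venue with `a1`; the meta structure additionally requires a shared term, so `a2` ranks first, matching the "same venue and same topic" query in the example.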
Finally, we study how to use a knowledge base modeled as an HIN to improve the quality of query recommendation for search engines. Specifically, we examine two information sources: (1) a knowledge base HIN, such as YAGO or Freebase, and (2) a query log from a search engine. We study how to use these sources to find new entities that are useful for query recommendation. We further study a hybrid framework that effectively integrates different query recommendation methods. Our experiments show that the proposed approaches provide better recommendations for long-tail queries than existing solutions. In addition, our implemented system needs less than 100 ms to generate query recommendations, which makes our solution suitable for providing online query recommendation services for search engines.
Degree: Doctor of Philosophy
Subject: Computer networks; Data mining
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/278410

 

DC Field | Value | Language
dc.contributor.advisor | Kao, CM | -
dc.contributor.advisor | Cheng, CK | -
dc.contributor.advisor | Mamoulis, N | -
dc.contributor.author | Huang, Zhipeng | -
dc.contributor.author | 黄智鹏 | -
dc.date.accessioned | 2019-10-09T01:17:36Z | -
dc.date.available | 2019-10-09T01:17:36Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Huang, Z. [黄智鹏]. (2019). Mining heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/278410 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Computer networks | -
dc.subject.lcsh | Data mining | -
dc.title | Mining heterogeneous information networks | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_991044146571903414 | -
dc.date.hkucongregation | 2019 | -
dc.identifier.mmsid | 991044146571903414 | -
