Camel: Content-aware and meta-path augmented metric learning for author identification

Zhang, Chuxu; Huang, Chao; Yu, Lu; Zhang, Xiangliang; Chawla, Nitesh V.

Publisher Website: 10.1145/3178876.3186152
Scopus: eid_2-s2.0-85055699209
WOS: WOS:000460379000070

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Camel: Content-aware and meta-path augmented metric learning for author identification

Title	Camel: Content-aware and meta-path augmented metric learning for author identification
Authors	Zhang, Chuxu; Huang, Chao; Yu, Lu; Zhang, Xiangliang; Chawla, Nitesh V.
Keywords	Author identification; Deep learning; Heterogeneous networks; Metric learning; Representation learning
Issue Date	2018
Citation	The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018, 2018, p. 709-718. DOI: http://dx.doi.org/10.1145/3178876.3186152
Abstract	In this paper, we study the problem of author identification in big scholarly data, which is to effectively rank potential authors for each anonymous paper by using historical data. Most existing de-anonymization approaches predict the relevance score of a paper-author pair via feature engineering, which is not only time- and storage-consuming but also introduces irrelevant and redundant features or misses important attributes. Representation learning can automate the feature generation process by learning node embeddings in an academic network to infer the correlation of a paper-author pair. However, the learned embeddings are often general-purpose (independent of the specific task) or based on network structure only (without considering the node content). To address these issues and make further progress on the author identification problem, we propose Camel, a content-aware and meta-path augmented metric learning model. Specifically, the directly correlated paper-author pairs are first modeled with distance metric learning by introducing a push loss function. Next, the paper content embedding encoded by a gated recurrent neural network is integrated into the distance loss. Moreover, the historical bibliographic data of papers is used to construct an academic heterogeneous network, in which a meta-path guided walk integrative learning module, based on a task-dependent and content-aware Skipgram model, is designed to formulate the correlations between each paper and its indirect author neighbors and to further augment the model. Extensive experiments demonstrate that Camel outperforms state-of-the-art baselines, achieving an average improvement of 6.3% over the best baseline method.
Persistent Identifier	http://hdl.handle.net/10722/308769
ISI Accession Number ID	WOS:000460379000070
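
The abstract describes modeling paper-author pairs with distance metric learning and a push loss. The sketch below illustrates that general idea only: it assumes a hinge-style loss over squared Euclidean distances with a fixed margin, which is a common form of push loss and not necessarily the exact formulation used in the paper. The function names (`squared_distance`, `push_loss`, `rank_authors`), the `margin` parameter, and the toy embeddings are all hypothetical and chosen for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def push_loss(paper, true_author, negative_authors, margin=1.0):
    """Hinge-style push loss (assumed form, not taken from the paper).

    Each negative author is penalized when it is not at least `margin`
    farther from the paper than the true author is.
    """
    d_pos = squared_distance(paper, true_author)
    return sum(
        max(0.0, margin + d_pos - squared_distance(paper, neg))
        for neg in negative_authors
    )


def rank_authors(paper, authors):
    """Rank candidate author names by increasing distance to the paper."""
    return sorted(authors, key=lambda name: squared_distance(paper, authors[name]))


# Toy 2-D embeddings (made up for illustration).
paper = [0.0, 0.0]
authors = {"a1": [0.1, 0.0], "a2": [1.0, 1.0], "a3": [3.0, 0.0]}

ranking = rank_authors(paper, authors)
loss_good = push_loss(paper, authors["a1"], [authors["a2"], authors["a3"]])
loss_bad = push_loss(paper, authors["a3"], [authors["a1"]])
```

In this toy setting the loss is zero when the true author is already closer to the paper than every negative author by at least the margin, and positive otherwise. In a full model the paper vector would come from a content encoder (the abstract mentions a gated recurrent neural network) and the loss would be minimized by gradient descent; both are omitted here.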

DC Field	Value	Language
dc.contributor.author	Zhang, Chuxu	-
dc.contributor.author	Huang, Chao	-
dc.contributor.author	Yu, Lu	-
dc.contributor.author	Zhang, Xiangliang	-
dc.contributor.author	Chawla, Nitesh V.	-
dc.date.accessioned	2021-12-08T07:50:05Z	-
dc.date.available	2021-12-08T07:50:05Z	-
dc.date.issued	2018	-
dc.identifier.citation	The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018, 2018, p. 709-718	-
dc.identifier.uri	http://hdl.handle.net/10722/308769	-
dc.description.abstract	In this paper, we study the problem of author identification in big scholarly data, which is to effectively rank potential authors for each anonymous paper by using historical data. Most existing de-anonymization approaches predict the relevance score of a paper-author pair via feature engineering, which is not only time- and storage-consuming but also introduces irrelevant and redundant features or misses important attributes. Representation learning can automate the feature generation process by learning node embeddings in an academic network to infer the correlation of a paper-author pair. However, the learned embeddings are often general-purpose (independent of the specific task) or based on network structure only (without considering the node content). To address these issues and make further progress on the author identification problem, we propose Camel, a content-aware and meta-path augmented metric learning model. Specifically, the directly correlated paper-author pairs are first modeled with distance metric learning by introducing a push loss function. Next, the paper content embedding encoded by a gated recurrent neural network is integrated into the distance loss. Moreover, the historical bibliographic data of papers is used to construct an academic heterogeneous network, in which a meta-path guided walk integrative learning module, based on a task-dependent and content-aware Skipgram model, is designed to formulate the correlations between each paper and its indirect author neighbors and to further augment the model. Extensive experiments demonstrate that Camel outperforms state-of-the-art baselines, achieving an average improvement of 6.3% over the best baseline method.	-
dc.language	eng	-
dc.relation.ispartof	The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018	-
dc.subject	Author identification	-
dc.subject	Deep learning	-
dc.subject	Heterogeneous networks	-
dc.subject	Metric learning	-
dc.subject	Representation learning	-
dc.title	Camel: Content-aware and meta-path augmented metric learning for author identification	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1145/3178876.3186152	-
dc.identifier.scopus	eid_2-s2.0-85055699209	-
dc.identifier.spage	709	-
dc.identifier.epage	718	-
dc.identifier.isi	WOS:000460379000070	-
