A comparative study of ontology based term similarity measures on PubMed document clustering

Zhang, Xiaodan; Jing, Liping; Hu, Xiaohua; Ng, Michael; Zhou, Xiaohua

Publisher Website: 10.1007/978-3-540-71703-4_12
Scopus: eid_2-s2.0-38049162356

Supplementary

Citations:
- Scopus: 0
Appears in Collections:
- Mathematics: Conference papers

Conference Paper: A comparative study of ontology based term similarity measures on PubMed document clustering

Title	A comparative study of ontology based term similarity measures on PubMed document clustering
Authors	Zhang, Xiaodan; Jing, Liping; Hu, Xiaohua; Ng, Michael; Zhou, Xiaohua
Keywords	Domain ontology Semantic similarity measure Document clustering
Issue Date	2007
Publisher	Springer.
Citation	12th International Conference on Database Systems for Advanced Applications (DASFAA 2007), Bangkok, Thailand, 9-12 April 2007. In Advances in Databases: Concepts, Systems and Applications: 12th International Conference on Database Systems for Advanced Applications, DASFAA 2007, Bangkok, Thailand, April 9-12, 2007, Proceedings, 2007, p. 115-126. DOI: http://dx.doi.org/10.1007/978-3-540-71703-4_12
Abstract	Recent research shows that an ontology used as background knowledge can improve document clustering quality through its concept hierarchy. Previous studies use term semantic similarity as an important measure for incorporating domain knowledge into the clustering process, for example in clustering initialization and term re-weighting. However, few studies have focused on how different types of term similarity measures affect clustering performance in a given domain. In this paper, we conduct a comparative study of how different term semantic similarity measures, including path-based, information-content-based, and feature-based measures, affect document clustering. We evaluate term re-weighting as an important method for integrating a domain ontology into the clustering process, and we apply k-means clustering to one real-world text dataset, our own corpus generated from PubMed. Experimental results on 8 different semantic measures show that: (1) no single type of similarity measure significantly outperforms the others; (2) several similarity measures perform more stably than the others; (3) term re-weighting has a positive effect on medical document clustering, but the effect might not be significant when documents contain few terms. © Springer-Verlag Berlin Heidelberg 2007.
Persistent Identifier	http://hdl.handle.net/10722/276824
ISBN	9783540717027
ISSN	0302-9743
2023 SCImago Journal Rankings	0.606
Series/Report no.	Lecture Notes in Computer Science ; 4443

DC Field	Value	Language
dc.contributor.author	Zhang, Xiaodan	-
dc.contributor.author	Jing, Liping	-
dc.contributor.author	Hu, Xiaohua	-
dc.contributor.author	Ng, Michael	-
dc.contributor.author	Zhou, Xiaohua	-
dc.date.accessioned	2019-09-18T08:34:46Z	-
dc.date.available	2019-09-18T08:34:46Z	-
dc.date.issued	2007	-
dc.identifier.citation	12th International Conference on Database Systems for Advanced Applications (DASFAA 2007), Bangkok, Thailand, 9-12 April 2007. In Advances in Databases: Concepts, Systems and Applications: 12th International Conference on Database Systems for Advanced Applications, DASFAA 2007, Bangkok, Thailand, April 9-12, 2007, Proceedings, 2007, p. 115-126	-
dc.identifier.isbn	9783540717027	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://hdl.handle.net/10722/276824	-
dc.language	eng	-
dc.publisher	Springer.	-
dc.relation.ispartof	Advances in Databases: Concepts, Systems and Applications: 12th International Conference on Database Systems for Advanced Applications, DASFAA 2007, Bangkok, Thailand, April 9-12, 2007, Proceedings	-
dc.relation.ispartofseries	Lecture Notes in Computer Science ; 4443	-
dc.subject	Domain ontology	-
dc.subject	Semantic similarity measure	-
dc.subject	Document clustering	-
dc.title	A comparative study of ontology based term similarity measures on PubMed document clustering	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/978-3-540-71703-4_12	-
dc.identifier.scopus	eid_2-s2.0-38049162356	-
dc.identifier.spage	115	-
dc.identifier.epage	126	-
dc.identifier.eissn	1611-3349	-
dc.publisher.place	Berlin	-
dc.identifier.issnl	0302-9743	-
