Knowledge-based vector space model for text clustering

Jing, Liping; Ng, Michael K.; Huang, Joshua Z.

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/s10115-009-0256-5
Scopus: eid_2-s2.0-77957556167
WOS: WOS:000282514300003

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Mathematics: Journal/Magazine Articles

Title	Knowledge-based vector space model for text clustering
Authors	Jing, Liping; Ng, Michael K.; Huang, Joshua Z.
Keywords	Semantic relationship; Text clustering; Term similarity; Knowledge-based VSM
Issue Date	2010
Citation	Knowledge and Information Systems, 2010, v. 25, n. 1, p. 35-55. DOI: http://dx.doi.org/10.1007/s10115-009-0256-5
Abstract	This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are included in representing text documents as a set of vectors. The idea is to calculate the dissimilarity between two documents more effectively so that text clustering results can be enhanced. In this paper, the semantic relationship between two terms is defined by the similarity of the two terms. Such similarity is used to re-weight term frequency in the VSM. We consider and study two different similarity measures for computing the semantic relationship between two terms based on two different approaches. The first approach is based on the existing ontologies like WordNet and MeSH. We define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach is to make use of text corpora to construct the relationships between terms and then calculate their semantic similarities. Three clustering algorithms, bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm, have been used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that the clustering performance based on the new model was much better than that based on the traditional term-based VSM. © 2009 Springer-Verlag London Limited.
Persistent Identifier	http://hdl.handle.net/10722/276871
ISSN	0219-1377 (2023 Impact Factor: 2.5; 2023 SCImago Journal Rankings: 0.860)
ISI Accession Number ID	WOS:000282514300003
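
The abstract states that term similarity is used to re-weight term frequency in the VSM, but this record does not give the formula. The following is a minimal sketch of one plausible reading, not the authors' method: it assumes each term's frequency is spread onto related terms through a term-similarity matrix, and then compares documents by cosine similarity. The vocabulary, the similarity values in `SIM`, and the function names `reweight` and `cosine` are all hypothetical and chosen only for illustration.

```python
import math


def reweight(tf, sim):
    """Assumed re-weighting rule: tf'_i = sum_j sim[i][j] * tf[j].

    This is an illustrative choice; the record does not state the exact formula.
    """
    n = len(tf)
    return [sum(sim[i][j] * tf[j] for j in range(n)) for i in range(n)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical vocabulary: ["car", "automobile", "banana"].
# Made-up similarity values; in the paper they would come from an ontology
# (e.g. WordNet, MeSH) or from a text corpus.
SIM = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [2, 0, 0]  # mentions only "car"
doc_b = [0, 3, 0]  # mentions only "automobile"

# Term-based VSM: the documents share no terms, so they look unrelated.
plain = cosine(doc_a, doc_b)

# Similarity-aware VSM: related terms contribute to each other's weight.
semantic = cosine(reweight(doc_a, SIM), reweight(doc_b, SIM))

print(f"term-based cosine:      {plain:.3f}")
print(f"knowledge-based cosine: {semantic:.3f}")
```

In this toy case the two documents have zero similarity in the plain term-based representation, while the re-weighted vectors are nearly parallel. That is the kind of effect the abstract attributes to including semantic relationships between terms.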

DC Field	Value	Language
dc.contributor.author	Jing, Liping	-
dc.contributor.author	Ng, Michael K.	-
dc.contributor.author	Huang, Joshua Z.	-
dc.date.accessioned	2019-09-18T08:34:54Z	-
dc.date.available	2019-09-18T08:34:54Z	-
dc.date.issued	2010	-
dc.identifier.citation	Knowledge and Information Systems, 2010, v. 25, n. 1, p. 35-55	-
dc.identifier.issn	0219-1377	-
dc.identifier.uri	http://hdl.handle.net/10722/276871	-
dc.description.abstract	This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are included in representing text documents as a set of vectors. The idea is to calculate the dissimilarity between two documents more effectively so that text clustering results can be enhanced. In this paper, the semantic relationship between two terms is defined by the similarity of the two terms. Such similarity is used to re-weight term frequency in the VSM. We consider and study two different similarity measures for computing the semantic relationship between two terms based on two different approaches. The first approach is based on the existing ontologies like WordNet and MeSH. We define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach is to make use of text corpora to construct the relationships between terms and then calculate their semantic similarities. Three clustering algorithms, bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm, have been used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that the clustering performance based on the new model was much better than that based on the traditional term-based VSM. © 2009 Springer-Verlag London Limited.	-
dc.language	eng	-
dc.relation.ispartof	Knowledge and Information Systems	-
dc.subject	Semantic relationship	-
dc.subject	Text clustering	-
dc.subject	Term similarity	-
dc.subject	Knowledge-based VSM	-
dc.title	Knowledge-based vector space model for text clustering	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/s10115-009-0256-5	-
dc.identifier.scopus	eid_2-s2.0-77957556167	-
dc.identifier.volume	25	-
dc.identifier.issue	1	-
dc.identifier.spage	35	-
dc.identifier.epage	55	-
dc.identifier.eissn	0219-3116	-
dc.identifier.isi	WOS:000282514300003	-
dc.identifier.issnl	0219-3116	-
