An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data

Jing, Liping; Ng, Michael K.; Huang, Joshua Zhexue

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/TKDE.2007.1048
Scopus: eid_2-s2.0-34347228671
WOS: WOS:000248223300003

Appears in Collections:
- Mathematics: Journal/Magazine Articles


Title	An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data
Authors	Jing, Liping; Ng, Michael K.; Huang, Joshua Zhexue
Keywords	Subspace clustering; High-dimensional data; K-means clustering; Variable weighting; Text clustering
Issue Date	2007
Citation	IEEE Transactions on Knowledge and Data Engineering, 2007, v. 19, n. 8, p. 1026-1041. DOI: http://dx.doi.org/10.1109/TKDE.2007.1048
Abstract	This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. The experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. The new algorithm is also scalable to large data sets. © 2007 IEEE.
Persistent Identifier	http://hdl.handle.net/10722/276812
ISSN	1041-4347 (2023 Impact Factor: 8.9; 2023 SCImago Journal Rankings: 2.867)
ISI Accession Number ID	WOS:000248223300003
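
The abstract describes a k-means loop with one extra step that computes a weight for each dimension in each cluster, driven by an entropy term in the objective. The record does not reproduce the update formulas, so the following is only a minimal sketch under stated assumptions: the weight of a dimension in a cluster is taken to be proportional to exp(-D / gamma), where D is that cluster's within-cluster dispersion along the dimension, and the weights of each cluster are normalized to sum to 1. The function name `ewkm`, the parameter name `gamma`, and the random initialization are illustrative choices, not taken from this record.

```python
import numpy as np

def ewkm(X, k, gamma=1.0, n_iter=20, seed=0):
    """Sketch of an entropy-weighted k-means loop (illustrative, not the authors' code).

    X      : (n, d) array of objects
    k      : number of clusters
    gamma  : assumed entropy-regularization strength; larger values give flatter weights
    Returns (labels, centers, weights), where weights has shape (k, d).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: initialize centers from k distinct random objects.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    # Start with uniform dimension weights in every cluster.
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Step 1: assign each object to the cluster with the smallest
        # weighted squared distance (each cluster uses its own weights).
        sq_diff = (X[:, None, :] - centers[None, :, :]) ** 2   # (n, k, d)
        dist = (sq_diff * weights[None, :, :]).sum(axis=2)     # (n, k)
        labels = dist.argmin(axis=1)

        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the previous center and weights for an empty cluster
            # Step 2: update the center as the mean of its members.
            centers[l] = members.mean(axis=0)
            # Step 3 (the additional step): update the dimension weights.
            # Dimensions with small dispersion get large weights.
            dispersion = ((members - centers[l]) ** 2).sum(axis=0)  # (d,)
            logits = -dispersion / gamma
            logits -= logits.max()          # numerical stability before exp
            w = np.exp(logits)
            weights[l] = w / w.sum()

    return labels, centers, weights
```

Calling `labels, centers, weights = ewkm(X, k=3, gamma=0.5)` on an `(n, d)` array returns one weight vector per cluster; under the assumptions above, the largest entries in `weights[l]` indicate the dimensions along which cluster `l` is most compact, which is how the abstract says important subspace dimensions are identified.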

DC Field	Value	Language
dc.contributor.author	Jing, Liping	-
dc.contributor.author	Ng, Michael K.	-
dc.contributor.author	Huang, Joshua Zhexue	-
dc.date.accessioned	2019-09-18T08:34:44Z	-
dc.date.available	2019-09-18T08:34:44Z	-
dc.date.issued	2007	-
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2007, v. 19, n. 8, p. 1026-1041	-
dc.identifier.issn	1041-4347	-
dc.identifier.uri	http://hdl.handle.net/10722/276812	-
dc.description.abstract	This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. The experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. The new algorithm is also scalable to large data sets. © 2007 IEEE.	-
dc.language	eng	-
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	-
dc.subject	Subspace clustering	-
dc.subject	High-dimensional data	-
dc.subject	K-means clustering	-
dc.subject	Variable weighting	-
dc.subject	Text clustering	-
dc.title	An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/TKDE.2007.1048	-
dc.identifier.scopus	eid_2-s2.0-34347228671	-
dc.identifier.volume	19	-
dc.identifier.issue	8	-
dc.identifier.spage	1026	-
dc.identifier.epage	1041	-
dc.identifier.isi	WOS:000248223300003	-
dc.identifier.issnl	1041-4347	-
