Article: An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data

Title: An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data
Authors: Jing, Liping; Ng, Michael K.; Huang, Joshua Zhexue
Keywords: Subspace clustering; High-dimensional data; K-means clustering; Variable weighting; Text clustering
Issue Date: 2007
Citation: IEEE Transactions on Knowledge and Data Engineering, 2007, v. 19, n. 8, p. 1026-1041
Abstract: This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. The experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. The new algorithm is also scalable to large data sets. © 2007 IEEE.
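The abstract describes the method only in words and gives no formulas, so the following is a minimal Python sketch of the idea rather than the authors' implementation. It assumes a softmax-style weight update (each cluster's weight for a dimension decreases exponentially with the within-cluster dispersion along that dimension), which is what minimizing a dispersion term plus a weight-entropy term under a sum-to-one constraint yields. The function name `ewkm`, the parameter `gamma`, the initialization, and the toy data are all illustrative assumptions.

```python
import numpy as np

def ewkm(X, k, gamma=1.0, n_iter=20, seed=0):
    """Sketch of k-means with per-cluster, per-dimension entropy weights.

    Assumed update rule (not taken from the abstract):
        weights[l, i] is proportional to exp(-dispersion[l, i] / gamma)
    Returns (labels, centers, weights).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Start from k distinct data points and uniform dimension weights.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Step 1: assign each object to the cluster with the smallest
        # weighted squared distance.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, d)
        dist = np.einsum("nkd,kd->nk", diff2, weights)
        labels = dist.argmin(axis=1)
        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the previous center and weights
            # Step 2: update the cluster center.
            centers[l] = members.mean(axis=0)
            # Step 3 (the additional step): update the dimension weights
            # from the within-cluster dispersion of each dimension.
            disp = ((members - centers[l]) ** 2).sum(axis=0)
            z = -disp / gamma
            z -= z.max()  # shift for numerical stability
            w = np.exp(z)
            weights[l] = w / w.sum()
    return labels, centers, weights

# Toy data: two groups, each compact in a different single dimension.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 3.0, size=(100, 4))
a[:, 0] = rng.normal(0.0, 0.1, size=100)
b = rng.normal(0.0, 3.0, size=(100, 4))
b[:, 1] = rng.normal(5.0, 0.1, size=100)
X = np.vstack([a, b])

labels, centers, weights = ewkm(X, k=2, gamma=50.0)
print(weights.round(3))  # one row of dimension weights per cluster
```

In this sketch `gamma` controls how sharply the weights concentrate: small values push almost all weight onto the least dispersed dimension, while large values keep the weights close to uniform.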
Persistent Identifier: http://hdl.handle.net/10722/276812
ISSN: 1041-4347
2020 Impact Factor: 6.977
2020 SCImago Journal Rankings: 1.360
ISI Accession Number ID: WOS:000248223300003

 

DC Field | Value | Language
dc.contributor.author | Jing, Liping | -
dc.contributor.author | Ng, Michael K. | -
dc.contributor.author | Huang, Joshua Zhexue | -
dc.date.accessioned | 2019-09-18T08:34:44Z | -
dc.date.available | 2019-09-18T08:34:44Z | -
dc.date.issued | 2007 | -
dc.identifier.citation | IEEE Transactions on Knowledge and Data Engineering, 2007, v. 19, n. 8, p. 1026-1041 | -
dc.identifier.issn | 1041-4347 | -
dc.identifier.uri | http://hdl.handle.net/10722/276812 | -
dc.description.abstract | This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. The experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. The new algorithm is also scalable to large data sets. © 2007 IEEE. | -
dc.language | eng | -
dc.relation.ispartof | IEEE Transactions on Knowledge and Data Engineering | -
dc.subject | Subspace clustering | -
dc.subject | High-dimensional data | -
dc.subject | K-means clustering | -
dc.subject | Variable weighting | -
dc.subject | Text clustering | -
dc.title | An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/TKDE.2007.1048 | -
dc.identifier.scopus | eid_2-s2.0-34347228671 | -
dc.identifier.volume | 19 | -
dc.identifier.issue | 8 | -
dc.identifier.spage | 1026 | -
dc.identifier.epage | 1041 | -
dc.identifier.isi | WOS:000248223300003 | -
dc.identifier.issnl | 1041-4347 | -
