HARP: A practical projected clustering algorithm

Yip, KY; Cheung, DW; Ng, MK

Publisher Website: 10.1109/TKDE.2004.74
Scopus: eid_2-s2.0-13844297591
WOS: WOS:000223977300006
Appears in Collections:
- Information Technology Services: Journal/Magazine Articles
- Mathematics: Journal/Magazine Articles

Title	HARP: A practical projected clustering algorithm
Authors	Yip, KY; Cheung, DW; Ng, MK
Keywords	Bioinformatics; Clustering; Data mining; Mining methods and algorithms
Issue Date	2004
Publisher	IEEE. The Journal's web site is located at http://www.computer.org/tkde
Citation	IEEE Transactions on Knowledge and Data Engineering, 2004, v. 16 n. 11, p. 1387-1397. DOI: http://dx.doi.org/10.1109/TKDE.2004.74
Abstract	In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications.
Persistent Identifier	http://hdl.handle.net/10722/43624
ISSN	1041-4347
2023 Impact Factor	8.9
2023 SCImago Journal Rankings	2.867
ISI Accession Number ID	WOS:000223977300006

DC Field	Value	Language
dc.contributor.author	Yip, KY	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.contributor.author	Ng, MK	en_HK
dc.date.accessioned	2007-03-23T04:50:43Z	-
dc.date.available	2007-03-23T04:50:43Z	-
dc.date.issued	2004	en_HK
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2004, v. 16 n. 11, p. 1387-1397	en_HK
dc.identifier.issn	1041-4347	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/43624	-
dc.description.abstract	In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications.	en_HK
dc.format.extent	573425 bytes	-
dc.format.extent	26624 bytes	-
dc.format.mimetype	application/pdf	-
dc.format.mimetype	application/msword	-
dc.language	eng	en_HK
dc.publisher	IEEE. The Journal's web site is located at http://www.computer.org/tkde	en_HK
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_HK
dc.rights	©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.	-
dc.subject	Bioinformatics	en_HK
dc.subject	Clustering	en_HK
dc.subject	Data mining	en_HK
dc.subject	Mining methods and algorithms	en_HK
dc.title	HARP: A practical projected clustering algorithm	en_HK
dc.type	Article	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.description.nature	published_or_final_version	en_HK
dc.identifier.doi	10.1109/TKDE.2004.74	en_HK
dc.identifier.scopus	eid_2-s2.0-13844297591	en_HK
dc.identifier.hkuros	103205	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-13844297591&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	16	en_HK
dc.identifier.issue	11	en_HK
dc.identifier.spage	1387	en_HK
dc.identifier.epage	1397	en_HK
dc.identifier.isi	WOS:000223977300006	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Yip, KY=7101909946	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.scopusauthorid	Ng, MK=7202076432	en_HK
dc.identifier.citeulike	6337870	-
dc.identifier.issnl	1041-4347	-
