Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/TKDE.2004.74
- Scopus: eid_2-s2.0-13844297591
- WOS: WOS:000223977300006
Article: HARP: A practical projected clustering algorithm
Title | HARP: A practical projected clustering algorithm |
---|---|
Authors | Yip, KY; Cheung, DW; Ng, MK |
Keywords | Bioinformatics; Clustering; Data mining; Mining methods and algorithms |
Issue Date | 2004 |
Publisher | IEEE. The Journal's web site is located at http://www.computer.org/tkde |
Citation | IEEE Transactions on Knowledge and Data Engineering, 2004, v. 16 n. 11, p. 1387-1397 |
Abstract | In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications. |
Persistent Identifier | http://hdl.handle.net/10722/43624 |
ISSN | 1041-4347 (2023 Impact Factor: 8.9; 2023 SCImago Journal Rankings: 2.867) |
ISI Accession Number ID | WOS:000223977300006 |
References |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yip, KY | en_HK |
dc.contributor.author | Cheung, DW | en_HK |
dc.contributor.author | Ng, MK | en_HK |
dc.date.accessioned | 2007-03-23T04:50:43Z | - |
dc.date.available | 2007-03-23T04:50:43Z | - |
dc.date.issued | 2004 | en_HK |
dc.identifier.citation | IEEE Transactions on Knowledge and Data Engineering, 2004, v. 16 n. 11, p. 1387-1397 | en_HK |
dc.identifier.issn | 1041-4347 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/43624 | - |
dc.description.abstract | In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications. | en_HK |
dc.format.extent | 573425 bytes | - |
dc.format.extent | 26624 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/msword | - |
dc.language | eng | en_HK |
dc.publisher | IEEE. The Journal's web site is located at http://www.computer.org/tkde | en_HK |
dc.relation.ispartof | IEEE Transactions on Knowledge and Data Engineering | en_HK |
dc.rights | ©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | - |
dc.subject | Bioinformatics | en_HK |
dc.subject | Clustering | en_HK |
dc.subject | Data mining | en_HK |
dc.subject | Mining methods and algorithms | en_HK |
dc.title | HARP: A practical projected clustering algorithm | en_HK |
dc.type | Article | en_HK |
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK |
dc.identifier.authority | Cheung, DW=rp00101 | en_HK |
dc.description.nature | published_or_final_version | en_HK |
dc.identifier.doi | 10.1109/TKDE.2004.74 | en_HK |
dc.identifier.scopus | eid_2-s2.0-13844297591 | en_HK |
dc.identifier.hkuros | 103205 | - |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-13844297591&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 16 | en_HK |
dc.identifier.issue | 11 | en_HK |
dc.identifier.spage | 1387 | en_HK |
dc.identifier.epage | 1397 | en_HK |
dc.identifier.isi | WOS:000223977300006 | - |
dc.publisher.place | United States | en_HK |
dc.identifier.scopusauthorid | Yip, KY=7101909946 | en_HK |
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK |
dc.identifier.scopusauthorid | Ng, MK=7202076432 | en_HK |
dc.identifier.citeulike | 6337870 | - |
dc.identifier.issnl | 1041-4347 | - |