Is sampling useful in data mining? A case in the maintenance of discovered association rules

Lee, SD; Cheung, DW; Kao, B

Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Is sampling useful in data mining? A case in the maintenance of discovered association rules

Title	Is sampling useful in data mining? A case in the maintenance of discovered association rules
Authors	Lee, SD; Cheung, DW; Kao, B
Keywords	Association rules; Confidence interval; Data mining; Knowledge discovery; Maintenance; Sampling; Update
Issue Date	1998
Publisher	Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=1384-5810
Citation	Data Mining And Knowledge Discovery, 1998, v. 2 n. 3, p. 233-262
Abstract	By nature, sampling is an appealing technique for data mining, because an approximate solution is in most cases already sufficient for users' needs. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Several studies have addressed the problem of maintaining discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part of the database but also the unchanged part of the original database, which is very large, and hence take much time. Worse yet, if the rules are updated frequently but the underlying rule set has not changed much, the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overhead of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate. © 1998 Kluwer Academic Publishers.
Persistent Identifier	http://hdl.handle.net/10722/89167
ISSN	1384-5810
2023 Impact Factor	2.8
2023 SCImago Journal Rankings	1.813
ISI Accession Number ID	WOS:000077976300002
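
As a rough illustration of the idea described in the abstract, the Python sketch below estimates from a random sample how many previously large itemsets may no longer meet the support threshold after an update, and uses that estimate to decide whether re-mining is worthwhile. It is not the authors' algorithm. The function names `support_interval` and `estimated_change`, the normal-approximation confidence interval, the z value of 1.96, the rule for counting an itemset as dropped, and the 0.25 tolerance are all assumptions made for illustration. The sketch also ignores itemsets that become newly large after the update.

```python
import math
import random


def support_interval(sample, itemset, z=1.96):
    """Return (estimate, lower, upper) for the support of `itemset` in `sample`.

    Uses a normal-approximation confidence interval for a proportion;
    z=1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    itemset = frozenset(itemset)
    n = len(sample)
    hits = sum(1 for transaction in sample if itemset <= transaction)
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def estimated_change(updated_db, old_large_itemsets, min_support, sample_size, rng=None):
    """Estimate the fraction of previously large itemsets that are no longer large.

    Hypothetical decision rule: an itemset counts as dropped when the upper
    bound of its sampled support interval falls below `min_support`.
    """
    rng = rng or random.Random()
    # Sample with replacement so the full updated database is never scanned.
    sample = [rng.choice(updated_db) for _ in range(sample_size)]
    dropped = sum(
        1
        for itemset in old_large_itemsets
        if support_interval(sample, itemset)[2] < min_support
    )
    return dropped / len(old_large_itemsets)


# Toy data: the updated database as a list of transactions (sets of items).
updated_db = [frozenset({"bread", "milk"})] * 90 + [frozenset({"beer"})] * 10
old_large = [{"bread", "milk"}, {"beer"}]

change = estimated_change(updated_db, old_large, min_support=0.5, sample_size=200)
# Re-mine only when the estimated change exceeds a tolerance (0.25 is arbitrary).
print("re-mine rules" if change > 0.25 else "keep existing rules")
```

The cost of the check depends on the sample size rather than on the size of the database, which is the property the abstract relies on when it suggests deferring a full rule update until the estimated difference is large.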

DC Field	Value	Language
dc.contributor.author	Lee, SD	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.contributor.author	Kao, B	en_HK
dc.date.accessioned	2010-09-06T09:53:13Z	-
dc.date.available	2010-09-06T09:53:13Z	-
dc.date.issued	1998	en_HK
dc.identifier.citation	Data Mining And Knowledge Discovery, 1998, v. 2 n. 3, p. 233-262	en_HK
dc.identifier.issn	1384-5810	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/89167	-
dc.description.abstract	By nature, sampling is an appealing technique for data mining, because an approximate solution is in most cases already sufficient for users' needs. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Several studies have addressed the problem of maintaining discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part of the database but also the unchanged part of the original database, which is very large, and hence take much time. Worse yet, if the rules are updated frequently but the underlying rule set has not changed much, the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overhead of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate. © 1998 Kluwer Academic Publishers.	en_HK
dc.language	eng	en_HK
dc.publisher	Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=1384-5810	en_HK
dc.relation.ispartof	Data Mining and Knowledge Discovery	en_HK
dc.rights	Journal of Data Mining and Knowledge Discovery. Copyright © Kluwer Academic Publishers.	en_HK
dc.subject	Association rules	en_HK
dc.subject	Confidence interval	en_HK
dc.subject	Data mining	en_HK
dc.subject	Knowledge discovery	en_HK
dc.subject	Maintenance	en_HK
dc.subject	Sampling	en_HK
dc.subject	Update	en_HK
dc.title	Is sampling useful in data mining? A case in the maintenance of discovered association rules	en_HK
dc.type	Article	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.email	Kao, B:kao@cs.hku.hk	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.identifier.authority	Kao, B=rp00123	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-22444451988	en_HK
dc.identifier.hkuros	40724	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-22444451988&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	2	en_HK
dc.identifier.issue	3	en_HK
dc.identifier.spage	233	en_HK
dc.identifier.epage	262	en_HK
dc.identifier.isi	WOS:000077976300002	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Lee, SD=7601400741	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.scopusauthorid	Kao, B=35221592600	en_HK
dc.identifier.issnl	1384-5810	-
