Article: Is sampling useful in data mining? A case in the maintenance of discovered association rules

Title: Is sampling useful in data mining? A case in the maintenance of discovered association rules
Authors: Lee, SD; Cheung, DW; Kao, B
Keywords: Association rules; Confidence interval; Data mining; Knowledge discovery; Maintenance; Sampling; Update
Issue Date: 1998
Publisher: Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=1384-5810
Citation: Data Mining And Knowledge Discovery, 1998, v. 2 n. 3, p. 233-262
Abstract: By nature, sampling is an appealing technique for data mining, because approximate solutions in most cases may already be of great satisfaction to the need of the users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have been done on the problem of maintaining the discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part in the original database, which is very large, and hence take much time. Worse yet, if the updates on the rules are performed frequently on the database but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overheads of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate. © 1998 Kluwer Academic Publishers.
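
The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple random sample of the updated database, a normal-approximation confidence interval on each itemset's support, and a difference measure defined as the fraction of previously frequent itemsets that appear to have fallen below the minimum support. It does not look for newly frequent itemsets. All names (support_interval, estimate_change, should_update, tolerance) are hypothetical.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Transaction = FrozenSet[str]


def support_interval(
    itemset: Itemset, sample: Sequence[Transaction], z: float = 1.96
) -> Tuple[float, float, float]:
    """Estimate an itemset's support from a sample.

    Returns (estimate, low, high), where low/high bound a
    normal-approximation confidence interval (z = 1.96 is roughly 95%).
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    # A transaction supports the itemset if it contains every item in it.
    hits = sum(1 for transaction in sample if itemset <= transaction)
    p = hits / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def estimate_change(
    old_frequent: Sequence[Itemset],
    updated_db: Sequence[Transaction],
    min_support: float,
    sample_size: int,
    seed: int = 0,
) -> float:
    """Fraction of previously frequent itemsets that look infrequent now.

    An itemset is counted as lost only when the whole confidence interval
    lies below min_support (an assumed, conservative criterion).
    """
    if not old_frequent:
        return 0.0
    rng = random.Random(seed)
    k = min(sample_size, len(updated_db))
    sample: List[Transaction] = rng.sample(list(updated_db), k)
    lost = 0
    for itemset in old_frequent:
        _, _, high = support_interval(itemset, sample)
        if high < min_support:
            lost += 1
    return lost / len(old_frequent)


def should_update(
    old_frequent: Sequence[Itemset],
    updated_db: Sequence[Transaction],
    min_support: float,
    sample_size: int,
    tolerance: float,
) -> bool:
    """Re-mine only when the estimated change exceeds the tolerance."""
    change = estimate_change(old_frequent, updated_db, min_support, sample_size)
    return change > tolerance
```

With a small tolerance the rules are re-mined often; with a larger one, more updates accumulate before the full maintenance pass runs, which is the trade-off the abstract describes.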
Persistent Identifier: http://hdl.handle.net/10722/89167
ISSN: 1384-5810
2023 Impact Factor: 2.8
2023 SCImago Journal Rankings: 1.813
ISI Accession Number ID: WOS:000077976300002

DC Field | Value | Language
dc.contributor.author | Lee, SD | en_HK
dc.contributor.author | Cheung, DW | en_HK
dc.contributor.author | Kao, B | en_HK
dc.date.accessioned | 2010-09-06T09:53:13Z | -
dc.date.available | 2010-09-06T09:53:13Z | -
dc.date.issued | 1998 | en_HK
dc.identifier.citation | Data Mining And Knowledge Discovery, 1998, v. 2 n. 3, p. 233-262 | en_HK
dc.identifier.issn | 1384-5810 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/89167 | -
dc.description.abstract | By nature, sampling is an appealing technique for data mining, because approximate solutions in most cases may already be of great satisfaction to the need of the users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have been done on the problem of maintaining the discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part in the original database, which is very large, and hence take much time. Worse yet, if the updates on the rules are performed frequently on the database but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overheads of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate. © 1998 Kluwer Academic Publishers. | en_HK
dc.language | eng | en_HK
dc.publisher | Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=1384-5810 | en_HK
dc.relation.ispartof | Data Mining and Knowledge Discovery | en_HK
dc.rights | Journal of Data Mining and Knowledge Discovery. Copyright © Kluwer Academic Publishers. | en_HK
dc.subject | Association rules | en_HK
dc.subject | Confidence interval | en_HK
dc.subject | Data mining | en_HK
dc.subject | Knowledge discovery | en_HK
dc.subject | Maintenance | en_HK
dc.subject | Sampling | en_HK
dc.subject | Update | en_HK
dc.title | Is sampling useful in data mining? A case in the maintenance of discovered association rules | en_HK
dc.type | Article | en_HK
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK
dc.identifier.email | Kao, B:kao@cs.hku.hk | en_HK
dc.identifier.authority | Cheung, DW=rp00101 | en_HK
dc.identifier.authority | Kao, B=rp00123 | en_HK
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.scopus | eid_2-s2.0-22444451988 | en_HK
dc.identifier.hkuros | 40724 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-22444451988&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 2 | en_HK
dc.identifier.issue | 3 | en_HK
dc.identifier.spage | 233 | en_HK
dc.identifier.epage | 262 | en_HK
dc.identifier.isi | WOS:000077976300002 | -
dc.publisher.place | United States | en_HK
dc.identifier.scopusauthorid | Lee, SD=7601400741 | en_HK
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK
dc.identifier.scopusauthorid | Kao, B=35221592600 | en_HK
dc.identifier.issnl | 1384-5810 | -
