Accelerating probabilistic frequent itemset mining: A model-based approach

Wang, L; Cheng, R; Lee, SD; Cheung, DW

Publisher Website: 10.1145/1871437.1871494
Scopus: eid_2-s2.0-78651291608

Conference Paper: Accelerating probabilistic frequent itemset mining: A model-based approach

Title	Accelerating probabilistic frequent itemset mining: A model-based approach
Authors	Wang, L; Cheng, R; Lee, SD; Cheung, DW
Keywords	Algorithms
Issue Date	2010
Publisher	Association for Computing Machinery.
Citation	The 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada, 26-30 October 2010. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, 2010, pp. 429-438. DOI: http://dx.doi.org/10.1145/1871437.1871494
Abstract	Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose a novel method to capture the itemset mining process as a Poisson binomial distribution. This model-based approach extracts frequent itemsets with a high degree of accuracy and supports large databases. We apply our techniques to improve the performance of algorithms for: (1) finding itemsets whose frequentness probabilities are larger than some threshold; and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both the tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate. Moreover, they are orders of magnitude faster than previous approaches. © 2010 ACM.
Persistent Identifier	http://hdl.handle.net/10722/129566
ISBN	978-1-4503-0099-5
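The abstract's central idea can be illustrated with a small sketch. Under the Possible World Semantics, if each transaction independently contains an itemset X with some probability p_i, then the support count of X is a sum of independent Bernoulli variables, i.e. it follows a Poisson binomial distribution. The frequentness probability P(sup(X) >= minsup) can then be computed in O(n * minsup) time rather than by enumerating all 2^n possible worlds. The sketch below rests on several assumptions that do not come from the record: transactions are independent; under attribute uncertainty, items within a transaction are independent, so p_i is a product of item probabilities; and candidate itemsets are supplied directly rather than generated Apriori-style. All function names are hypothetical. `frequentness_poisson` shows one possible model-based shortcut, approximating the Poisson binomial by a Poisson with the same mean. It is not claimed to be the exact model used in the paper.

```python
import heapq
import math
from itertools import product


def containment_probs(db, itemset):
    """Per-transaction probability that `itemset` is present.

    Assumption: attribute uncertainty with independent items; each transaction
    is a dict mapping item -> existence probability (a missing item has p = 0).
    """
    probs = []
    for transaction in db:
        p = 1.0
        for item in itemset:
            p *= transaction.get(item, 0.0)
        probs.append(p)
    return probs


def frequentness_exact(probs, minsup):
    """P(support >= minsup), where support is Poisson binomial over `probs`.

    Dynamic programming instead of enumerating 2**n worlds:
    dp[j] = P(support == j) for j < minsup; dp[minsup] = P(support >= minsup).
    """
    if minsup <= 0:
        return 1.0
    dp = [0.0] * (minsup + 1)
    dp[0] = 1.0
    for p in probs:
        # Update the absorbing state first, then go downward so every entry
        # still reads the values from before this transaction.
        dp[minsup] += dp[minsup - 1] * p
        for j in range(minsup - 1, 0, -1):
            dp[j] = dp[j] * (1 - p) + dp[j - 1] * p
        dp[0] *= 1 - p
    return dp[minsup]


def frequentness_poisson(probs, minsup):
    """Approximate P(support >= minsup) by a Poisson with mean sum(probs).

    Illustrative shortcut only; exp(-lam) underflows for very large lam.
    """
    lam = sum(probs)
    term = math.exp(-lam)  # P(Poisson(lam) == 0)
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= lam / (k + 1)  # P(== k + 1) derived from P(== k)
    return 1.0 - cdf


def frequentness_bruteforce(probs, minsup):
    """Reference check: sum over all 2**n possible worlds (tiny n only)."""
    total = 0.0
    for world in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for present, p in zip(world, probs):
            weight *= p if present else 1 - p
        if sum(world) >= minsup:
            total += weight
    return total


def probabilistic_frequent(candidates, minsup, min_prob):
    """Candidates whose frequentness probability is larger than `min_prob`."""
    return {x for x, probs in candidates.items()
            if frequentness_exact(probs, minsup) > min_prob}


def top_k_itemsets(candidates, minsup, k):
    """The k candidates with the highest frequentness probabilities."""
    scored = [(x, frequentness_exact(probs, minsup))
              for x, probs in candidates.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

`frequentness_bruteforce` exists only to confirm, on tiny inputs, that the dynamic program matches the possible-world definition. The threshold query (1) and the top-k query (2) from the abstract then reduce to scoring each candidate's frequentness probability. The exact DP could be swapped for the cheaper approximation when speed matters more than exactness.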

DC Field	Value	Language
dc.contributor.author	Wang, L	en_HK
dc.contributor.author	Cheng, R	en_HK
dc.contributor.author	Lee, SD	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.date.accessioned	2010-12-23T08:39:20Z	-
dc.date.available	2010-12-23T08:39:20Z	-
dc.date.issued	2010	en_HK
dc.identifier.citation	The 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada, 26-30 October 2010. In Proceedings of the 19th ACM international conference on Information and knowledge management, 2010, p. 429-438	en_HK
dc.identifier.isbn	978-1-4503-0099-5	-
dc.identifier.uri	http://hdl.handle.net/10722/129566	-
dc.language	eng	en_US
dc.publisher	Association for Computing Machinery.	-
dc.relation.ispartof	International Conference on Information and Knowledge Management, Proceedings	en_HK
dc.subject	Algorithms	en_HK
dc.title	Accelerating probabilistic frequent itemset mining: A model-based approach	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Cheng, R:ckcheng@cs.hku.hk	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Cheng, R=rp00074	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1145/1871437.1871494	en_HK
dc.identifier.scopus	eid_2-s2.0-78651291608	en_HK
dc.identifier.hkuros	176457	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-78651291608&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.spage	429	en_HK
dc.identifier.epage	438	en_HK
dc.publisher.place	United States	-
dc.identifier.scopusauthorid	Wang, L=36769568800	en_HK
dc.identifier.scopusauthorid	Cheng, R=7201955416	en_HK
dc.identifier.scopusauthorid	Lee, SD=7601400741	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
