Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/1871437.1871494
- Scopus: eid_2-s2.0-78651291608
Conference Paper: Accelerating probabilistic frequent itemset mining: A model-based approach
Title | Accelerating probabilistic frequent itemset mining: A model-based approach |
---|---|
Authors | Wang, L; Cheng, R; Lee, SD; Cheung, DW |
Keywords | Algorithms |
Issue Date | 2010 |
Publisher | Association for Computing Machinery. |
Citation | The 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada, 26-30 October 2010. In Proceedings of the 19th ACM international conference on Information and knowledge management, 2010, p. 429-438 |
Abstract | Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose a novel method to capture the itemset mining process as a Poisson binomial distribution. This model-based approach extracts frequent itemsets with a high degree of accuracy and supports large databases. We apply our techniques to improve the performance of the algorithms for: (1) finding itemsets whose frequentness probabilities are larger than some threshold; and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate. Moreover, they are orders of magnitude faster than previous approaches. © 2010 ACM. |
Persistent Identifier | http://hdl.handle.net/10722/129566 |
ISBN | 978-1-4503-0099-5 |
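References | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78651291608&selection=ref&src=s&origin=recordpage |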
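The abstract models an itemset's support in an uncertain database as a Poisson binomial distribution, i.e. a sum of independent Bernoulli variables, one per transaction, and uses it to obtain the frequentness probability Pr[support ≥ minsup]. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' algorithm. It assumes a hypothetical attribute-uncertainty layout in which each item of a transaction is present independently with a given probability, so an itemset's containment probability is a product of item probabilities. `frequentness_exact` computes the Poisson binomial tail with the standard O(n²) dynamic program, while `frequentness_normal` swaps in a Normal approximation with continuity correction, one common way to make such a model cheap on large databases (the abstract does not state which approximation the paper uses). Candidate generation is brute force with no pruning, so it only suits toy inputs. All function names, parameters, and the `db` format are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical attribute-uncertainty layout (an assumption, not the paper's format):
# each transaction maps an item to the independent probability that it is present.
Transaction = Dict[str, float]


def containment_probs(db: Sequence[Transaction], itemset: FrozenSet[str]) -> List[float]:
    """p_i = Pr[itemset is contained in transaction i], assuming independent items."""
    return [math.prod(t.get(x, 0.0) for x in itemset) for t in db]


def frequentness_exact(probs: Sequence[float], minsup: int) -> float:
    """Pr[support >= minsup] from the exact Poisson binomial PMF (O(n^2) DP)."""
    pmf = [1.0]  # pmf[k] = Pr[exactly k of the transactions seen so far contain it]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this transaction does not contain the itemset
            nxt[k + 1] += mass * p      # this transaction contains the itemset
        pmf = nxt
    return sum(pmf[minsup:])


def frequentness_normal(probs: Sequence[float], minsup: int) -> float:
    """Normal approximation of the same tail, with a continuity correction."""
    mu = sum(probs)                          # mean of the Poisson binomial
    var = sum(p * (1.0 - p) for p in probs)  # variance of the Poisson binomial
    if var == 0.0:  # every p is 0 or 1, so the support is deterministic
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # Pr[Z >= z] for standard normal Z


def score_all(db: Sequence[Transaction], minsup: int,
              approx: bool = False) -> Dict[FrozenSet[str], float]:
    """Frequentness probability of every non-empty itemset (brute force, no pruning)."""
    items = sorted({x for t in db for x in t})
    tail = frequentness_normal if approx else frequentness_exact
    scores: Dict[FrozenSet[str], float] = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            itemset = frozenset(combo)
            scores[itemset] = tail(containment_probs(db, itemset), minsup)
    return scores


def threshold_mine(db, minsup, tau, approx=False):
    """Task (1): itemsets whose frequentness probability is larger than tau."""
    return {X for X, pr in score_all(db, minsup, approx).items() if pr > tau}


def top_k_mine(db, minsup, k, approx=False):
    """Task (2): the k itemsets with the highest frequentness probabilities."""
    scores = score_all(db, minsup, approx)
    return sorted(scores, key=scores.get, reverse=True)[:k]


db = [{"a": 0.9, "b": 0.8}, {"a": 0.7, "c": 0.6}, {"a": 0.5, "b": 0.4}]
print(threshold_mine(db, minsup=2, tau=0.5))  # {frozenset({'a'})}
print(top_k_mine(db, minsup=2, k=2))          # [frozenset({'a'}), frozenset({'b'})]
```

The two mining functions mirror tasks (1) and (2) in the abstract. Passing `approx=True` replaces the quadratic exact tail with a linear-time mean/variance computation per itemset; on the toy database above, the approximate and exact frequentness probabilities of {a} are roughly 0.79 and 0.80.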
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, L | en_HK |
dc.contributor.author | Cheng, R | en_HK |
dc.contributor.author | Lee, SD | en_HK |
dc.contributor.author | Cheung, DW | en_HK |
dc.date.accessioned | 2010-12-23T08:39:20Z | - |
dc.date.available | 2010-12-23T08:39:20Z | - |
dc.date.issued | 2010 | en_HK |
dc.identifier.citation | The 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada, 26-30 October 2010. In Proceedings of the 19th ACM international conference on Information and knowledge management, 2010, p. 429-438 | en_HK |
dc.identifier.isbn | 978-1-4503-0099-5 | - |
dc.identifier.uri | http://hdl.handle.net/10722/129566 | - |
dc.language | eng | en_US |
dc.publisher | Association for Computing Machinery. | - |
dc.relation.ispartof | International Conference on Information and Knowledge Management, Proceedings | en_HK |
dc.subject | Algorithms | en_HK |
dc.title | Accelerating probabilistic frequent itemset mining: A model-based approach | en_HK |
dc.type | Conference_Paper | en_HK |
dc.identifier.email | Cheng, R:ckcheng@cs.hku.hk | en_HK |
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK |
dc.identifier.authority | Cheng, R=rp00074 | en_HK |
dc.identifier.authority | Cheung, DW=rp00101 | en_HK |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.doi | 10.1145/1871437.1871494 | en_HK |
dc.identifier.scopus | eid_2-s2.0-78651291608 | en_HK |
dc.identifier.hkuros | 176457 | en_US |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78651291608&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.spage | 429 | en_HK |
dc.identifier.epage | 438 | en_HK |
dc.publisher.place | United States | - |
dc.identifier.scopusauthorid | Wang, L=36769568800 | en_HK |
dc.identifier.scopusauthorid | Cheng, R=7201955416 | en_HK |
dc.identifier.scopusauthorid | Lee, SD=7601400741 | en_HK |
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK |