Extracting k most important groups from data efficiently

Yiu, ML; Mamoulis, N; Hristidis, V

File Download

There are no files associated with this item.

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1016/j.datak.2008.04.001
Scopus: eid_2-s2.0-45249100209
WOS: WOS:000258448400005
Find via

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Extracting k most important groups from data efficiently

Title	Extracting k most important groups from data efficiently
Authors	Yiu, ML Mamoulis, N Hristidis, V
Keywords	Optimization and performance
Issue Date	2008
Publisher	Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/datak
Citation	Data And Knowledge Engineering, 2008, v. 66 n. 2, p. 289-310 How to Cite? DOI: http://dx.doi.org/10.1016/j.datak.2008.04.001
Abstract	We study an important data analysis operator, which extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of the above query is "find the 10 combinations of product-type and month with the largest sum of sales". The problem is challenging as the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing, under limited memory size. In particular, we design top-k groups retrieval techniques for three representative scenarios as follows. For the scenario with data physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), that exploits available memory for efficient top-k groups computation. Regarding the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound techniques and derivation heuristics for tight score bounds of hash partitions. Next, we design the clustered groups algorithm (CGA), which accelerates top-k groups processing for the case where data is clustered by a subset of group-by attributes. Extensive experiments with real and synthetic datasets demonstrate the applicability and efficiency of the proposed algorithms. © 2008 Elsevier B.V. All rights reserved.
Persistent Identifier	http://hdl.handle.net/10722/60621
ISSN	0169-023X 2023 Impact Factor: 2.7 2023 SCImago Journal Rankings: 0.691
ISI Accession Number ID	WOS:000258448400005
References	References in Scopus

DC Field	Value	Language
dc.contributor.author	Yiu, ML	en_HK
dc.contributor.author	Mamoulis, N	en_HK
dc.contributor.author	Hristidis, V	en_HK
dc.date.accessioned	2010-05-31T04:15:08Z	-
dc.date.available	2010-05-31T04:15:08Z	-
dc.date.issued	2008	en_HK
dc.identifier.citation	Data And Knowledge Engineering, 2008, v. 66 n. 2, p. 289-310	en_HK
dc.identifier.issn	0169-023X	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/60621	-
dc.description.abstract	We study an important data analysis operator, which extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of the above query is "find the 10 combinations of product-type and month with the largest sum of sales". The problem is challenging as the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing, under limited memory size. In particular, we design top-k groups retrieval techniques for three representative scenarios as follows. For the scenario with data physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), that exploits available memory for efficient top-k groups computation. Regarding the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound techniques and derivation heuristics for tight score bounds of hash partitions. Next, we design the clustered groups algorithm (CGA), which accelerates top-k groups processing for the case where data is clustered by a subset of group-by attributes. Extensive experiments with real and synthetic datasets demonstrate the applicability and efficiency of the proposed algorithms. © 2008 Elsevier B.V. All rights reserved.	en_HK
dc.language	eng	en_HK
dc.publisher	Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/datak	en_HK
dc.relation.ispartof	Data and Knowledge Engineering	en_HK
dc.subject	Optimization and performance	en_HK
dc.title	Extracting k most important groups from data efficiently	en_HK
dc.type	Article	en_HK
dc.identifier.email	Mamoulis, N:nikos@cs.hku.hk	en_HK
dc.identifier.authority	Mamoulis, N=rp00155	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1016/j.datak.2008.04.001	en_HK
dc.identifier.scopus	eid_2-s2.0-45249100209	en_HK
dc.identifier.hkuros	150167	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-45249100209&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	66	en_HK
dc.identifier.issue	2	en_HK
dc.identifier.spage	289	en_HK
dc.identifier.epage	310	en_HK
dc.identifier.isi	WOS:000258448400005	-
dc.publisher.place	Netherlands	en_HK
dc.identifier.scopusauthorid	Yiu, ML=8589889600	en_HK
dc.identifier.scopusauthorid	Mamoulis, N=6701782749	en_HK
dc.identifier.scopusauthorid	Hristidis, V=6507537461	en_HK
dc.identifier.issnl	0169-023X	-

File Download

Links for fulltext

(May Require Subscription)

Supplementary

Article: Extracting k most important groups from data efficiently

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats