Article: Extracting k most important groups from data efficiently

Title: Extracting k most important groups from data efficiently
Authors: Yiu, ML; Mamoulis, N; Hristidis, V
Keywords: Optimization and performance
Issue Date: 2008
Publisher: Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/datak
Citation: Data And Knowledge Engineering, 2008, v. 66 n. 2, p. 289-310
Abstract: We study an important data analysis operator, which extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of the above query is "find the 10 combinations of product-type and month with the largest sum of sales". The problem is challenging as the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing, under limited memory size. In particular, we design top-k groups retrieval techniques for three representative scenarios as follows. For the scenario with data physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), that exploits available memory for efficient top-k groups computation. Regarding the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound techniques and derivation heuristics for tight score bounds of hash partitions. Next, we design the clustered groups algorithm (CGA), which accelerates top-k groups processing for the case where data is clustered by a subset of group-by attributes. Extensive experiments with real and synthetic datasets demonstrate the applicability and efficiency of the proposed algorithms. © 2008 Elsevier B.V. All rights reserved.
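For readers unfamiliar with the operator, the query in the abstract can be illustrated with a naive in-memory baseline. This is only a sketch of the problem statement, not the paper's WMSA, RHA, or CGA algorithms: it assumes every group fits in memory, which is exactly the assumption the paper removes. The function name `top_k_groups` and the sample `sales` rows are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_groups(rows, k):
    """Return the k (group, total) pairs with the largest summed measure.

    Naive baseline: keeps one running total per group in memory, so it
    only works when the number of distinct groups is small.
    """
    totals = defaultdict(float)
    for group, measure in rows:
        totals[group] += measure
    # Pick the k largest aggregates without sorting every group.
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical rows: ((product_type, month), sales amount).
sales = [
    (("laptop", "2008-01"), 1200.0),
    (("laptop", "2008-02"), 800.0),
    (("phone", "2008-01"), 500.0),
    (("laptop", "2008-01"), 300.0),
    (("phone", "2008-02"), 950.0),
]

top2 = top_k_groups(sales, 2)
print(top2)
```

With the sample rows, the two groups with the largest sum of sales are ("laptop", "2008-01") and ("phone", "2008-02"). When the dictionary of running totals cannot fit in memory, this approach breaks down, which is the limited-memory setting the abstract targets.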
Persistent Identifier: http://hdl.handle.net/10722/60621
ISSN: 0169-023X
2021 Impact Factor: 1.500
2020 SCImago Journal Rankings: 0.480
ISI Accession Number ID: WOS:000258448400005
DC Field | Value | Language
dc.contributor.author | Yiu, ML | en_HK
dc.contributor.author | Mamoulis, N | en_HK
dc.contributor.author | Hristidis, V | en_HK
dc.date.accessioned | 2010-05-31T04:15:08Z | -
dc.date.available | 2010-05-31T04:15:08Z | -
dc.date.issued | 2008 | en_HK
dc.identifier.citation | Data And Knowledge Engineering, 2008, v. 66 n. 2, p. 289-310 | en_HK
dc.identifier.issn | 0169-023X | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/60621 | -
dc.description.abstract | We study an important data analysis operator, which extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of the above query is "find the 10 combinations of product-type and month with the largest sum of sales". The problem is challenging as the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing, under limited memory size. In particular, we design top-k groups retrieval techniques for three representative scenarios as follows. For the scenario with data physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), that exploits available memory for efficient top-k groups computation. Regarding the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound techniques and derivation heuristics for tight score bounds of hash partitions. Next, we design the clustered groups algorithm (CGA), which accelerates top-k groups processing for the case where data is clustered by a subset of group-by attributes. Extensive experiments with real and synthetic datasets demonstrate the applicability and efficiency of the proposed algorithms. © 2008 Elsevier B.V. All rights reserved. | en_HK
dc.language | eng | en_HK
dc.publisher | Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/datak | en_HK
dc.relation.ispartof | Data and Knowledge Engineering | en_HK
dc.subject | Optimization and performance | en_HK
dc.title | Extracting k most important groups from data efficiently | en_HK
dc.type | Article | en_HK
dc.identifier.email | Mamoulis, N:nikos@cs.hku.hk | en_HK
dc.identifier.authority | Mamoulis, N=rp00155 | en_HK
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1016/j.datak.2008.04.001 | en_HK
dc.identifier.scopus | eid_2-s2.0-45249100209 | en_HK
dc.identifier.hkuros | 150167 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-45249100209&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 66 | en_HK
dc.identifier.issue | 2 | en_HK
dc.identifier.spage | 289 | en_HK
dc.identifier.epage | 310 | en_HK
dc.identifier.isi | WOS:000258448400005 | -
dc.publisher.place | Netherlands | en_HK
dc.identifier.scopusauthorid | Yiu, ML=8589889600 | en_HK
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_HK
dc.identifier.scopusauthorid | Hristidis, V=6507537461 | en_HK
dc.identifier.issnl | 0169-023X | -
