File Download

There are no files associated with this item.

  Links for fulltext
     (May Require Subscription)
Supplementary

Article: Hierarchical synopses with optimal error guarantees

TitleHierarchical synopses with optimal error guarantees
Authors
KeywordsApproximate query processing
Data synopses
Summarization
Issue Date2008
PublisherAssociation for Computing Machinery, Inc
Citation
Acm Transactions On Database Systems, 2008, v. 33 n. 3 How to Cite?
AbstractHierarchical synopsis structures offer a viable alternative in terms of efficiency and flexibility in relation to traditional summarization techniques such as histograms. Previous research on such structures has mostly focused on a single model, based on the Haar wavelet decomposition. In previous work, we have introduced a more refined, wavelet-inspired hierarchical index structure for synopsis construction: the Haar + tree. The chief advantages of this structure are twofold. First, it achieves higher synopsis quality at the task of summarizing data sets with sharp discontinuities than state-of-the-art histogram and Haar wavelet techniques. Second, thanks to its search space delimitation capacity, Haar + synopsis construction operates in time linear in the size of the data set for any monotonic distributive error metric. Contemporaneous research has introduced another hierarchical synopsis structure, the compact hierarchical histogram (CHH). In this article, we elaborate on both these structures. First, we formally prove that the CHH, in its default binary-hierarchy form, is a simplified variant of a Haar + tree. We then focus on the summarization problem, with both these hierarchical synopsis structures, in which an error guarantee expressed by a maximum-error metric is required. We show that this problem is most efficiently solved through its dual, space-minimization counterpart, which can also achieve optimal quality. In this case, there is a benefit to be gained by specializing the algorithm for each structure; hence, our algorithm for optimal-quality maximum-error CHH requires low polynomial time; on the other hand, optimal-quality Haar + synopses for maximum-error metrics are constructed in exponential time; hence, we also develop a low-polynomial-time approximation scheme for the maximum-error Haar + case. Furthermore, we extend our approach for both general-error and maximum-error Haar + synopses to arbitrary dimensionality. In our experimental study, (i) we confirm the theoretically expected superiority of Haar + synopses over Haar wavelet methods in both construction time and achieved quality for representative error metrics; (ii) we demonstrate that Haar + synopses are also constructed faster than optimal plain histograms, and, moreover, achieve higher synopsis quality with highly discontinuous data sets; such an advantage of a hierarchical synopsis structure over a histogram had been intuitively expressed, but never experimentally verified; and (iii) we show that Haar + synopsis quality supersedes that of a CHH. © 2008 ACM.
Persistent Identifierhttp://hdl.handle.net/10722/60625
ISSN
2023 Impact Factor: 2.2
2023 SCImago Journal Rankings: 1.730
ISI Accession Number ID
References

 

DC FieldValueLanguage
dc.contributor.authorKarras, Pen_HK
dc.contributor.authorMamoulis, Nen_HK
dc.date.accessioned2010-05-31T04:15:13Z-
dc.date.available2010-05-31T04:15:13Z-
dc.date.issued2008en_HK
dc.identifier.citationAcm Transactions On Database Systems, 2008, v. 33 n. 3en_HK
dc.identifier.issn0362-5915en_HK
dc.identifier.urihttp://hdl.handle.net/10722/60625-
dc.description.abstractHierarchical synopsis structures offer a viable alternative in terms of efficiency and flexibility in relation to traditional summarization techniques such as histograms. Previous research on such structures has mostly focused on a single model, based on the Haar wavelet decomposition. In previous work, we have introduced a more refined, wavelet-inspired hierarchical index structure for synopsis construction: the Haar + tree. The chief advantages of this structure are twofold. First, it achieves higher synopsis quality at the task of summarizing data sets with sharp discontinuities than state-of-the-art histogram and Haar wavelet techniques. Second, thanks to its search space delimitation capacity, Haar + synopsis construction operates in time linear in the size of the data set for any monotonic distributive error metric. Contemporaneous research has introduced another hierarchical synopsis structure, the compact hierarchical histogram (CHH). In this article, we elaborate on both these structures. First, we formally prove that the CHH, in its default binary-hierarchy form, is a simplified variant of a Haar + tree. We then focus on the summarization problem, with both these hierarchical synopsis structures, in which an error guarantee expressed by a maximum-error metric is required. We show that this problem is most efficiently solved through its dual, space-minimization counterpart, which can also achieve optimal quality. In this case, there is a benefit to be gained by specializing the algorithm for each structure; hence, our algorithm for optimal-quality maximum-error CHH requires low polynomial time; on the other hand, optimal-quality Haar + synopses for maximum-error metrics are constructed in exponential time; hence, we also develop a low-polynomial-time approximation scheme for the maximum-error Haar + case. Furthermore, we extend our approach for both general-error and maximum-error Haar + synopses to arbitrary dimensionality. In our experimental study, (i) we confirm the theoretically expected superiority of Haar + synopses over Haar wavelet methods in both construction time and achieved quality for representative error metrics; (ii) we demonstrate that Haar + synopses are also constructed faster than optimal plain histograms, and, moreover, achieve higher synopsis quality with highly discontinuous data sets; such an advantage of a hierarchical synopsis structure over a histogram had been intuitively expressed, but never experimentally verified; and (iii) we show that Haar + synopsis quality supersedes that of a CHH. © 2008 ACM.en_HK
dc.languageengen_HK
dc.publisherAssociation for Computing Machinery, Incen_HK
dc.relation.ispartofACM Transactions on Database Systemsen_HK
dc.rightsACM Transactions on Database Systems. Copyright © Association for Computing Machinery, Inc.en_HK
dc.subjectApproximate query processingen_HK
dc.subjectData synopsesen_HK
dc.subjectSummarizationen_HK
dc.titleHierarchical synopses with optimal error guaranteesen_HK
dc.typeArticleen_HK
dc.identifier.openurlhttp://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0362-5915&volume=33:3&spage=52 pages&epage=&date=2008&atitle=Hierarchical+Synopses+with+Optimal+Error+Guaranteesen_HK
dc.identifier.emailMamoulis, N:nikos@cs.hku.hken_HK
dc.identifier.authorityMamoulis, N=rp00155en_HK
dc.description.naturelink_to_subscribed_fulltext-
dc.identifier.doi10.1145/1386118.1386124en_HK
dc.identifier.scopuseid_2-s2.0-51149084988en_HK
dc.identifier.hkuros150168en_HK
dc.relation.referenceshttp://www.scopus.com/mlt/select.url?eid=2-s2.0-51149084988&selection=ref&src=s&origin=recordpageen_HK
dc.identifier.volume33en_HK
dc.identifier.issue3en_HK
dc.identifier.eissn1557-4644-
dc.identifier.isiWOS:000259230300006-
dc.publisher.placeUnited Statesen_HK
dc.identifier.scopusauthoridKarras, P=14028488200en_HK
dc.identifier.scopusauthoridMamoulis, N=6701782749en_HK
dc.identifier.citeulike3171317-
dc.identifier.issnl0362-5915-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats