Managing query quality in probabilistic databases

Li, Xiang; 李想

DOI: 10.5353/th_b4775313

Appears in Collections:
- Computer Science & Information Systems: Theses
- HKU Theses Online

postgraduate thesis: Managing query quality in probabilistic databases

Title	Managing query quality in probabilistic databases
Authors	Li, Xiang 李想
Advisors	Cheng, CK; Cheung, DWL
Issue Date	2011
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Li, X. [李想]. (2011). Managing query quality in probabilistic databases. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4775313
Abstract	In many emerging applications, such as sensor networks, location-based services, and data integration, the database is inherently uncertain. To handle a large amount of uncertain data, probabilistic databases have been recently proposed, where probabilistic queries are enabled to provide answers with statistical guarantees. In this thesis, we study the important issues of managing the quality of a probabilistic database. We first address the problem of measuring the ambiguity, or quality, of a probabilistic query. This is accomplished by computing the PWS-quality score, a recently proposed measure for quantifying the ambiguity of query answers under the possible world semantics. We study the computation of the PWS-quality for the top-k query. This problem is not trivial, since directly computing the top-k query score is computationally expensive. To tackle this challenge, we propose efficient approximate algorithms for deriving the quality score of a top-k query. We have performed experiments on both synthetic and real data to validate their performance and accuracy. Our second contribution is to study how to use the PWS-quality score to coordinate the process of cleaning uncertain data. Removing ambiguous data from a probabilistic database can often give us a higher-quality query result. However, this operation requires some external knowledge (e.g., an updated value from a sensor source), and is thus not without cost. It is important to choose the correct object to clean, in order to (1) achieve a high quality gain, and (2) incur a low cleaning cost. In this thesis, we examine different cleaning methods for a probabilistic top-k query. We also study an interesting problem where different query users have their own budgets available for cleaning. We demonstrate how an optimal solution, in terms of the lowest cleaning costs, can be achieved, for probabilistic range and maximum queries. An extensive evaluation reveals that these solutions are highly efficient and accurate.
Degree	Master of Philosophy
Subject	Databases. Probabilistic number theory. Query languages (Computer science)
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/174493
HKU Library Item ID	b4775313
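
Illustrative note on the abstract: the idea of scoring the ambiguity of a top-k answer under possible world semantics can be sketched in a few lines of Python. This is a brute-force illustration, not the thesis's approximate algorithm, and it rests on assumptions that the record above does not state: each uncertain object is modelled as an independent list of (value, probability) alternatives, and the quality score is taken to be the negated entropy of the distribution over distinct top-k answers. The names `topk_answer_distribution`, `quality_score`, and the `sensors` data are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def topk_answer_distribution(objects, k):
    """Enumerate every possible world and return {top-k answer: probability}.

    objects maps an object name to a list of (value, probability)
    alternatives whose probabilities sum to 1. Objects are assumed to be
    independent (an assumption of this sketch). The cost is exponential
    in the number of objects, which is why approximation is attractive.
    """
    names = list(objects)
    dist = defaultdict(float)
    for world in product(*(objects[name] for name in names)):
        prob = 1.0
        for _, p in world:
            prob *= p
        # Rank by value, highest first; break ties by name for determinism.
        ranked = sorted(zip(names, world), key=lambda item: (-item[1][0], item[0]))
        answer = tuple(name for name, _ in ranked[:k])
        dist[answer] += prob
    return dict(dist)


def quality_score(dist):
    """Negated entropy of the answer distribution (assumed definition).

    0 means the answer is certain; more negative means more ambiguous.
    """
    return sum(p * log2(p) for p in dist.values() if p > 0)


# Hypothetical data: sensor s1 is uncertain, sensor s2 is exact.
sensors = {"s1": [(30, 0.5), (10, 0.5)], "s2": [(20, 1.0)]}
before = quality_score(topk_answer_distribution(sensors, k=1))

# "Cleaning" s1 replaces its alternatives with one confirmed value.
cleaned = dict(sensors, s1=[(30, 1.0)])
after = quality_score(topk_answer_distribution(cleaned, k=1))

print(before, after)  # -1.0 0.0
```

In this toy case, cleaning `s1` raises the score from -1.0 to 0.0, which is the kind of quality gain a cleaning strategy would weigh against the cost of obtaining the confirmed value.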

DC Field	Value	Language
dc.contributor.advisor	Cheng, CK	-
dc.contributor.advisor	Cheung, DWL	-
dc.contributor.author	Li, Xiang	-
dc.contributor.author	李想	-
dc.date.issued	2011	-
dc.identifier.citation	Li, X. [李想]. (2011). Managing query quality in probabilistic databases. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4775313	-
dc.identifier.uri	http://hdl.handle.net/10722/174493	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.source.uri	http://hub.hku.hk/bib/B47753134	-
dc.subject.lcsh	Databases.	-
dc.subject.lcsh	Probabilistic number theory.	-
dc.subject.lcsh	Query languages (Computer science)	-
dc.title	Managing query quality in probabilistic databases	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b4775313	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b4775313	-
dc.date.hkucongregation	2012	-
dc.identifier.mmsid	991033467879703414	-
