Conference Paper: UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System

Title: UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System
Authors: Li, TO; Jiang, J; Qi, J; So, CC; Ma, JC; Chen, X; Shen, T; Cui, H; Wang, Y; Wang, P
Keywords: Sensitivity; Flexible printed circuits; Sparks; Static analysis
Issue Date: 2020
Publisher: IEEE. The Journal's web site is located at https://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000192
Citation: Proceedings of 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Valencia, Spain, 29 June-2 July 2020, pp. 515-527
Abstract: In the era of big data, individuals and institutions store their sensitive data in the cloud, and these data are often analyzed by MapReduce frameworks (e.g., Spark). However, releasing the results of computations on these data may leak private information. Differential Privacy (DP) is a powerful method for protecting the privacy of individual data records in a computation result. Given an input dataset and a query, DP typically perturbs the output value with noise proportional to the sensitivity: the greatest change in the output value when a record is added to or removed from the input dataset. Unfortunately, directly computing the sensitivity of a query on an input dataset is computationally infeasible, because it requires adding or removing every record of the dataset and rerunning the same query each time: a dataset of one million records requires running the same query more than one million times. This paper presents UPA, the first automated, accurate, and efficient sensitivity inference approach for big-data mining applications. Our key observation is that MapReduce operators are often commutative and associative in order to enable parallelism and fault tolerance across computers. UPA exploits these properties to greatly reduce repeated computation at runtime while automatically computing a precise sensitivity value for general big-data queries. We compared UPA with FLEX, the most relevant work, which performs static analysis on queries to infer sensitivity values. In an extensive evaluation on nine diverse Spark queries, UPA supports all nine queries, while FLEX supports only five of them. For the five queries that both UPA and FLEX support, UPA enforces DP with sensitivity values five orders of magnitude more accurate than those of FLEX. UPA has reasonable performance overhead compared to native Spark. UPA's source code is available at https://github.com/hku-systems/UPA.
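The leave-one-out cost described in the abstract, and why associativity helps avoid it, can be illustrated with a minimal Python sketch. It is not UPA's implementation. It assumes a query that is a single reduction with an associative operator `op` and neutral element `identity`, considers only record removal (not addition), and uses hypothetical helper names (`leave_one_out_naive`, `leave_one_out_reuse`, `removal_sensitivity`, `laplace_perturb`). The naive version reruns the reduction once per removed record (O(n^2) operator calls); the reuse version combines prefix and suffix partial aggregates, so all n leave-one-out results cost O(n) operator calls.

```python
import random
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def leave_one_out_naive(records: Sequence[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Rerun the whole reduction once per removed record: O(n^2) operator calls."""
    return [
        reduce(op, (r for j, r in enumerate(records) if j != i), identity)
        for i in range(len(records))
    ]


def leave_one_out_reuse(records: Sequence[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Reuse partial aggregates: O(n) operator calls.

    Assumes op is associative and identity is its neutral element."""
    n = len(records)
    prefix = [identity] * (n + 1)  # prefix[i] = reduction of records[0:i]
    suffix = [identity] * (n + 1)  # suffix[i] = reduction of records[i:n]
    for i in range(n):
        prefix[i + 1] = op(prefix[i], records[i])
    for i in range(n - 1, -1, -1):
        suffix[i] = op(records[i], suffix[i + 1])
    # Result without record i = reduction of records[0:i] combined with records[i+1:n].
    return [op(prefix[i], suffix[i + 1]) for i in range(n)]


def removal_sensitivity(records, op, identity, distance=lambda a, b: abs(a - b)):
    """Largest change in the output when any single record is removed."""
    full = reduce(op, records, identity)
    outputs = leave_one_out_reuse(records, op, identity)
    return max((distance(full, out) for out in outputs), default=0)


def laplace_perturb(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    if scale == 0:
        return value
    # The difference of two i.i.d. Exponential(rate=1/scale) draws is Laplace(0, scale).
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    add = lambda a, b: a + b
    assert leave_one_out_naive(data, add, 0) == leave_one_out_reuse(data, add, 0)
    s = removal_sensitivity(data, add, 0)  # 9: removing the record 9 changes the sum most
    print(s, laplace_perturb(sum(data), s, epsilon=1.0, rng=random.Random(0)))
```

This sequential sketch needs only associativity; commutativity additionally lets partial results from different partitions be combined in any order, which is why Spark's `RDD.reduce` asks for a commutative and associative function. Calibrating Laplace noise to a sensitivity measured on the actual dataset, as `laplace_perturb` does here, is only illustrative: the formal DP guarantee depends on how sensitivity is bounded, which this sketch does not address.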
Description: Session 11 - Trusted Cloud Computing
Persistent Identifier: http://hdl.handle.net/10722/286407
ISSN: 1530-0889

DC Field | Value | Language
dc.contributor.author | Li, TO | -
dc.contributor.author | Jiang, J | -
dc.contributor.author | Qi, J | -
dc.contributor.author | So, CC | -
dc.contributor.author | Ma, JC | -
dc.contributor.author | Chen, X | -
dc.contributor.author | Shen, T | -
dc.contributor.author | Cui, H | -
dc.contributor.author | Wang, Y | -
dc.contributor.author | Wang, P | -
dc.date.accessioned | 2020-08-31T07:03:27Z | -
dc.date.available | 2020-08-31T07:03:27Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Proceedings of 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Valencia, Spain, 29 June-2 July 2020, p. 515-527 | -
dc.identifier.issn | 1530-0889 | -
dc.identifier.uri | http://hdl.handle.net/10722/286407 | -
dc.description | Session 11 - Trusted Cloud Computing | -
dc.language | eng | -
dc.publisher | IEEE. The Journal's web site is located at https://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000192 | -
dc.relation.ispartof | International Conference on Dependable Systems and Networks (DSN) Proceedings | -
dc.rights | International Conference on Dependable Systems and Networks (DSN) Proceedings. Copyright © IEEE. | -
dc.rights | ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | -
dc.subject | Sensitivity | -
dc.subject | Flexible printed circuits | -
dc.subject | Sparks | -
dc.subject | Static analysis | -
dc.title | UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System | -
dc.type | Conference_Paper | -
dc.identifier.email | Cui, H: heming@cs.hku.hk | -
dc.identifier.email | Wang, Y: amywang@hku.hk | -
dc.identifier.authority | Cui, H=rp02008 | -
dc.identifier.doi | 10.1109/DSN48063.2020.00064 | -
dc.identifier.scopus | eid_2-s2.0-85090405505 | -
dc.identifier.hkuros | 313508 | -
dc.identifier.spage | 515 | -
dc.identifier.epage | 527 | -
dc.publisher.place | United States | -
