Conference Paper: Communication efficient distributed kernel principal component analysis

Title: Communication efficient distributed kernel principal component analysis
Authors: Balcan, Maria Florina; Liang, Yingyu; Song, Le; Woodruff, David; Xie, Bo
Keywords: Distributed; Kernel method; Principal component analysis
Issue Date: 2016
Citation: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, v. 13-17-August-2016, p. 725-734
Abstract: Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high-dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication-efficient fashion while maintaining provable and strong guarantees in solution quality? In this paper, we give an affirmative answer to the question by developing a communication-efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that the algorithm can take as input an arbitrary configuration of distributed datasets, and compute a set of global kernel principal components with relative-error guarantees independent of the dimension of the feature space or the total number of data points. In particular, computing k principal components with relative error ϵ over s workers has communication cost O(sρk/ϵ + sk²/ϵ³) words, where ρ is the average number of nonzero entries in each data point. Furthermore, we experimented with the algorithm on large-scale real-world datasets. The experimental results showed that the algorithm produces a high-quality kernel PCA solution while using significantly less communication than alternative approaches.
Persistent Identifier: http://hdl.handle.net/10722/341189
ISI Accession Number ID: WOS:000485529800086
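The paper above concerns a distributed, communication-efficient variant of kernel PCA. As background for readers of this record, the following is a minimal sketch of standard centralized kernel PCA with an RBF kernel; it is illustrative only and does not implement the paper's subspace-embedding/adaptive-sampling algorithm, and all names and parameters here are the sketch's own assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared Euclidean distances, then the RBF kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * d2)

def kernel_pca(X, k, gamma=1.0):
    """Project the n rows of X onto their top-k kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Double-center the kernel matrix (centering in feature space).
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; take the top k.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scale eigenvectors so the feature-space directions have unit norm.
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return Kc @ alphas  # n x k matrix of projections

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = kernel_pca(X, k=3)
print(Z.shape)  # (100, 3)
```

This centralized version needs all n data points in one place and an n×n kernel matrix; the communication cost the abstract states, O(sρk/ϵ + sk²/ϵ³) words over s workers, is precisely what the paper's distributed algorithm achieves to avoid that bottleneck.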

 

DC Field: Value
dc.contributor.author: Balcan, Maria Florina
dc.contributor.author: Liang, Yingyu
dc.contributor.author: Song, Le
dc.contributor.author: Woodruff, David
dc.contributor.author: Xie, Bo
dc.date.accessioned: 2024-03-13T08:40:52Z
dc.date.available: 2024-03-13T08:40:52Z
dc.date.issued: 2016
dc.identifier.citation: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, v. 13-17-August-2016, p. 725-734
dc.identifier.uri: http://hdl.handle.net/10722/341189
dc.description.abstract: Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high-dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication-efficient fashion while maintaining provable and strong guarantees in solution quality? In this paper, we give an affirmative answer to the question by developing a communication-efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that the algorithm can take as input an arbitrary configuration of distributed datasets, and compute a set of global kernel principal components with relative-error guarantees independent of the dimension of the feature space or the total number of data points. In particular, computing k principal components with relative error ϵ over s workers has communication cost O(sρk/ϵ + sk²/ϵ³) words, where ρ is the average number of nonzero entries in each data point. Furthermore, we experimented with the algorithm on large-scale real-world datasets. The experimental results showed that the algorithm produces a high-quality kernel PCA solution while using significantly less communication than alternative approaches.
dc.language: eng
dc.relation.ispartof: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
dc.subject: Distributed
dc.subject: Kernel method
dc.subject: Principal component analysis
dc.title: Communication efficient distributed kernel principal component analysis
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/2939672.2939796
dc.identifier.scopus: eid_2-s2.0-84985032082
dc.identifier.volume: 13-17-August-2016
dc.identifier.spage: 725
dc.identifier.epage: 734
dc.identifier.isi: WOS:000485529800086
