Links for fulltext (may require subscription):
- Publisher Website: 10.1145/2939672.2939796
- Scopus: eid_2-s2.0-84985032082
- WOS: WOS:000485529800086
Conference Paper: Communication efficient distributed kernel principal component analysis
| Field | Value |
|---|---|
| Title | Communication efficient distributed kernel principal component analysis |
| Authors | Balcan, Maria Florina; Liang, Yingyu; Song, Le; Woodruff, David; Xie, Bo |
| Keywords | Distributed; Kernel method; Principal component analysis |
| Issue Date | 2016 |
| Citation | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, v. 13-17-August-2016, p. 725-734 |
| Abstract | Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality? In this paper, we give an affirmative answer to the question by developing a communication efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that the algorithm can take as input an arbitrary configuration of distributed datasets, and compute a set of global kernel principal components with relative error guarantees independent of the dimension of the feature space or the total number of data points. In particular, computing k principal components with relative error ϵ over s workers has communication cost O(sρk/ϵ + sk²/ϵ³) words, where ρ is the average number of nonzero entries in each data point. Furthermore, we evaluated the algorithm on large-scale real world datasets. The experimental results showed that the algorithm produces a high quality kernel PCA solution while using significantly less communication than alternative approaches. |
| Persistent Identifier | http://hdl.handle.net/10722/341189 |
| ISI Accession Number ID | WOS:000485529800086 |
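The abstract centers on kernel PCA, the primitive that the paper's distributed algorithm computes with low communication. As background, the single-machine version can be sketched in a few lines: build a kernel matrix, center it in feature space, and take its top-k eigenvectors. This is a minimal NumPy illustration of plain KPCA with an RBF kernel (the `gamma` parameter and the toy data are assumptions for the example), not the paper's subspace-embedding/adaptive-sampling algorithm.

```python
import numpy as np

def kernel_pca(X, k, gamma=1.0):
    """Project n points onto the top-k kernel principal components (RBF kernel)."""
    # Pairwise squared distances, then the RBF kernel matrix K (n x n).
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center the kernel matrix in feature space: Kc = (I - 1/n) K (I - 1/n).
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Top-k eigenpairs of the symmetric centered kernel matrix.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:k]
    # Scale eigenvectors by sqrt(eigenvalue) to get component scores.
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = kernel_pca(X, k=2)
print(Z.shape)  # (100, 2)
```

The distributed challenge the paper addresses is that this n × n kernel matrix couples every pair of points, so naively centralizing data from s workers is expensive; their algorithm instead communicates only O(sρk/ϵ + sk²/ϵ³) words.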
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Balcan, Maria Florina | - |
dc.contributor.author | Liang, Yingyu | - |
dc.contributor.author | Song, Le | - |
dc.contributor.author | Woodruff, David | - |
dc.contributor.author | Xie, Bo | - |
dc.date.accessioned | 2024-03-13T08:40:52Z | - |
dc.date.available | 2024-03-13T08:40:52Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, v. 13-17-August-2016, p. 725-734 | - |
dc.identifier.uri | http://hdl.handle.net/10722/341189 | - |
dc.description.abstract | Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality? In this paper, we give an affirmative answer to the question by developing a communication efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that the algorithm can take as input an arbitrary configuration of distributed datasets, and compute a set of global kernel principal components with relative error guarantees independent of the dimension of the feature space or the total number of data points. In particular, computing k principal components with relative error ϵ over s workers has communication cost O(sρk/ϵ + sk²/ϵ³) words, where ρ is the average number of nonzero entries in each data point. Furthermore, we evaluated the algorithm on large-scale real world datasets. The experimental results showed that the algorithm produces a high quality kernel PCA solution while using significantly less communication than alternative approaches. | -
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | - |
dc.subject | Distributed | - |
dc.subject | Kernel method | - |
dc.subject | Principal component analysis | - |
dc.title | Communication efficient distributed kernel principal component analysis | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/2939672.2939796 | - |
dc.identifier.scopus | eid_2-s2.0-84985032082 | - |
dc.identifier.volume | 13-17-August-2016 | - |
dc.identifier.spage | 725 | - |
dc.identifier.epage | 734 | - |
dc.identifier.isi | WOS:000485529800086 | - |