File Download
There are no files associated with this item.
Supplementary
-
Citations:
- Scopus: 0
- Appears in Collections:
Conference Paper: Improved distributed principal component analysis
Title | Improved distributed principal component analysis |
---|---|
Authors | |
Issue Date | 2014 |
Citation | Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3113-3121 How to Cite? |
Abstract | We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest. |
Persistent Identifier | http://hdl.handle.net/10722/341167 |
ISSN | 2020 SCImago Journal Rankings: 1.399 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Balcan, Maria Florina | - |
dc.contributor.author | Kanchanapally, Vandana | - |
dc.contributor.author | Liang, Yingyu | - |
dc.contributor.author | Woodruff, David | - |
dc.date.accessioned | 2024-03-13T08:40:41Z | - |
dc.date.available | 2024-03-13T08:40:41Z | - |
dc.date.issued | 2014 | - |
dc.identifier.citation | Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3113-3121 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/341167 | - |
dc.description.abstract | We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest. | - |
dc.language | eng | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems | - |
dc.title | Improved distributed principal component analysis | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-84937837293 | - |
dc.identifier.volume | 4 | - |
dc.identifier.issue | January | - |
dc.identifier.spage | 3113 | - |
dc.identifier.epage | 3121 | - |