Conference Paper: Scalable kernel methods via doubly stochastic gradients
Title | Scalable kernel methods via doubly stochastic gradients |
---|---|
Authors | Dai, Bo; Xie, Bo; He, Niao; Liang, Yingyu; Raj, Anant; Balcan, Maria Florina; Song, Le |
Issue Date | 2014 |
Citation | Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3041-3049 |
Abstract | The general perception is that kernel methods are not scalable, so neural nets have become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Based on the fact that many kernel methods can be expressed as convex optimization problems, our approach solves the optimization problems by making two unbiased stochastic approximations to the functional gradient - one using random training points and another using random features associated with the kernel - and performing descent steps with this noisy functional gradient. Our algorithm is simple, requires no commitment to a preset number of random features, and allows the flexibility of the function class to grow as we see more incoming data in the streaming setting. We demonstrate that a function learned by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space at a rate of O(1/t), and achieves a generalization bound of O(1/√t). Our approach can readily scale kernel methods up to the regimes that are dominated by neural nets. We show that our approach performs competitively with neural nets on datasets such as 2.3 million energy materials from MolecularSpace, 8 million handwritten digits from MNIST, and 1 million photos from ImageNet using convolution features. (An illustrative sketch of the doubly stochastic update appears at the end of this record.) |
Persistent Identifier | http://hdl.handle.net/10722/341172 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Dai, Bo | - |
dc.contributor.author | Xie, Bo | - |
dc.contributor.author | He, Niao | - |
dc.contributor.author | Liang, Yingyu | - |
dc.contributor.author | Raj, Anant | - |
dc.contributor.author | Balcan, Maria Florina | - |
dc.contributor.author | Song, Le | - |
dc.date.accessioned | 2024-03-13T08:40:44Z | - |
dc.date.available | 2024-03-13T08:40:44Z | - |
dc.date.issued | 2014 | - |
dc.identifier.citation | Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3041-3049 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/341172 | - |
dc.description.abstract | The general perception is that kernel methods are not scalable, so neural nets have become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Based on the fact that many kernel methods can be expressed as convex optimization problems, our approach solves the optimization problems by making two unbiased stochastic approximations to the functional gradient - one using random training points and another using random features associated with the kernel - and performing descent steps with this noisy functional gradient. Our algorithm is simple, requires no commitment to a preset number of random features, and allows the flexibility of the function class to grow as we see more incoming data in the streaming setting. We demonstrate that a function learned by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space at a rate of O(1/t), and achieves a generalization bound of O(1/√t). Our approach can readily scale kernel methods up to the regimes that are dominated by neural nets. We show that our approach performs competitively with neural nets on datasets such as 2.3 million energy materials from MolecularSpace, 8 million handwritten digits from MNIST, and 1 million photos from ImageNet using convolution features. | - |
dc.language | eng | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems | - |
dc.title | Scalable kernel methods via doubly stochastic gradients | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-84937855981 | - |
dc.identifier.volume | 4 | - |
dc.identifier.issue | January | - |
dc.identifier.spage | 3041 | - |
dc.identifier.epage | 3049 | - |
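
The abstract describes the algorithm only at a high level: at each iteration, sample one random training point and one random kernel feature, then take a functional gradient step. The following is a minimal, illustrative Python sketch of that idea, assuming squared loss, an RBF kernel approximated with random Fourier features, and an untuned step-size schedule; the names `doubly_sgd`, `sigma`, `nu`, and `theta` are hypothetical and do not come from the authors' code.

```python
# Illustrative sketch of doubly stochastic functional gradients (assumptions:
# squared loss, RBF kernel via random Fourier features, untuned step sizes).
import numpy as np

def doubly_sgd(X, y, T=5000, sigma=1.0, nu=1e-4, theta=1.0, seed=0):
    """Learn f(x) = sum_i alpha_i * phi_{omega_i}(x) using two sources of
    randomness per iteration: a random training point and a random feature."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random Fourier features for k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)):
    # phi_omega(x) = sqrt(2) * cos(omega . x + b), omega ~ N(0, sigma^{-2} I),
    # b ~ Uniform[0, 2 pi]. Pre-sampled here purely for simplicity.
    Omega = rng.normal(0.0, 1.0 / sigma, size=(T, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=T)
    alpha = np.zeros(T)  # one coefficient per random feature, filled in as we go
    for i in range(T):
        # Stochastic approximation 1: a random training point.
        j = rng.integers(n)
        x_i, y_i = X[j], y[j]
        # Stochastic approximation 2: the i-th random feature (alpha[i] is still
        # zero, so including it does not change the current prediction).
        phi = np.sqrt(2.0) * np.cos(Omega[: i + 1] @ x_i + b[: i + 1])
        f_xi = alpha[: i + 1] @ phi
        # Functional gradient step for squared loss l(f, y) = (f - y)^2 / 2 with
        # regularization (nu / 2) * ||f||^2 and a decaying step size gamma_i.
        gamma = theta / (i + 1)
        alpha[:i] *= 1.0 - gamma * nu              # shrink existing coefficients
        alpha[i] = -gamma * (f_xi - y_i) * phi[i]  # coefficient of the new feature
    def predict(x):
        return alpha @ (np.sqrt(2.0) * np.cos(Omega @ x + b))
    return predict

if __name__ == "__main__":
    # Tiny synthetic regression problem, for illustration only.
    rng = np.random.default_rng(1)
    X = rng.uniform(-3.0, 3.0, size=(2000, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
    f = doubly_sgd(X, y, T=5000, sigma=0.5)
    print("prediction at x=0.5:", f(np.array([0.5])), "vs sin(0.5) =", np.sin(0.5))
```

Keeping the full `Omega` matrix in memory is a simplification; the paper instead regenerates each random feature from a per-iteration seed, so evaluating the learned function still costs O(t) per prediction while the number of random features grows with the amount of data seen, as the abstract describes.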