Appears in Collections: Article

Multi-scale Online Learning: Theory and Applications to Online Auctions and Pricing
| Field | Value |
|---|---|
| Title | Multi-scale Online Learning: Theory and Applications to Online Auctions and Pricing |
| Authors | Bubeck, S; Devanur, N; Huang, Z; Niazadeh, R |
| Keywords | online learning; multi-scale learning; auction theory; bandit information; sample complexity |
| Issue Date | 2019 |
| Publisher | Journal of Machine Learning Research. The Journal's web site is located at http://mitpress.mit.edu/jmlr |
| Citation | Journal of Machine Learning Research, 2019, v. 20 n. 62, p. 1-37 |
| Abstract | We consider revenue maximization in online auction/pricing problems. A seller sells an identical item in each period to a new buyer, or a new set of buyers. For the online pricing problem, whether the arriving buyer bids or only responds to the posted price, we design algorithms whose regret bounds scale with the best fixed price in hindsight, rather than with the range of the values. Under the bidding model, we further show that our algorithms achieve a revenue convergence rate that matches the offline sample complexity of the single-item single-buyer auction. We also show regret bounds that are scale-free, and match the offline sample complexity, when comparing to a benchmark that requires a lower bound on the market share. We further extend our results beyond pricing to multi-buyer auctions, and obtain online learning algorithms for auctions, with convergence rates matching the known sample complexity upper bound of online single-item multi-buyer auctions. These results are obtained by generalizing the classical learning-from-experts and multi-armed bandit problems to their multi-scale versions, in which the reward of each action lies in a different range, and the regret with respect to a given action scales with its own range rather than the maximum range. We obtain almost optimal multi-scale regret bounds by introducing a new Online Mirror Descent (OMD) algorithm whose mirror map is the multi-scale version of the negative entropy function. We further generalize to the bandit setting by introducing a stochastic variant of this OMD algorithm. |
| Persistent Identifier | http://hdl.handle.net/10722/273145 |
| ISSN | 1532-4435 (2023 Impact Factor: 4.3; 2023 SCImago Journal Rankings: 2.796) |
| ISI Accession Number ID | WOS:000467879600001 |
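To make the multi-scale OMD idea from the abstract concrete, the following is a minimal, illustrative sketch (not the paper's exact algorithm): with the multi-scale negative-entropy mirror map, each action `i` gets its own learning rate `eta[i]` (e.g. inversely proportional to its reward range `c_i`), the unconstrained OMD step is the multiplicative update `y_i = x_i * exp(eta_i * g_i)`, and the Bregman projection back onto the probability simplex multiplies each coordinate by `exp(-eta_i * lam)` for a Lagrange multiplier `lam`. Because the learning rates differ per action, this is no longer a plain normalization, so `lam` is found numerically. All function and variable names here are hypothetical.

```python
import math


def multiscale_omd_step(x, g, eta):
    """One illustrative OMD step under the multi-scale negative-entropy mirror map.

    x   : current distribution over actions (entries positive, summing to 1)
    g   : observed reward vector; action i's reward is assumed to lie in [0, c_i]
    eta : per-action learning rates, e.g. eta[i] proportional to 1 / c_i

    Unconstrained step: y_i = x_i * exp(eta_i * g_i).
    Projection onto the simplex: z_i = y_i * exp(-eta_i * lam), where lam
    solves sum_i y_i * exp(-eta_i * lam) = 1. The left side is strictly
    decreasing in lam, so bisection finds the root.
    """
    y = [xi * math.exp(ei * gi) for xi, gi, ei in zip(x, g, eta)]

    def total(lam):
        return sum(yi * math.exp(-ei * lam) for yi, ei in zip(y, eta))

    # Bracket the root: total(lo) >= 1 >= total(hi).
    lo, hi = -1.0, 1.0
    while total(lo) < 1.0:
        lo *= 2.0
    while total(hi) > 1.0:
        hi *= 2.0

    # Bisection on the Lagrange multiplier.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [yi * math.exp(-ei * lam) for yi, ei in zip(y, eta)]
```

With per-action ranges such as `c = [1, 2, 4, 8]` one would set, say, `eta[i] = 0.5 / c[i]`; the point of the construction is that an action's contribution to the regret then scales with its own `c_i` rather than with `max_i c_i`.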
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Bubeck, S | - |
dc.contributor.author | Devanur, N | - |
dc.contributor.author | Huang, Z | - |
dc.contributor.author | Niazadeh, R | - |
dc.date.accessioned | 2019-08-06T09:23:22Z | - |
dc.date.available | 2019-08-06T09:23:22Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Journal of Machine Learning Research, 2019, v. 20 n. 62, p. 1-37 | - |
dc.identifier.issn | 1532-4435 | - |
dc.identifier.uri | http://hdl.handle.net/10722/273145 | - |
dc.description.abstract | We consider revenue maximization in online auction/pricing problems. A seller sells an identical item in each period to a new buyer, or a new set of buyers. For the online pricing problem, whether the arriving buyer bids or only responds to the posted price, we design algorithms whose regret bounds scale with the best fixed price in hindsight, rather than with the range of the values. Under the bidding model, we further show that our algorithms achieve a revenue convergence rate that matches the offline sample complexity of the single-item single-buyer auction. We also show regret bounds that are scale-free, and match the offline sample complexity, when comparing to a benchmark that requires a lower bound on the market share. We further extend our results beyond pricing to multi-buyer auctions, and obtain online learning algorithms for auctions, with convergence rates matching the known sample complexity upper bound of online single-item multi-buyer auctions. These results are obtained by generalizing the classical learning-from-experts and multi-armed bandit problems to their multi-scale versions, in which the reward of each action lies in a different range, and the regret with respect to a given action scales with its own range rather than the maximum range. We obtain almost optimal multi-scale regret bounds by introducing a new Online Mirror Descent (OMD) algorithm whose mirror map is the multi-scale version of the negative entropy function. We further generalize to the bandit setting by introducing a stochastic variant of this OMD algorithm. | - |
dc.language | eng | - |
dc.publisher | Journal of Machine Learning Research. The Journal's web site is located at http://mitpress.mit.edu/jmlr | - |
dc.relation.ispartof | Journal of Machine Learning Research | - |
dc.subject | online learning | - |
dc.subject | multi-scale learning | - |
dc.subject | auction theory | - |
dc.subject | bandit information | - |
dc.subject | sample complexity | - |
dc.title | Multi-scale Online Learning: Theory and Applications to Online Auctions and Pricing | - |
dc.type | Article | - |
dc.identifier.email | Huang, Z: zhiyi@cs.hku.hk | - |
dc.identifier.authority | Huang, Z=rp01804 | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.hkuros | 299983 | - |
dc.identifier.volume | 20 | - |
dc.identifier.issue | 62 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 37 | - |
dc.identifier.isi | WOS:000467879600001 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 1532-4435 | - |