Last-Iterate Convergence in No-Regret Learning: Games with Reference Effects Under Logit Demand

Guo, Amy Mengzi; Ying, Donghao; Lavaei, Javad; Shen, Max Zuo-Jun

Publisher Website: 10.1287/mnsc.2023.03464

Appears in Collections:
- Industrial & Manufacturing Systems Engineering: Journal/Magazine Articles

Article: Last-Iterate Convergence in No-Regret Learning: Games with Reference Effects Under Logit Demand

Title	Last-Iterate Convergence in No-Regret Learning: Games with Reference Effects Under Logit Demand
Authors	Guo, Amy Mengzi; Ying, Donghao; Lavaei, Javad; Shen, Max Zuo-Jun
Issue Date	21-May-2025
Publisher	Institute for Operations Research and the Management Sciences
Citation	Management Science, 2025. DOI: http://dx.doi.org/10.1287/mnsc.2023.03464
Abstract	This work examines the behavior of the online projected gradient ascent (OPGA) algorithm and its variant in a repeated oligopoly price competition under reference effects. In particular, we consider multiple firms engaged in a multiperiod price competition, where consecutive periods are linked by the reference price update and each firm has access only to its own first-order feedback. Consumers assess their willingness to pay by comparing the current price against the memory-based reference price, and their choices follow the multinomial logit (MNL) model. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy, to simultaneously capture the long-run equilibrium and stability. We first study loss-neutral reference effects and show that if the firms employ the OPGA algorithm (adjusting the price using the first-order derivatives of their log-revenues), the price and reference price paths attain last-iterate convergence to the unique SNE, thereby guaranteeing no-regret learning and market stability. Moreover, with appropriate step-sizes, we prove that this algorithm exhibits a convergence rate of $\tilde{\mathcal{O}}(1/t^2)$ in terms of the squared distance and achieves a constant dynamic regret. Despite the simplicity of the algorithm, its convergence analysis is challenging because the model lacks typical properties, such as strong monotonicity and variational stability, that are ordinarily used for the convergence analysis of online games. The inherently asymmetric nature of reference effects motivates exploration beyond loss-neutrality. When loss-averse reference effects are introduced, we propose a variant of the original algorithm, named conservative-OPGA (C-OPGA), to handle the nonsmooth revenue functions and show that the price and reference price achieve last-iterate convergence to the set of SNEs at a rate of $\mathcal{O}(1/\sqrt{t})$. Finally, we demonstrate the practicality and robustness of OPGA and C-OPGA by theoretically showing that these algorithms can also adapt to firm-differentiated step-sizes and inexact gradients.
Persistent Identifier	http://hdl.handle.net/10722/368591
ISSN	0025-1909 (2023 Impact Factor: 4.6; 2023 SCImago Journal Rankings: 5.438)
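
As a rough illustration of the loss-neutral setting described in the abstract, the following sketch simulates firms that each take a projected gradient step on their own log-revenue while reference prices are updated from past prices. The abstract does not give the model's functional forms, so everything specific here is an assumption made for illustration: the linear utility `a - b*p + c*(r - p)`, the outside option with utility 0, the exponential-smoothing reference update with memory parameter `alpha`, the constant step size `eta`, the price bounds, and all function names and numeric values. It is not the authors' implementation.

```python
import math


def mnl_demand(prices, refs, a, b, c):
    """MNL market shares, assuming a no-purchase option with utility 0.

    Assumed utility of product i: a[i] - b[i]*p_i + c[i]*(r_i - p_i).
    """
    utils = [a[i] - b[i] * prices[i] + c[i] * (refs[i] - prices[i])
             for i in range(len(prices))]
    weights = [math.exp(u) for u in utils]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def log_revenue_grad(prices, refs, a, b, c):
    """Derivative of log(p_i * d_i) with respect to firm i's own price."""
    d = mnl_demand(prices, refs, a, b, c)
    return [1.0 / prices[i] - (b[i] + c[i]) * (1.0 - d[i])
            for i in range(len(prices))]


def opga(p0, r0, a, b, c, alpha=0.5, eta=0.1, bounds=(0.1, 10.0), steps=2000):
    """Projected gradient ascent on log-revenue with a reference price update.

    alpha, eta, bounds, and steps are illustrative choices, not values
    taken from the paper.
    """
    lo, hi = bounds
    p, r = list(p0), list(r0)
    for _ in range(steps):
        g = log_revenue_grad(p, r, a, b, c)
        # Each firm uses only its own first-order feedback, then projects
        # its price back onto the feasible interval [lo, hi].
        new_p = [min(hi, max(lo, p[i] + eta * g[i])) for i in range(len(p))]
        # Assumed memory-based update: smooth the old reference price
        # toward the price charged in the current period.
        r = [alpha * r[i] + (1.0 - alpha) * p[i] for i in range(len(p))]
        p = new_p
    return p, r


if __name__ == "__main__":
    a, b, c = [2.0, 1.5], [1.0, 1.0], [0.5, 0.5]
    p, r = opga([5.0, 5.0], [5.0, 5.0], a, b, c)
    print("final prices:", p)
    print("final reference prices:", r)
    print("own-price gradients:", log_revenue_grad(p, r, a, b, c))
```

Under these assumptions, a stationary point of the iteration has prices equal to reference prices and zero own-price gradients, which is the sense in which the final iterate can be compared with the SNE notion in the abstract.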

DC Field	Value	Language
dc.contributor.author	Guo, Amy Mengzi	-
dc.contributor.author	Ying, Donghao	-
dc.contributor.author	Lavaei, Javad	-
dc.contributor.author	Shen, Max Zuo-Jun	-
dc.date.accessioned	2026-01-15T00:35:25Z	-
dc.date.available	2026-01-15T00:35:25Z	-
dc.date.issued	2025-05-21	-
dc.identifier.citation	Management Science, 2025	-
dc.identifier.issn	0025-1909	-
dc.identifier.uri	http://hdl.handle.net/10722/368591	-
dc.description.abstract	This work examines the behavior of the online projected gradient ascent (OPGA) algorithm and its variant in a repeated oligopoly price competition under reference effects. In particular, we consider multiple firms engaged in a multiperiod price competition, where consecutive periods are linked by the reference price update and each firm has access only to its own first-order feedback. Consumers assess their willingness to pay by comparing the current price against the memory-based reference price, and their choices follow the multinomial logit (MNL) model. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy, to simultaneously capture the long-run equilibrium and stability. We first study loss-neutral reference effects and show that if the firms employ the OPGA algorithm (adjusting the price using the first-order derivatives of their log-revenues), the price and reference price paths attain last-iterate convergence to the unique SNE, thereby guaranteeing no-regret learning and market stability. Moreover, with appropriate step-sizes, we prove that this algorithm exhibits a convergence rate of $\tilde{\mathcal{O}}(1/t^2)$ in terms of the squared distance and achieves a constant dynamic regret. Despite the simplicity of the algorithm, its convergence analysis is challenging because the model lacks typical properties, such as strong monotonicity and variational stability, that are ordinarily used for the convergence analysis of online games. The inherently asymmetric nature of reference effects motivates exploration beyond loss-neutrality. When loss-averse reference effects are introduced, we propose a variant of the original algorithm, named conservative-OPGA (C-OPGA), to handle the nonsmooth revenue functions and show that the price and reference price achieve last-iterate convergence to the set of SNEs at a rate of $\mathcal{O}(1/\sqrt{t})$. Finally, we demonstrate the practicality and robustness of OPGA and C-OPGA by theoretically showing that these algorithms can also adapt to firm-differentiated step-sizes and inexact gradients.	-
dc.language	eng	-
dc.publisher	Institute for Operations Research and the Management Sciences	-
dc.relation.ispartof	Management Science	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.title	Last-Iterate Convergence in No-Regret Learning: Games with Reference Effects Under Logit Demand	-
dc.type	Article	-
dc.identifier.doi	10.1287/mnsc.2023.03464	-
dc.identifier.eissn	1526-5501	-
dc.identifier.issnl	0025-1909	-
