Links for fulltext (may require subscription):
- Publisher (DOI): 10.1609/aaai.v34i02.5574
- Scopus: eid_2-s2.0-85099875908
- Web of Science: WOS:000667722802012
Conference Paper: Model and reinforcement learning for Markov games with risk preferences
Title | Model and reinforcement learning for Markov games with risk preferences |
---|---|
Authors | Huang, Wenjie; Hai, Pham Viet; Haskell, William B. |
Issue Date | 2020 |
Citation | The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, 7-12 February 2020. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020, v. 34 n. 2, p. 2022-2029 |
Abstract | We motivate and propose a new model for non-cooperative Markov games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic “risk” arising both from stochastic state transitions (inherent to the game) and from randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed-point theorem. We further propose a simulation-based Q-learning-type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures that can naturally be written as saddle-point stochastic optimization problems and covers many widely investigated risk measures. Finally, the almost-sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under mild conditions. Our numerical experiments on a two-player queueing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making. |
Persistent Identifier | http://hdl.handle.net/10722/308899 |
ISI Accession Number ID | WOS:000667722802012 |
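To give a concrete flavor of the Q-learning-type approach the abstract describes, the sketch below runs tabular Q-learning on a hypothetical two-state, two-action cost-minimization MDP, replacing the plain expectation in the bootstrap target with a mean-plus-semideviation risk adjustment. This is an illustrative assumption only: the toy game, all parameter names, and the specific risk measure are ours, not the paper's, and the paper's actual algorithm treats multi-player games and general minimax risk measures via a saddle-point formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma, alpha, kappa = 0.9, 0.1, 0.5   # discount, step size, risk weight

# Hypothetical cost and transition tables (costs are minimized).
cost = np.array([[1.0, 0.2], [0.5, 1.5]])            # cost[s, a]
P = np.array([[[0.7, 0.3], [0.1, 0.9]],              # P[s, a, s']
              [[0.5, 0.5], [0.8, 0.2]]])

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(20000):
    # Epsilon-greedy action selection over estimated costs.
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmin())
    s_next = rng.choice(n_states, p=P[s, a])
    # Risk-adjusted target: mean plus upper semideviation of the next-state
    # value, one simple member of the convex risk-measure family (chosen here
    # for illustration, not the paper's minimax formulation).
    v_next = Q.min(axis=1)                       # value of each next state
    mean_v = P[s, a] @ v_next
    semidev = P[s, a] @ np.maximum(v_next - mean_v, 0.0)
    target = cost[s, a] + gamma * (mean_v + kappa * semidev)
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next

print(np.round(Q, 2))
```

With `kappa = 0`, the update reduces to ordinary risk-neutral Q-learning; increasing `kappa` penalizes actions whose next-state values are more dispersed above their mean, which is the qualitative effect of risk aversion the model captures.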
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Huang, Wenjie | - |
dc.contributor.author | Hai, Pham Viet | - |
dc.contributor.author | Haskell, William B. | - |
dc.date.accessioned | 2021-12-08T07:50:22Z | - |
dc.date.available | 2021-12-08T07:50:22Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, 7-12 February 2020. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020, v. 34 n. 2, p. 2022-2029 | - |
dc.identifier.uri | http://hdl.handle.net/10722/308899 | - |
dc.description.abstract | We motivate and propose a new model for non-cooperative Markov games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic “risk” arising both from stochastic state transitions (inherent to the game) and from randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed-point theorem. We further propose a simulation-based Q-learning-type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures that can naturally be written as saddle-point stochastic optimization problems and covers many widely investigated risk measures. Finally, the almost-sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under mild conditions. Our numerical experiments on a two-player queueing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making. | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the AAAI Conference on Artificial Intelligence | - |
dc.title | Model and reinforcement learning for Markov games with risk preferences | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.doi | 10.1609/aaai.v34i02.5574 | - |
dc.identifier.scopus | eid_2-s2.0-85099875908 | - |
dc.identifier.volume | 34 | - |
dc.identifier.issue | 2 | - |
dc.identifier.spage | 2022 | - |
dc.identifier.epage | 2029 | - |
dc.identifier.isi | WOS:000667722802012 | - |