Links for fulltext (may require subscription)
- Publisher Website: 10.1109/CDC.2017.8264388
- Scopus: eid_2-s2.0-85046164297
- WOS: WOS:000424696904120
Conference Paper: Risk-aware Q-learning for Markov decision processes
| Title | Risk-aware Q-learning for Markov decision processes |
|---|---|
| Authors | Huang, Wenjie; Haskell, William B. |
| Issue Date | 2018 |
| Citation | 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12-15 December 2017. In Conference Proceedings, 2018, p. 4928-4933 |
| Abstract | We are interested in developing a reinforcement learning algorithm to tackle risk-aware sequential decision-making problems. The model we investigate is a discounted infinite-horizon Markov decision process with finite state and action spaces. Our algorithm is based on estimating a general minimax function with stochastic approximation, and we show that several risk measures fall within this form. We derive finite-time bounds for this algorithm by combining stochastic approximation with the theory of risk-aware dynamic programming. Finally, we present extensions to several variations of risk measures. |
| Persistent Identifier | http://hdl.handle.net/10722/308924 |
| ISI Accession Number ID | WOS:000424696904120 |
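The abstract describes a Q-learning scheme whose updates are made risk-aware. As a rough, self-contained illustration only (this is not the paper's algorithm; the piecewise-linear transform, the toy MDP, and all names below are assumptions), the sketch applies a risk-averse utility to the temporal-difference error, in the general style of risk-sensitive Q-learning:

```python
import random

def risk_sensitive_q_learning(transitions, n_states, n_actions,
                              gamma=0.9, alpha=0.02, kappa=0.5,
                              episodes=3000, horizon=20, seed=0):
    """Tabular Q-learning whose TD error passes through a risk-aware
    transform: negative surprises are weighted up and positive ones
    down, so kappa in (0, 1) yields a risk-averse value estimate,
    while kappa = 0 recovers ordinary Q-learning."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            a = rng.randrange(n_actions)        # pure exploration, for the sketch
            s_next, r = transitions(s, a, rng)  # sample one transition
            td = r + gamma * max(Q[s_next]) - Q[s][a]
            # asymmetric (risk-averse) transform of the TD error
            td = (1 - kappa) * td if td > 0 else (1 + kappa) * td
            Q[s][a] += alpha * td
            s = s_next
    return Q

# Toy single-state MDP: action 0 pays 1.0 surely; action 1 pays
# 2.5 or -0.5 with equal probability (same mean, higher variance).
def noisy_bandit(s, a, rng):
    if a == 0:
        return 0, 1.0
    return 0, 2.5 if rng.random() < 0.5 else -0.5

Q = risk_sensitive_q_learning(noisy_bandit, n_states=1, n_actions=2)
# A risk-averse agent values the safe action above the risky one,
# even though both actions have the same expected reward.
```

In this toy example the risk-neutral fixed point values both actions equally (at 1/(1-gamma) = 10), whereas the asymmetric transform pulls the risky action's value below the safe one's.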
| DC Field | Value | Language |
|---|---|---|
dc.contributor.author | Huang, Wenjie | - |
dc.contributor.author | Haskell, William B. | - |
dc.date.accessioned | 2021-12-08T07:50:25Z | - |
dc.date.available | 2021-12-08T07:50:25Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12-15 December 2017. In Conference Proceedings, 2018, p. 4928-4933 | - |
dc.identifier.uri | http://hdl.handle.net/10722/308924 | - |
dc.description.abstract | We are interested in developing a reinforcement learning algorithm to tackle risk-aware sequential decision-making problems. The model we investigate is a discounted infinite-horizon Markov decision process with finite state and action spaces. Our algorithm is based on estimating a general minimax function with stochastic approximation, and we show that several risk measures fall within this form. We derive finite-time bounds for this algorithm by combining stochastic approximation with the theory of risk-aware dynamic programming. Finally, we present extensions to several variations of risk measures. | - |
dc.language | eng | - |
dc.relation.ispartof | 2017 IEEE 56th Annual Conference on Decision and Control (CDC) | - |
dc.title | Risk-aware Q-learning for Markov decision processes | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CDC.2017.8264388 | - |
dc.identifier.scopus | eid_2-s2.0-85046164297 | - |
dc.identifier.spage | 4928 | - |
dc.identifier.epage | 4933 | - |
dc.identifier.isi | WOS:000424696904120 | - |