Links for fulltext (may require subscription)
- Publisher Website: 10.1109/CDC.2017.8264388
- Scopus: eid_2-s2.0-85046164297
- WOS: WOS:000424696904120
Conference Paper: Risk-aware Q-learning for Markov decision processes
| Title | Risk-aware Q-learning for Markov decision processes |
|---|---|
| Authors | Huang, Wenjie; Haskell, William B. |
| Issue Date | 2018 |
| Citation | 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12-15 December 2017. In Conference Proceedings, 2018, p. 4928-4933 |
| Abstract | We are interested in developing a reinforcement learning algorithm to tackle risk-aware sequential decision-making problems. The model we investigate is a discounted infinite-horizon Markov decision process with finite state and action spaces. Our algorithm is based on estimating a general minimax function with stochastic approximation, and we show that several risk measures fall within this form. We derive finite-time bounds for this algorithm by combining stochastic approximation with the theory of risk-aware dynamic programming. Finally, we present extensions to several variations of risk measures. |
| Persistent Identifier | http://hdl.handle.net/10722/308924 |
| ISI Accession Number ID | WOS:000424696904120 |
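The abstract describes a Q-learning scheme whose updates are made risk-aware. As a rough, self-contained illustration only (this is not the paper's algorithm; the piecewise-linear transform, the toy MDP, and all names below are assumptions), the sketch applies a risk-averse utility to the temporal-difference error, in the general style of risk-sensitive Q-learning:

```python
import random

def risk_sensitive_q_learning(transitions, n_states, n_actions,
                              gamma=0.9, alpha=0.02, kappa=0.5,
                              episodes=3000, horizon=20, seed=0):
    """Tabular Q-learning whose TD error passes through a risk-aware
    transform: negative surprises are weighted up and positive ones
    down, so kappa in (0, 1) yields a risk-averse value estimate,
    while kappa = 0 recovers ordinary Q-learning."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            a = rng.randrange(n_actions)        # pure exploration, for the sketch
            s_next, r = transitions(s, a, rng)  # sample one transition
            td = r + gamma * max(Q[s_next]) - Q[s][a]
            # asymmetric (risk-averse) transform of the TD error
            td = (1 - kappa) * td if td > 0 else (1 + kappa) * td
            Q[s][a] += alpha * td
            s = s_next
    return Q

# Toy single-state MDP: action 0 pays 1.0 surely; action 1 pays
# 2.5 or -0.5 with equal probability (same mean, higher variance).
def noisy_bandit(s, a, rng):
    if a == 0:
        return 0, 1.0
    return 0, 2.5 if rng.random() < 0.5 else -0.5

Q = risk_sensitive_q_learning(noisy_bandit, n_states=1, n_actions=2)
# A risk-averse agent values the safe action above the risky one,
# even though both actions have the same expected reward.
```

In this toy example the risk-neutral fixed point values both actions equally (at 1/(1-gamma) = 10), whereas the asymmetric transform pulls the risky action's value below the safe one's.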
| DC Field | Value | Language |
|---|---|---|
dc.contributor.author | Huang, Wenjie | - |
dc.contributor.author | Haskell, William B. | - |
dc.date.accessioned | 2021-12-08T07:50:25Z | - |
dc.date.available | 2021-12-08T07:50:25Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12-15 December 2017. In Conference Proceedings, 2018, p. 4928-4933 | - |
dc.identifier.uri | http://hdl.handle.net/10722/308924 | - |
dc.description.abstract | We are interested in developing a reinforcement learning algorithm to tackle risk-aware sequential decision-making problems. The model we investigate is a discounted infinite-horizon Markov decision process with finite state and action spaces. Our algorithm is based on estimating a general minimax function with stochastic approximation, and we show that several risk measures fall within this form. We derive finite-time bounds for this algorithm by combining stochastic approximation with the theory of risk-aware dynamic programming. Finally, we present extensions to several variations of risk measures. | - |
dc.language | eng | - |
dc.relation.ispartof | 2017 IEEE 56th Annual Conference on Decision and Control (CDC) | - |
dc.title | Risk-aware Q-learning for Markov decision processes | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CDC.2017.8264388 | - |
dc.identifier.scopus | eid_2-s2.0-85046164297 | - |
dc.identifier.spage | 4928 | - |
dc.identifier.epage | 4933 | - |
dc.identifier.isi | WOS:000424696904120 | - |