Article: Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction
| Title | Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction |
|---|---|
| Authors | Ying, Donghao; Guo, Mengzi Amy; Lee, Hyunin; Ding, Yuhao; Lavaei, Javad; Shen, Zuo Jun Max |
| Keywords | machine learning; markov decision processes; reinforcement learning |
| Issue Date | 29-Aug-2025 |
| Publisher | AI Access Foundation |
| Citation | Journal of Artificial Intelligence Research, 2025, v. 83 |
| Abstract | We study Concave Constrained Markov Decision Processes (Concave CMDPs), where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an O(T^{−1/3}) convergence rate for both the average optimality gap and constraint violation, which further improves to O(T^{−1/2}) under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an (Formula Presented) sample complexity for ε-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate our methods through numerical experiments. |
| Persistent Identifier | http://hdl.handle.net/10722/368249 |
| ISSN | 1076-9757 (2023 Impact Factor: 4.5; 2023 SCImago Journal Rankings: 1.614) |
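
For orientation, the sketch below illustrates the generic primal-dual structure described in the abstract: a policy-gradient ascent step on the primal (policy) variable and a projected sub-gradient descent step on the dual variable. It is a minimal illustration only; the estimator callbacks (`estimate_objective_gradient`, `estimate_constraint_value`) and all step-size parameters are hypothetical placeholders, and the sketch does not include the paper's variance-reduction scheme or reproduce the authors' VR-PDPG implementation.

```python
# Illustrative sketch of a generic primal-dual policy-gradient loop for a
# constrained RL problem. NOT the authors' VR-PDPG algorithm: the callback
# names and hyperparameters below are assumptions made for illustration.
import numpy as np

def primal_dual_policy_gradient(
    estimate_objective_gradient,  # hypothetical: returns dL(theta, lam)/dtheta
    estimate_constraint_value,    # hypothetical: returns g(theta); >= 0 means feasible
    theta_init,
    num_iters=1000,
    eta_theta=1e-2,               # primal (policy) step size
    eta_lambda=1e-2,              # dual step size
    lambda_max=10.0,              # projection bound for the dual variable
):
    theta = np.asarray(theta_init, dtype=float)
    lam = 0.0
    for _ in range(num_iters):
        # Primal step: gradient ascent on the Lagrangian w.r.t. the policy parameters.
        grad_theta = estimate_objective_gradient(theta, lam)
        theta = theta + eta_theta * grad_theta
        # Dual step: projected sub-gradient descent on the Lagrangian w.r.t. lambda,
        # projected onto the interval [0, lambda_max].
        constraint_val = estimate_constraint_value(theta)
        lam = float(np.clip(lam - eta_lambda * constraint_val, 0.0, lambda_max))
    return theta, lam
```
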
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ying, Donghao | - |
| dc.contributor.author | Guo, Mengzi Amy | - |
| dc.contributor.author | Lee, Hyunin | - |
| dc.contributor.author | Ding, Yuhao | - |
| dc.contributor.author | Lavaei, Javad | - |
| dc.contributor.author | Shen, Zuo Jun Max | - |
| dc.date.accessioned | 2025-12-24T00:37:05Z | - |
| dc.date.available | 2025-12-24T00:37:05Z | - |
| dc.date.issued | 2025-08-29 | - |
| dc.identifier.citation | Journal of Artificial Intelligence Research, 2025, v. 83 | - |
| dc.identifier.issn | 1076-9757 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/368249 | - |
| dc.description.abstract | We study Concave Constrained Markov Decision Processes (Concave CMDPs), where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an O(T^{−1/3}) convergence rate for both the average optimality gap and constraint violation, which further improves to O(T^{−1/2}) under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an (Formula Presented) sample complexity for ε-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate our methods through numerical experiments. | - |
| dc.language | eng | - |
| dc.publisher | AI Access Foundation | - |
| dc.relation.ispartof | Journal of Artificial Intelligence Research | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject | machine learning | - |
| dc.subject | markov decision processes | - |
| dc.subject | reinforcement learning | - |
| dc.title | Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1613/jair.1.18129 | - |
| dc.identifier.scopus | eid_2-s2.0-105018737537 | - |
| dc.identifier.volume | 83 | - |
| dc.identifier.eissn | 1943-5037 | - |
| dc.identifier.issnl | 1076-9757 | - |
