Links for fulltext (may require subscription):
- Publisher Website (DOI): https://doi.org/10.1109/TCOMM.2020.2982136
- Scopus: eid_2-s2.0-85086893911
Citations:
- Scopus: 0
Article: Learning Automata Based Q-Learning for Content Placement in Cooperative Caching
Field | Value |
---|---|
Title | Learning Automata Based Q-Learning for Content Placement in Cooperative Caching |
Authors | Yang, Zhong; Liu, Yuanwei; Chen, Yue; Jiao, Lei |
Keywords | content popularity prediction; Learning automata based Q-learning; quality of experience (QoE); user mobility prediction; wireless cooperative caching |
Issue Date | 2020 |
Citation | IEEE Transactions on Communications, 2020, v. 68, n. 6, p. 3667-3680 |
Abstract | An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. Firstly, as user mobility and content popularity have significant impacts on the user experience, a recurrent neural network (RNN) is invoked for user mobility prediction and content popularity prediction. In particular, practical data collected from a GPS-tracker app on smartphones is used to test the accuracy of the user mobility prediction. Then, based on the predicted positions of mobile users and the predicted content popularity, a learning automata based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked in Q-learning to obtain optimal action selection in a random and stationary environment. It is proven that the LA based action selection scheme enables every state to select the optimal action with arbitrarily high probability, provided that Q-learning converges to the optimal Q value eventually. In the LAQL algorithm, a central processor acts as the intelligent agent, which iteratively allocates contents to base stations (BSs) according to the reward or penalty fed back by the BSs and users. To characterize the performance of the proposed LAQL algorithm, the sum MOS of users is applied to define the reward function. Extensive simulation results reveal that: 1) the prediction error of the RNN-based algorithm decreases as the number of iterations and nodes increases; 2) the proposed LAQL achieves significant performance improvement over the traditional Q-learning algorithm; and 3) the cooperative caching scheme outperforms non-cooperative caching and random caching by 3% and 4%, respectively. |
Persistent Identifier | http://hdl.handle.net/10722/349437 |
ISSN | 0090-6778 (2023 Impact Factor: 7.2; 2020 SCImago Journal Rankings: 1.468) |
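The abstract describes LAQL at a high level: tabular Q-learning in which the usual ε-greedy exploration is replaced by a per-state learning automaton whose action-probability vector is reinforced by reward/penalty feedback. The snippet below is a minimal, hypothetical Python sketch of that idea only, not the authors' implementation: the environment transition `step`, the reward `reward_fn` (standing in for the paper's sum-MOS objective), and the linear reward-inaction style update are illustrative assumptions.

```python
import numpy as np

def laql(n_states, n_actions, step, reward_fn, episodes=500, horizon=50,
         alpha=0.1, gamma=0.9, la_rate=0.05, rng=None):
    """Tabular Q-learning with learning-automata (LA) action selection (sketch).

    Each state keeps an LA probability vector over actions; after every
    Q-update the vector is pushed toward the action that is currently greedy
    with respect to the Q-table (a linear reward-inaction style reinforcement),
    so action selection gradually concentrates on the best action instead of
    relying on epsilon-greedy exploration.
    """
    rng = rng or np.random.default_rng()
    Q = np.zeros((n_states, n_actions))                   # Q-table
    P = np.full((n_states, n_actions), 1.0 / n_actions)   # LA action probabilities

    for _ in range(episodes):
        s = 0                                              # start state (assumption)
        for _ in range(horizon):
            a = int(rng.choice(n_actions, p=P[s]))         # LA-based action selection
            s_next = step(s, a)                            # environment transition
            r = reward_fn(s, a, s_next)                    # e.g. sum MOS of users
            # Standard Q-learning update.
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            # Reinforce the currently greedy action for this state; the update is
            # a convex combination, so P[s] remains a probability distribution.
            greedy = np.eye(n_actions)[Q[s].argmax()]
            P[s] = (1.0 - la_rate) * P[s] + la_rate * greedy
            s = s_next
    return Q, P
```

In the paper's setting, a state would encode the current cache placement across BSs, an action a candidate allocation of contents, and the reward the predicted sum MOS; those mappings are not reproduced in this sketch.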
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Zhong | - |
dc.contributor.author | Liu, Yuanwei | - |
dc.contributor.author | Chen, Yue | - |
dc.contributor.author | Jiao, Lei | - |
dc.date.accessioned | 2024-10-17T06:58:31Z | - |
dc.date.available | 2024-10-17T06:58:31Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | IEEE Transactions on Communications, 2020, v. 68, n. 6, p. 3667-3680 | - |
dc.identifier.issn | 0090-6778 | - |
dc.identifier.uri | http://hdl.handle.net/10722/349437 | - |
dc.description.abstract | An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. Firstly, as user mobility and content popularity have significant impacts on the user experience, a recurrent neural network (RNN) is invoked for user mobility prediction and content popularity prediction. In particular, practical data collected from a GPS-tracker app on smartphones is used to test the accuracy of the user mobility prediction. Then, based on the predicted positions of mobile users and the predicted content popularity, a learning automata based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked in Q-learning to obtain optimal action selection in a random and stationary environment. It is proven that the LA based action selection scheme enables every state to select the optimal action with arbitrarily high probability, provided that Q-learning converges to the optimal Q value eventually. In the LAQL algorithm, a central processor acts as the intelligent agent, which iteratively allocates contents to base stations (BSs) according to the reward or penalty fed back by the BSs and users. To characterize the performance of the proposed LAQL algorithm, the sum MOS of users is applied to define the reward function. Extensive simulation results reveal that: 1) the prediction error of the RNN-based algorithm decreases as the number of iterations and nodes increases; 2) the proposed LAQL achieves significant performance improvement over the traditional Q-learning algorithm; and 3) the cooperative caching scheme outperforms non-cooperative caching and random caching by 3% and 4%, respectively. | -
dc.language | eng | - |
dc.relation.ispartof | IEEE Transactions on Communications | - |
dc.subject | content popularity prediction | - |
dc.subject | Learning automata based Q-learning | - |
dc.subject | quality of experience (QoE) | - |
dc.subject | user mobility prediction | - |
dc.subject | wireless cooperative caching | - |
dc.title | Learning Automata Based Q-Learning for Content Placement in Cooperative Caching | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/TCOMM.2020.2982136 | - |
dc.identifier.scopus | eid_2-s2.0-85086893911 | - |
dc.identifier.volume | 68 | - |
dc.identifier.issue | 6 | - |
dc.identifier.spage | 3667 | - |
dc.identifier.epage | 3680 | - |
dc.identifier.eissn | 1558-0857 | - |