Article: Embodied AI-Enhanced Vehicular Networks: An Integrated Vision Language Models and Reinforcement Learning Method

Title: Embodied AI-Enhanced Vehicular Networks: An Integrated Vision Language Models and Reinforcement Learning Method
Authors: Zhang, Ruichen; Zhao, Changyuan; Du, Hongyang; Niyato, Dusit; Wang, Jiacheng; Sawadsitang, Suttinee; Shen, Xuemin; Kim, Dong In
Keywords: Embodied AI; LLAVA; LLM; PPO; QoE; vehicular networks; VLM
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Mobile Computing, 2025
Abstract: This paper investigates adaptive transmission strategies in embodied AI-enhanced vehicular networks by integrating vision language models (VLMs) for semantic information extraction and deep reinforcement learning (DRL) for decision-making. The proposed framework aims to optimize both data transmission efficiency and decision accuracy by formulating an optimization problem that incorporates the Weber-Fechner law, which serves as a metric for balancing bandwidth utilization and quality of experience (QoE). Specifically, we employ the large language and vision assistant (LLAVA) model to extract critical semantic information from raw image data captured by embodied AI agents (i.e., vehicles), reducing transmission data size by more than 90% while retaining essential content for vehicular communication and decision-making. In the dynamic vehicular environment, we employ a generalized advantage estimation-based proximal policy optimization (GAE-PPO) method to stabilize decision-making under uncertainty. Simulation results show that attention maps from LLAVA highlight the model's focus on relevant image regions, enhancing semantic representation accuracy. Additionally, our proposed transmission strategy improves QoE by up to 36% compared to DDPG and accelerates convergence by reducing required steps by up to 47% compared to pure PPO. Further analysis indicates that adapting semantic symbol length provides an effective trade-off between transmission quality and bandwidth, achieving up to a 61.4% improvement in QoE when scaling from 4 to 8 vehicles.
Persistent Identifier: http://hdl.handle.net/10722/362046
ISSN: 1536-1233
2023 Impact Factor: 7.7
2023 SCImago Journal Rankings: 2.755
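
The abstract names two building blocks without giving formulas: a Weber-Fechner-based QoE metric and generalized advantage estimation (GAE) for PPO. The following is a minimal sketch of the textbook forms of both, not the authors' formulation. The function names, the scaling constant k, the reference rate rate_min, and the default gamma and lam values are all assumptions introduced here for illustration.

```python
import math


def weber_fechner_qoe(rate, rate_min, k=1.0):
    """Textbook Weber-Fechner form: perceived quality grows with the
    logarithm of the stimulus relative to a reference level.
    `k` and `rate_min` are hypothetical parameters, not from the paper."""
    if rate <= 0 or rate_min <= 0:
        raise ValueError("rate and rate_min must be positive")
    return k * math.log(rate / rate_min)


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation.
    `values` must hold len(rewards) + 1 entries: one state-value estimate
    per step plus a bootstrap value for the state after the last step."""
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

In a PPO training loop, per-step QoE values of this kind could serve as the reward sequence passed to gae_advantages. How the paper actually couples the two is not stated in this record.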

 

DC Field | Value | Language
dc.contributor.author | Zhang, Ruichen | -
dc.contributor.author | Zhao, Changyuan | -
dc.contributor.author | Du, Hongyang | -
dc.contributor.author | Niyato, Dusit | -
dc.contributor.author | Wang, Jiacheng | -
dc.contributor.author | Sawadsitang, Suttinee | -
dc.contributor.author | Shen, Xuemin | -
dc.contributor.author | Kim, Dong In | -
dc.date.accessioned | 2025-09-19T00:31:10Z | -
dc.date.available | 2025-09-19T00:31:10Z | -
dc.date.issued | 2025-01-01 | -
dc.identifier.citation | IEEE Transactions on Mobile Computing, 2025 | -
dc.identifier.issn | 1536-1233 | -
dc.identifier.uri | http://hdl.handle.net/10722/362046 | -
dc.description.abstract | This paper investigates adaptive transmission strategies in embodied AI-enhanced vehicular networks by integrating vision language models (VLMs) for semantic information extraction and deep reinforcement learning (DRL) for decision-making. The proposed framework aims to optimize both data transmission efficiency and decision accuracy by formulating an optimization problem that incorporates the Weber-Fechner law, which serves as a metric for balancing bandwidth utilization and quality of experience (QoE). Specifically, we employ the large language and vision assistant (LLAVA) model to extract critical semantic information from raw image data captured by embodied AI agents (i.e., vehicles), reducing transmission data size by more than 90% while retaining essential content for vehicular communication and decision-making. In the dynamic vehicular environment, we employ a generalized advantage estimation-based proximal policy optimization (GAE-PPO) method to stabilize decision-making under uncertainty. Simulation results show that attention maps from LLAVA highlight the model's focus on relevant image regions, enhancing semantic representation accuracy. Additionally, our proposed transmission strategy improves QoE by up to 36% compared to DDPG and accelerates convergence by reducing required steps by up to 47% compared to pure PPO. Further analysis indicates that adapting semantic symbol length provides an effective trade-off between transmission quality and bandwidth, achieving up to a 61.4% improvement in QoE when scaling from 4 to 8 vehicles. | -
dc.language | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation.ispartof | IEEE Transactions on Mobile Computing | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | Embodied AI | -
dc.subject | LLAVA | -
dc.subject | LLM | -
dc.subject | PPO | -
dc.subject | QoE | -
dc.subject | vehicular networks | -
dc.subject | VLM | -
dc.title | Embodied AI-Enhanced Vehicular Networks: An Integrated Vision Language Models and Reinforcement Learning Method | -
dc.type | Article | -
dc.identifier.doi | 10.1109/TMC.2025.3582864 | -
dc.identifier.scopus | eid_2-s2.0-105009710460 | -
dc.identifier.eissn | 1558-0660 | -
dc.identifier.issnl | 1536-1233 | -
