File Download
No files are associated with this item. Links to the fulltext are available and may require a subscription.
Conference Paper: Embodied Understanding of Driving Scenarios

Title: Embodied Understanding of Driving Scenarios
Authors: Zhou, Yunsong; Huang, Linyan; Bu, Qingwen; Zeng, Jia; Li, Tianyu; Qiu, Hang; Zhu, Hongzi; Guo, Minyi; Qiao, Yu; Li, Hongyang
Issue Date: 2025
Citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2025, v. 15120 LNCS, p. 129-148
Abstract: Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing VLMs are restricted to the 2D domain, devoid of spatial awareness and long-horizon extrapolation proficiencies. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents’ understanding of driving scenes with large spatial and temporal spans. ELM incorporates space-aware pre-training to endow the agent with robust spatial localization capabilities. Besides, the model employs time-aware token selection to accurately inquire about temporal cues. We instantiate ELM on the reformulated multi-faceted benchmark, and it surpasses previous state-of-the-art approaches in all aspects. All code, data, and models are accessible at https://github.com/OpenDriveLab/ELM.
Persistent Identifier: http://hdl.handle.net/10722/351503
ISSN: 0302-9743
2023 SCImago Journal Rankings: 0.606

DC Field | Value | Language
dc.contributor.author | Zhou, Yunsong | -
dc.contributor.author | Huang, Linyan | -
dc.contributor.author | Bu, Qingwen | -
dc.contributor.author | Zeng, Jia | -
dc.contributor.author | Li, Tianyu | -
dc.contributor.author | Qiu, Hang | -
dc.contributor.author | Zhu, Hongzi | -
dc.contributor.author | Guo, Minyi | -
dc.contributor.author | Qiao, Yu | -
dc.contributor.author | Li, Hongyang | -
dc.date.accessioned | 2024-11-20T03:56:47Z | -
dc.date.available | 2024-11-20T03:56:47Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2025, v. 15120 LNCS, p. 129-148 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.uri | http://hdl.handle.net/10722/351503 | -
dc.description.abstract | Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing VLMs are restricted to the 2D domain, devoid of spatial awareness and long-horizon extrapolation proficiencies. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents’ understanding of driving scenes with large spatial and temporal spans. ELM incorporates space-aware pre-training to endow the agent with robust spatial localization capabilities. Besides, the model employs time-aware token selection to accurately inquire about temporal cues. We instantiate ELM on the reformulated multi-faceted benchmark, and it surpasses previous state-of-the-art approaches in all aspects. All code, data, and models are accessible at https://github.com/OpenDriveLab/ELM. | -
dc.language | eng | -
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | -
dc.title | Embodied Understanding of Driving Scenarios | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1007/978-3-031-73033-7_8 | -
dc.identifier.scopus | eid_2-s2.0-85208541186 | -
dc.identifier.volume | 15120 LNCS | -
dc.identifier.spage | 129 | -
dc.identifier.epage | 148 | -
dc.identifier.eissn | 1611-3349 | -

Export
This record can be exported via the OAI-PMH interface in XML formats, or to other non-XML formats. A sketch of an OAI-PMH request for this record follows.
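As a minimal sketch of the OAI-PMH export path, the Python snippet below issues a GetRecord request for this record's unqualified Dublin Core (oai_dc) metadata. The endpoint URL and OAI identifier are assumptions inferred from common DSpace conventions and the handle 10722/351503, not values stated on this page; adjust them to the repository's actual configuration.

    # Minimal sketch: fetch this record's Dublin Core metadata via OAI-PMH GetRecord.
    # BASE_URL and IDENTIFIER are assumed (typical DSpace layout for handle 10722/351503).
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    BASE_URL = "https://hub.hku.hk/oai/request"    # assumed OAI-PMH endpoint
    IDENTIFIER = "oai:hub.hku.hk:10722/351503"     # assumed OAI identifier for this handle

    query = urlencode({
        "verb": "GetRecord",
        "metadataPrefix": "oai_dc",   # unqualified Dublin Core, matching the field table above
        "identifier": IDENTIFIER,
    })

    with urlopen(f"{BASE_URL}?{query}") as response:
        tree = ET.parse(response)

    # Print each Dublin Core element (title, creator, date, identifier, ...) in the response.
    DC_NS = "{http://purl.org/dc/elements/1.1/}"
    for element in tree.iter():
        if element.tag.startswith(DC_NS):
            print(f"{element.tag[len(DC_NS):]}: {(element.text or '').strip()}")

If the assumed endpoint or identifier is wrong, an Identify request (verb=Identify) against the repository's OAI base URL reports its actual configuration and identifier scheme.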