File Download
There are no files associated with this item.
Links for fulltext (May Require Subscription)
- Publisher Website: https://doi.org/10.1007/978-3-031-73033-7_8
- Scopus: eid_2-s2.0-85208541186
Citations:
- Scopus: 0
Conference Paper: Embodied Understanding of Driving Scenarios
Title | Embodied Understanding of Driving Scenarios |
---|---|
Authors | Zhou, Yunsong; Huang, Linyan; Bu, Qingwen; Zeng, Jia; Li, Tianyu; Qiu, Hang; Zhu, Hongzi; Guo, Minyi; Qiao, Yu; Li, Hongyang |
Issue Date | 2025 |
Citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2025, v. 15120 LNCS, p. 129-148 |
Abstract | Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). However, existing VLMs are restricted to the 2D domain, lacking spatial awareness and long-horizon extrapolation capabilities. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. On this basis, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents’ understanding of driving scenes with large spatial and temporal spans. ELM incorporates space-aware pre-training to endow the agent with robust spatial localization capabilities. In addition, the model employs time-aware token selection to accurately inquire about temporal cues. We instantiate ELM on the reformulated multi-faceted benchmark, and it surpasses previous state-of-the-art approaches in all aspects. All code, data, and models are accessible at https://github.com/OpenDriveLab/ELM. |
Persistent Identifier | http://hdl.handle.net/10722/351503 |
ISSN | 0302-9743 |
2023 SCImago Journal Rankings | 0.606 |
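The abstract names two ingredients, space-aware pre-training and time-aware token selection, but only at a high level. The sketch below illustrates one plausible reading of time-aware token selection: per-frame visual tokens are scored against a learned temporal query using their timestamps, and only the top-k tokens are forwarded to the language model. All class names, shapes, and the scoring rule are assumptions for illustration, not the authors' implementation; the real code is in the linked GitHub repository.

```python
# Hypothetical sketch of time-aware token selection (NOT the authors' code;
# see https://github.com/OpenDriveLab/ELM for the actual implementation).
# Assumption: each frame yields visual tokens tagged with a timestamp, and a
# learned temporal query keeps only the k most time-relevant tokens.
import torch
import torch.nn as nn

class TimeAwareTokenSelector(nn.Module):
    def __init__(self, dim: int, k: int):
        super().__init__()
        self.k = k
        self.time_embed = nn.Linear(1, dim)           # embed scalar timestamps
        self.query = nn.Parameter(torch.randn(dim))   # learned temporal query

    def forward(self, tokens: torch.Tensor, timestamps: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) visual tokens; timestamps: (B, N) in seconds
        t = self.time_embed(timestamps.unsqueeze(-1))   # (B, N, dim)
        scores = (tokens + t) @ self.query              # (B, N) relevance score
        top = scores.topk(self.k, dim=1).indices        # (B, k) selected indices
        idx = top.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return tokens.gather(1, idx)                    # (B, k, dim) kept tokens

# Toy usage: 2 clips, 96 tokens each, keep the 16 most time-relevant tokens.
sel = TimeAwareTokenSelector(dim=256, k=16)
out = sel(torch.randn(2, 96, 256), torch.rand(2, 96) * 8.0)
print(out.shape)  # torch.Size([2, 16, 256])
```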
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhou, Yunsong | - |
dc.contributor.author | Huang, Linyan | - |
dc.contributor.author | Bu, Qingwen | - |
dc.contributor.author | Zeng, Jia | - |
dc.contributor.author | Li, Tianyu | - |
dc.contributor.author | Qiu, Hang | - |
dc.contributor.author | Zhu, Hongzi | - |
dc.contributor.author | Guo, Minyi | - |
dc.contributor.author | Qiao, Yu | - |
dc.contributor.author | Li, Hongyang | - |
dc.date.accessioned | 2024-11-20T03:56:47Z | - |
dc.date.available | 2024-11-20T03:56:47Z | - |
dc.date.issued | 2025 | - |
dc.identifier.citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2025, v. 15120 LNCS, p. 129-148 | - |
dc.identifier.issn | 0302-9743 | - |
dc.identifier.uri | http://hdl.handle.net/10722/351503 | - |
dc.description.abstract | Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). However, existing VLMs are restricted to the 2D domain, lacking spatial awareness and long-horizon extrapolation capabilities. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. On this basis, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents’ understanding of driving scenes with large spatial and temporal spans. ELM incorporates space-aware pre-training to endow the agent with robust spatial localization capabilities. In addition, the model employs time-aware token selection to accurately inquire about temporal cues. We instantiate ELM on the reformulated multi-faceted benchmark, and it surpasses previous state-of-the-art approaches in all aspects. All code, data, and models are accessible at https://github.com/OpenDriveLab/ELM. | - |
dc.language | eng | - |
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | - |
dc.title | Embodied Understanding of Driving Scenarios | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-031-73033-7_8 | - |
dc.identifier.scopus | eid_2-s2.0-85208541186 | - |
dc.identifier.volume | 15120 LNCS | - |
dc.identifier.spage | 129 | - |
dc.identifier.epage | 148 | - |
dc.identifier.eissn | 1611-3349 | - |