Conference Paper: BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers
Title | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers |
---|---|
Authors | Li, Zhiqi; Wang, Wenhai; Li, Hongyang; Xie, Enze; Sima, Chonghao; Lu, Tong; Qiao, Yu; Dai, Jifeng |
Keywords | 3D object detection; Autonomous driving; Bird’s-Eye-View; Map segmentation; Transformer |
Issue Date | 2022 |
Citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, v. 13669 LNCS, p. 1-18 |
Abstract | 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, with which each BEV query extracts spatial features from regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best methods and on par with LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer. (An illustrative sketch of this query mechanism follows the table below.) |
Persistent Identifier | http://hdl.handle.net/10722/351455 |
ISSN | 0302-9743 (2023 SCImago Journal Rankings: 0.606) |
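
The abstract describes a query-based architecture: learnable grid-shaped BEV queries gather image features via spatial cross-attention and fuse the previous frame's BEV via temporal self-attention. The PyTorch sketch below is a heavily simplified illustration of that mechanism, not the paper's implementation: it replaces the deformable attention, camera projection, and ego-motion alignment with plain multi-head attention, and all class and argument names (`BEVFormerSketch`, `bev_h`, `bev_w`, `cam_feats`, `prev_bev`) are made up for this example. The real code is at https://github.com/zhiqi-li/BEVFormer.

```python
import torch
import torch.nn as nn

class BEVFormerSketch(nn.Module):
    """Hypothetical, simplified stand-in for BEVFormer's BEV query mechanism."""

    def __init__(self, bev_h=50, bev_w=50, dim=256, n_heads=8):
        super().__init__()
        # Predefined grid-shaped BEV queries: one learnable query per BEV cell.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        # Temporal self-attention: current queries attend to the previous
        # frame's BEV to recurrently fuse history (the paper aligns frames
        # by ego-motion first; that step is omitted here).
        self.temporal_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Spatial cross-attention: queries attend to multi-camera image
        # features (the paper restricts deformable attention to the views
        # each BEV cell projects into; plain attention is used here).
        self.spatial_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, cam_feats, prev_bev=None):
        # cam_feats: (B, N_cams * H * W, dim) flattened multi-camera features
        batch = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(batch, -1, -1)
        if prev_bev is not None:
            # Recurrently fuse historical BEV information.
            fused, _ = self.temporal_attn(q, prev_bev, prev_bev)
            q = q + fused
        # Each BEV query extracts spatial features from the camera views.
        bev, _ = self.spatial_attn(q, cam_feats, cam_feats)
        return bev  # (B, bev_h * bev_w, dim): unified BEV representation
```

A usage example under the same assumptions; each step's output BEV would feed both the task heads and, per the abstract's recurrent design, the next step's temporal self-attention:

```python
sketch = BEVFormerSketch()
feats = torch.randn(2, 6 * 16 * 16, 256)  # e.g. 6 cameras, 16x16 feature maps
bev_t0 = sketch(feats)                     # first frame: no history available
bev_t1 = sketch(feats, prev_bev=bev_t0)    # later frames reuse the prior BEV
```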
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Zhiqi | - |
dc.contributor.author | Wang, Wenhai | - |
dc.contributor.author | Li, Hongyang | - |
dc.contributor.author | Xie, Enze | - |
dc.contributor.author | Sima, Chonghao | - |
dc.contributor.author | Lu, Tong | - |
dc.contributor.author | Qiao, Yu | - |
dc.contributor.author | Dai, Jifeng | - |
dc.date.accessioned | 2024-11-20T03:56:23Z | - |
dc.date.available | 2024-11-20T03:56:23Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, v. 13669 LNCS, p. 1-18 | - |
dc.identifier.issn | 0302-9743 | - |
dc.identifier.uri | http://hdl.handle.net/10722/351455 | - |
dc.description.abstract | 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, with which each BEV query extracts spatial features from regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best methods and on par with LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer. | -
dc.language | eng | - |
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | - |
dc.subject | 3D object detection | - |
dc.subject | Autonomous driving | - |
dc.subject | Bird’s-Eye-View | - |
dc.subject | Map segmentation | - |
dc.subject | Transformer | - |
dc.title | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-031-20077-9_1 | - |
dc.identifier.scopus | eid_2-s2.0-85142683816 | - |
dc.identifier.volume | 13669 LNCS | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 18 | - |
dc.identifier.eissn | 1611-3349 | - |