File Download
There are no files associated with this item.
Links for fulltext (May Require Subscription):
- Publisher Website (DOI): 10.1109/CVPR52729.2023.02105
- Scopus: eid_2-s2.0-85173717667
Citations:
- Scopus: 0
Appears in Collections:
Conference Paper: Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
Title | Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving |
---|---|
Authors | Jia, Xiaosong; Wu, Penghao; Chen, Li; Xie, Jiangwei; He, Conghui; Yan, Junchi; Li, Hongyang |
Keywords | Autonomous driving |
Issue Date | 2023 |
Citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023, v. 2023-June, p. 21983-21994 |
Abstract | End-to-end autonomous driving has made impressive progress in recent years. Existing methods usually adopt the decoupled encoder-decoder paradigm, where the encoder extracts hidden features from raw sensor data, and the decoder outputs the ego-vehicle's future trajectories or actions. Under such a paradigm, the encoder does not have access to the intended behavior of the ego agent, leaving the burden of finding out safety-critical regions from the massive receptive field and inferring about future situations to the decoder. Even worse, the decoder is usually composed of several simple multi-layer perceptrons (MLP) or GRUs while the encoder is delicately designed (e.g., a combination of heavy ResNets or Transformer). Such an imbalanced resource-task division hampers the learning process. In this work, we aim to alleviate the aforementioned problem by two principles: (1) fully utilizing the capacity of the encoder; (2) increasing the capacity of the decoder. Concretely, we first predict a coarse-grained future position and action based on the encoder features. Then, conditioned on the position and action, the future scene is imagined to check the ramification if we drive accordingly. We also retrieve the encoder features around the predicted coordinate to obtain fine-grained information about the safety-critical region. Finally, based on the predicted future and the retrieved salient feature, we refine the coarse-grained position and action by predicting its offset from ground-truth. The above refinement module could be stacked in a cascaded fashion, which extends the capacity of the decoder with spatial-temporal prior knowledge about the conditioned future. We conduct experiments on the CARLA simulator and achieve state-of-the-art performance in closed-loop benchmarks. Extensive ablation studies demonstrate the effectiveness of each proposed module. |
Persistent Identifier | http://hdl.handle.net/10722/351477 |
ISSN | 1063-6919 |
2023 SCImago Journal Rankings | 10.331 |
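
The abstract above describes a coarse-to-fine decoding loop: predict a coarse future position and action from the encoder features, imagine the resulting future scene conditioned on that plan, retrieve encoder features around the predicted coordinate, and refine the plan by predicting an offset, with the whole refinement block stacked in a cascade. The PyTorch sketch below illustrates one plausible shape of that loop. It is a minimal reading of the abstract, not the authors' implementation: the module names (`RefinementBlock`, `CascadedDecoder`), the bird's-eye-view feature-map input, the `grid_sample`-based retrieval, and all dimensions are assumptions made for exposition.

```python
# Illustrative sketch of the cascaded refinement decoder outlined in the
# abstract. All module names, shapes, and hyperparameters are assumptions
# for exposition; none are taken from the paper's released code.
import torch
import torch.nn as nn


class RefinementBlock(nn.Module):
    """One 'think twice' stage: imagine the future conditioned on the current
    coarse plan, retrieve encoder features near the predicted position, and
    predict an offset that refines the plan."""

    def __init__(self, feat_dim: int = 256, action_dim: int = 2):
        super().__init__()
        # Imagines a future-scene embedding conditioned on the current plan.
        self.future_imaginer = nn.Sequential(
            nn.Linear(feat_dim + 2 + action_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Predicts position/action offsets from the imagined future plus the
        # features retrieved around the predicted coordinate.
        self.offset_head = nn.Linear(2 * feat_dim, 2 + action_dim)

    def retrieve(self, bev_feat: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # Bilinearly sample the BEV feature map at the predicted (x, y),
        # assumed normalized to [-1, 1] per the grid_sample convention.
        grid = pos.view(-1, 1, 1, 2)                      # (B, 1, 1, 2)
        sampled = nn.functional.grid_sample(
            bev_feat, grid, align_corners=False)          # (B, C, 1, 1)
        return sampled.flatten(1)                         # (B, C)

    def forward(self, bev_feat, global_feat, pos, action):
        imagined = self.future_imaginer(
            torch.cat([global_feat, pos, action], dim=-1))
        local = self.retrieve(bev_feat, pos)
        delta = self.offset_head(torch.cat([imagined, local], dim=-1))
        return pos + delta[:, :2], action + delta[:, 2:]


class CascadedDecoder(nn.Module):
    """Coarse prediction followed by K stacked refinement blocks."""

    def __init__(self, feat_dim: int = 256, action_dim: int = 2,
                 num_blocks: int = 3):
        super().__init__()
        self.coarse_head = nn.Linear(feat_dim, 2 + action_dim)
        self.blocks = nn.ModuleList(
            RefinementBlock(feat_dim, action_dim) for _ in range(num_blocks))

    def forward(self, bev_feat, global_feat):
        coarse = self.coarse_head(global_feat)
        pos, action = coarse[:, :2], coarse[:, 2:]
        for block in self.blocks:
            pos, action = block(bev_feat, global_feat, pos, action)
        return pos, action
```

Stacking more `RefinementBlock`s is what scales the decoder's capacity in this reading; during training, each block's offset target would be the residual between its input plan and the ground truth, matching the abstract's "predicting its offset from ground-truth."
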
DC Field | Value | Language |
---|---|---
dc.contributor.author | Jia, Xiaosong | - |
dc.contributor.author | Wu, Penghao | - |
dc.contributor.author | Chen, Li | - |
dc.contributor.author | Xie, Jiangwei | - |
dc.contributor.author | He, Conghui | - |
dc.contributor.author | Yan, Junchi | - |
dc.contributor.author | Li, Hongyang | - |
dc.date.accessioned | 2024-11-20T03:56:32Z | - |
dc.date.available | 2024-11-20T03:56:32Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023, v. 2023-June, p. 21983-21994 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/10722/351477 | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.subject | Autonomous driving | - |
dc.title | Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CVPR52729.2023.02105 | - |
dc.identifier.scopus | eid_2-s2.0-85173717667 | - |
dc.identifier.volume | 2023-June | - |
dc.identifier.spage | 21983 | - |
dc.identifier.epage | 21994 | - |