
Conference Paper: Parallel Sequence Modeling via Generalized Spatial Propagation Network

Title: Parallel Sequence Modeling via Generalized Spatial Propagation Network
Authors: Wang, Hongjun; Byeon, Wonmin; Xu, Jiarui; Gu, Jinwei; Cheung, Ka Chun; Wang, Xiaolong; Han, Kai; Kautz, Jan; Liu, Sifei
Issue Date: 1-Jun-2025
Abstract

We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures. Existing attention models, including transformers, linear attention, and state-space models like Mamba, process multi-dimensional data as 1D sequences, compromising spatial coherence and efficiency. GSPN overcomes these limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach. Central to GSPN is the Stability-Context Condition, which ensures stable, context-aware propagation across 2D sequences and reduces the effective sequence length to √N for a square map with N elements, significantly enhancing computational efficiency. With learnable, input-dependent weights and no reliance on positional embeddings, GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation. Notably, GSPN accelerates SD-XL with softmax-attention by over 84× when generating 16K images.
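The line-scan idea in the abstract can be illustrated with a minimal sketch: an H×W map is processed row by row, each row updated in one parallel step from the previous row through three neighbor weights, so a square map with N pixels needs only about √N sequential steps. The function name, the three-neighbor connection, and the per-pixel weight normalization below are illustrative assumptions standing in for the paper's learned weights and Stability-Context Condition, not the authors' implementation.

```python
import numpy as np

def line_scan_propagate(x, w_left, w_center, w_right):
    """Top-to-bottom line-scan propagation over a 2D map (illustrative sketch).

    Each row i is computed in one vectorized step from row i-1 via three
    neighbor weights, so an H x W map takes H (~sqrt(N)) sequential steps.
    Weights are normalized per pixel so their absolute values sum to at
    most 1, a simple stand-in for a stability condition on propagation.
    """
    H, W = x.shape
    h = np.zeros_like(x)
    h[0] = x[0]
    for i in range(1, H):              # ~sqrt(N) sequential steps
        prev = h[i - 1]
        left = np.roll(prev, 1)        # upper-left neighbor
        right = np.roll(prev, -1)      # upper-right neighbor
        # normalize so |wl| + |wc| + |wr| <= 1 at every pixel (stability)
        denom = (np.abs(w_left[i]) + np.abs(w_center[i])
                 + np.abs(w_right[i]) + 1e-6)
        wl = w_left[i] / denom
        wc = w_center[i] / denom
        wr = w_right[i] / denom
        # blend the current input with the propagated previous row
        h[i] = x[i] + wl * left + wc * prev + wr * right
    return h
```

In the paper the three weights are input-dependent (predicted from the image itself) and the scan is run in four directions and merged, which is what yields dense pairwise connections across the whole map.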


Persistent Identifier: http://hdl.handle.net/10722/362398

 

DC Field: Value
dc.contributor.author: Wang, Hongjun
dc.contributor.author: Byeon, Wonmin
dc.contributor.author: Xu, Jiarui
dc.contributor.author: Gu, Jinwei
dc.contributor.author: Cheung, Ka Chun
dc.contributor.author: Wang, Xiaolong
dc.contributor.author: Han, Kai
dc.contributor.author: Kautz, Jan
dc.contributor.author: Liu, Sifei
dc.date.accessioned: 2025-09-23T00:31:15Z
dc.date.available: 2025-09-23T00:31:15Z
dc.date.issued: 2025-06-01
dc.identifier.uri: http://hdl.handle.net/10722/362398
dc.description.abstract: <p>We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures. Existing attention models, including transformers, linear attention, and state-space models like Mamba, process multi-dimensional data as 1D sequences, compromising spatial coherence and efficiency. GSPN overcomes these limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach. Central to GSPN is the Stability-Context Condition, which ensures stable, context-aware propagation across 2D sequences and reduces the effective sequence length to √N for a square map with N elements, significantly enhancing computational efficiency. With learnable, input-dependent weights and no reliance on positional embeddings, GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation. Notably, GSPN accelerates SD-XL with softmax-attention by over 84× when generating 16K images.</p>
dc.language: eng
dc.relation.ispartof: Computer Vision and Pattern Recognition (CVPR) 2025 (11/06/2025-15/06/2025, Nashville)
dc.title: Parallel Sequence Modeling via Generalized Spatial Propagation Network
dc.type: Conference_Paper
dc.description.nature: preprint
