Article: Vivim: a Video Vision Mamba for Ultrasound Video Segmentation

Title: Vivim: a Video Vision Mamba for Ultrasound Video Segmentation
Authors: Yang, Yijun; Xing, Zhaohu; Yu, Lequan; Fu, Huazhu; Huang, Chunwang; Zhu, Lei
Keywords: Breast lesion segmentation; Polyp segmentation; State space model; Thyroid segmentation; Ultrasound videos
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Circuits and Systems for Video Technology, 2025
Abstract

Ultrasound video segmentation has gained increasing attention in clinical practice because consecutive video frames offer redundant dynamic references. However, traditional convolutional neural networks have a limited receptive field, and transformer-based networks capture long-term dependencies only at prohibitive computational cost. This bottleneck poses a significant challenge when processing longer sequences in medical video analysis tasks on available devices with limited memory. Recently, state space models (SSMs), exemplified by Mamba, have exhibited linear complexity and impressive achievements in efficient long-sequence modeling, and have advanced deep neural networks on many vision tasks by significantly expanding the receptive field. Unfortunately, vanilla SSMs fail to simultaneously capture causal temporal cues and preserve non-causal spatial information. To this end, this paper presents a Video Vision Mamba-based framework, dubbed Vivim, for ultrasound video segmentation tasks. Vivim effectively compresses long-term spatiotemporal representations into sequences at varying scales with our designed Temporal Mamba Block. We also introduce an improved boundary-aware affine constraint across frames to enhance the discriminative ability of Vivim on ambiguous lesions. Extensive experiments on thyroid segmentation in ultrasound videos, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of Vivim and its superiority over existing methods.
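The abstract's claim of linear complexity rests on the SSM recurrence: each timestep updates a fixed-size hidden state from the previous one, so cost grows linearly with sequence length rather than quadratically as in attention. The toy scan below is a minimal illustrative sketch of a plain (non-selective) linear SSM, not the Vivim or Mamba code; all names and the scalar parameters `a`, `b`, `c` are assumptions for illustration.

```python
def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t

    Each step touches only the previous hidden state, so a sequence of
    length T costs O(T) time and O(1) extra memory, in contrast to the
    O(T^2) pairwise interactions of self-attention.
    """
    h, ys = 0.0, []
    for x in xs:            # one pass over the T frames/tokens
        h = a * h + b * x   # recurrent state update
        ys.append(c * h)    # linear readout
    return ys

# A unit impulse produces an exponentially decaying response,
# reflecting the long but fading "receptive field" of the recurrence.
response = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

Mamba-style selective SSMs make `a`, `b`, and `c` input-dependent per step, but the linear-in-length scan structure shown here is what keeps long-sequence modeling tractable.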


Persistent Identifier: http://hdl.handle.net/10722/362615
ISSN: 1051-8215
2023 Impact Factor: 8.3
2023 SCImago Journal Rankings: 2.299

DC Field: Value
dc.contributor.author: Yang, Yijun
dc.contributor.author: Xing, Zhaohu
dc.contributor.author: Yu, Lequan
dc.contributor.author: Fu, Huazhu
dc.contributor.author: Huang, Chunwang
dc.contributor.author: Zhu, Lei
dc.date.accessioned: 2025-09-26T00:36:28Z
dc.date.available: 2025-09-26T00:36:28Z
dc.date.issued: 2025-01-01
dc.identifier.citation: IEEE Transactions on Circuits and Systems for Video Technology, 2025
dc.identifier.issn: 1051-8215
dc.identifier.uri: http://hdl.handle.net/10722/362615
dc.description.abstract: <p>Ultrasound video segmentation has gained increasing attention in clinical practice because consecutive video frames offer redundant dynamic references. However, traditional convolutional neural networks have a limited receptive field, and transformer-based networks capture long-term dependencies only at prohibitive computational cost. This bottleneck poses a significant challenge when processing longer sequences in medical video analysis tasks on available devices with limited memory. Recently, state space models (SSMs), exemplified by Mamba, have exhibited linear complexity and impressive achievements in efficient long-sequence modeling, and have advanced deep neural networks on many vision tasks by significantly expanding the receptive field. Unfortunately, vanilla SSMs fail to simultaneously capture causal temporal cues and preserve non-causal spatial information. To this end, this paper presents a Video Vision Mamba-based framework, dubbed Vivim, for ultrasound video segmentation tasks. Vivim effectively compresses long-term spatiotemporal representations into sequences at varying scales with our designed Temporal Mamba Block. We also introduce an improved boundary-aware affine constraint across frames to enhance the discriminative ability of Vivim on ambiguous lesions. Extensive experiments on thyroid segmentation in ultrasound videos, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of Vivim and its superiority over existing methods.</p>
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Circuits and Systems for Video Technology
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Breast lesion segmentation
dc.subject: polyp segmentation
dc.subject: State space model
dc.subject: Thyroid segmentation
dc.subject: Ultrasound videos
dc.title: Vivim: a Video Vision Mamba for Ultrasound Video Segmentation
dc.type: Article
dc.identifier.doi: 10.1109/TCSVT.2025.3563411
dc.identifier.scopus: eid_2-s2.0-105003647979
dc.identifier.eissn: 1558-2205
dc.identifier.issnl: 1051-8215
