MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography

Kim, Sekeun; Jin, Pengfei; Chen, Cheng; Kim, Kyungsang; Lyu, Zhiliang; Ren, Hui; Kim, Sunghwan; Liu, Zhengliang; Zhong, Aoxiao; Liu, Tianming; Li, Xiang; Li, Quanzheng

Publisher Website: 10.1109/JBHI.2025.3540306
Scopus: eid_2-s2.0-85217966407

Citations:
- Scopus: 0
Appears in Collections:
- Electrical & Electronic Engineering: Journal/Magazine Articles

Title	MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography
Authors	Kim, Sekeun; Jin, Pengfei; Chen, Cheng; Kim, Kyungsang; Lyu, Zhiliang; Ren, Hui; Kim, Sunghwan; Liu, Zhengliang; Zhong, Aoxiao; Liu, Tianming; Li, Xiang; Li, Quanzheng
Keywords	Echocardiography; Parameter-efficient fine-tuning; Segment Anything Model; Segmentation; Vision Foundation model
Issue Date	10-Feb-2025
Publisher	IEEE
Citation	IEEE Journal of Biomedical and Health Informatics, 2025. DOI: http://dx.doi.org/10.1109/JBHI.2025.3540306
Abstract	Despite achieving impressive results in general-purpose semantic segmentation with strong generalization on natural images, the Segment Anything Model (SAM) has shown less precision and stability in medical image segmentation. In particular, the SAM architecture is designed for 2D natural images and therefore does not support three-dimensional information, which is particularly important for medical imaging modalities that are often volumetric or video data. In this paper, we introduce MediViSTA, a parameter-efficient fine-tuning method designed to adapt the vision foundation model for medical video, with a specific focus on echocardiography segmentation. To achieve spatial adaptation, we propose a frequency feature fusion technique that injects spatial frequency information from a CNN branch. For temporal adaptation, we integrate temporal adapters within the transformer blocks of the image encoder. Using a fine-tuning strategy, only a small subset of pre-trained parameters is updated, allowing efficient adaptation to echocardiography data. The effectiveness of our method has been comprehensively evaluated on three datasets, comprising two public datasets and one multi-center in-house dataset. Our method consistently outperforms various state-of-the-art approaches without using any prompts. Furthermore, our model exhibits strong generalization capabilities on unseen datasets, surpassing the second-best approach by 2.15% in Dice and 0.09 in temporal consistency. The results demonstrate the potential of MediViSTA to significantly advance echocardiography video segmentation, offering improved accuracy and robustness in cardiac assessment applications.
Persistent Identifier	http://hdl.handle.net/10722/360819
ISSN	2168-2194
2023 Impact Factor	6.7
2023 SCImago Journal Rankings	1.964

DC Field	Value	Language
dc.contributor.author	Kim, Sekeun	-
dc.contributor.author	Jin, Pengfei	-
dc.contributor.author	Chen, Cheng	-
dc.contributor.author	Kim, Kyungsang	-
dc.contributor.author	Lyu, Zhiliang	-
dc.contributor.author	Ren, Hui	-
dc.contributor.author	Kim, Sunghwan	-
dc.contributor.author	Liu, Zhengliang	-
dc.contributor.author	Zhong, Aoxiao	-
dc.contributor.author	Liu, Tianming	-
dc.contributor.author	Li, Xiang	-
dc.contributor.author	Li, Quanzheng	-
dc.date.accessioned	2025-09-16T00:30:42Z	-
dc.date.available	2025-09-16T00:30:42Z	-
dc.date.issued	2025-02-10	-
dc.identifier.citation	IEEE Journal of Biomedical and Health Informatics, 2025	-
dc.identifier.issn	2168-2194	-
dc.identifier.uri	http://hdl.handle.net/10722/360819	-
dc.description.abstract	Despite achieving impressive results in general-purpose semantic segmentation with strong generalization on natural images, the Segment Anything Model (SAM) has shown less precision and stability in medical image segmentation. In particular, the SAM architecture is designed for 2D natural images and therefore does not support three-dimensional information, which is particularly important for medical imaging modalities that are often volumetric or video data. In this paper, we introduce MediViSTA, a parameter-efficient fine-tuning method designed to adapt the vision foundation model for medical video, with a specific focus on echocardiography segmentation. To achieve spatial adaptation, we propose a frequency feature fusion technique that injects spatial frequency information from a CNN branch. For temporal adaptation, we integrate temporal adapters within the transformer blocks of the image encoder. Using a fine-tuning strategy, only a small subset of pre-trained parameters is updated, allowing efficient adaptation to echocardiography data. The effectiveness of our method has been comprehensively evaluated on three datasets, comprising two public datasets and one multi-center in-house dataset. Our method consistently outperforms various state-of-the-art approaches without using any prompts. Furthermore, our model exhibits strong generalization capabilities on unseen datasets, surpassing the second-best approach by 2.15% in Dice and 0.09 in temporal consistency. The results demonstrate the potential of MediViSTA to significantly advance echocardiography video segmentation, offering improved accuracy and robustness in cardiac assessment applications.	-
dc.language	eng	-
dc.publisher	IEEE	-
dc.relation.ispartof	IEEE Journal of Biomedical and Health Informatics	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	Echocardiography	-
dc.subject	Parameter-efficient fine-tuning	-
dc.subject	Segment Anything Model	-
dc.subject	Segmentation	-
dc.subject	Vision Foundation model	-
dc.title	MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography	-
dc.type	Article	-
dc.identifier.doi	10.1109/JBHI.2025.3540306	-
dc.identifier.scopus	eid_2-s2.0-85217966407	-
dc.identifier.eissn	2168-2208	-
dc.identifier.issnl	2168-2194	-
