Links for fulltext
(May Require Subscription)
- Publisher Website (DOI): 10.1007/978-3-030-87193-2_31
- Scopus: eid_2-s2.0-85116493310
- WOS: WOS:000712019600031
Conference Paper: Multi-Compound Transformer for Accurate Biomedical Image Segmentation
Title | Multi-Compound Transformer for Accurate Biomedical Image Segmentation |
---|---|
Authors | Ji, Y; Zhang, R; Wang, H; Li, Z; Wu, L; Hu, Z; Zhang, S; Luo, P |
Issue Date | 2021 |
Publisher | Springer. |
Citation | Ji, Y ... et al. Multi-Compound Transformer for Accurate Biomedical Image Segmentation. In de Bruijne, M ... et al. (eds), The 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Virtual Conference, Strasbourg, France, 27 September - 1 October 2021. Proceedings, Part I, p. 326-336. Cham: Springer, 2021 |
Abstract | Recent vision transformers (e.g., for image classification) learn non-local attentive interactions among patch tokens. However, prior works miss the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle these issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and enhance features via self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains significant improvements over state-of-the-art biomedical image segmentation methods on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%, 3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the Pannuke, CVC-Clinic, CVC-Colon, Etis, Kvasir, and ISIC2018 datasets, respectively. Code is available at https://github.com/JiYuanFeng/MCTrans. |
Description | Poster Session Th-S2: Topics: Image Segmentation + Domain Adaptation - no. 786 |
Persistent Identifier | http://hdl.handle.net/10722/301315 |
ISBN | 9783030871925 |
ISI Accession Number ID | WOS:000712019600031 |
Series/Report no. | Lecture Notes in Computer Science ; vol. 12901 |
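The abstract describes three mechanisms: embedding multi-scale convolutional features as one token sequence, intra-/inter-scale self-attention over that sequence, and a learnable proxy embedding refined by self-attention and applied via cross-attention. A minimal NumPy sketch of this data flow, not the authors' implementation (single head, no projections or residuals; the feature-map sizes, embedding dimension, and proxy count below are hypothetical):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (Nq, d) x (Nk, d) -> (Nq, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 32  # embedding dimension (hypothetical)

# Three conv feature maps at different scales, already projected to d channels.
scales = [(16, 16), (8, 8), (4, 4)]
feats = [rng.standard_normal((h * w, d)) for h, w in scales]

# 1) Embed multi-scale features as a single token sequence (256+64+16 = 336 tokens).
tokens = np.concatenate(feats, axis=0)

# 2) Self-attention over the whole sequence, so every token can attend to
#    tokens of its own scale (intra-scale) and of other scales (inter-scale).
tokens = attention(tokens, tokens, tokens)

# 3) Learnable proxy embeddings (one per class; K = 4 is hypothetical):
#    self-attention models relations among labels, cross-attention from the
#    tokens to the proxies enhances the feature tokens.
proxies = rng.standard_normal((4, d))
proxies = attention(proxies, proxies, proxies)   # semantic relationships
enhanced = attention(tokens, proxies, proxies)   # feature enhancement
```

The enhanced tokens would then be reshaped back into per-scale feature maps and fed to a UNet-like decoder; the real model stacks such layers with learned query/key/value projections.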
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ji, Y | - |
dc.contributor.author | Zhang, R | - |
dc.contributor.author | Wang, H | - |
dc.contributor.author | Li, Z | - |
dc.contributor.author | Wu, L | - |
dc.contributor.author | Hu, Z | - |
dc.contributor.author | Zhang, S | - |
dc.contributor.author | Luo, P | - |
dc.date.accessioned | 2021-07-27T08:09:18Z | - |
dc.date.available | 2021-07-27T08:09:18Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Ji, Y ... et al. Multi-Compound Transformer for Accurate Biomedical Image Segmentation. In de Bruijne, M ... et al. (eds), The 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Virtual Conference, Strasbourg, France, 27 September - 1 October 2021. Proceedings, Part I, p. 326-336. Cham: Springer, 2021 | - |
dc.identifier.isbn | 9783030871925 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301315 | - |
dc.description | Poster Session Th-S2: Topics: Image Segmentation + Domain Adaptation - no. 786 | - |
dc.description.abstract | Recent vision transformers (e.g., for image classification) learn non-local attentive interactions among patch tokens. However, prior works miss the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle these issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and enhance features via self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains significant improvements over state-of-the-art biomedical image segmentation methods on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%, 3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the Pannuke, CVC-Clinic, CVC-Colon, Etis, Kvasir, and ISIC2018 datasets, respectively. Code is available at https://github.com/JiYuanFeng/MCTrans. | - |
dc.language | eng | - |
dc.publisher | Springer. | - |
dc.relation.ispartof | International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021 | - |
dc.relation.ispartofseries | Lecture Notes in Computer Science ; vol. 12901 | - |
dc.title | Multi-Compound Transformer for Accurate Biomedical Image Segmentation | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Luo, P: pluo@hku.hk | - |
dc.identifier.authority | Luo, P=rp02575 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-030-87193-2_31 | - |
dc.identifier.scopus | eid_2-s2.0-85116493310 | - |
dc.identifier.hkuros | 323754 | - |
dc.identifier.spage | 326 | - |
dc.identifier.epage | 336 | - |
dc.identifier.isi | WOS:000712019600031 | - |
dc.publisher.place | Cham | - |
dc.identifier.eisbn | 9783030871932 | - |