Conference Paper: Multi-Compound Transformer for Accurate Biomedical Image Segmentation

Title: Multi-Compound Transformer for Accurate Biomedical Image Segmentation
Authors: Ji, Y; Zhang, R; Wang, H; Li, Z; Wu, L; Hu, Z; Zhang, S; Luo, P
Issue Date: 2021
Publisher: Springer
Citation: Ji, Y ... et al. Multi-Compound Transformer for Accurate Biomedical Image Segmentation. In de Bruijne, M ... et al. (eds), The 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Virtual Conference, Strasbourg, France, 27 September - 1 October 2021. Proceedings, Part I, p. 326-336. Cham: Springer, 2021
Abstract: The recent vision transformer (i.e. for image classification) learns non-local attentive interactions of different patch tokens. However, prior arts miss learning the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle the above issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and to enhance features, using self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains significant improvements over state-of-the-art biomedical image segmentation methods on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%, 3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the PanNuke, CVC-Clinic, CVC-Colon, ETIS, Kvasir, and ISIC2018 datasets, respectively. Code is available at https://github.com/JiYuanFeng/MCTrans.
Description: Poster Session Th-S2: Topics: Image Segmentation + Domain Adaptation - no. 786
Persistent Identifier: http://hdl.handle.net/10722/301315
ISBN: 9783030871925
ISI Accession Number ID: WOS:000712019600031
Series/Report no.: Lecture Notes in Computer Science; vol. 12901
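The mechanism described in the abstract — embedding multi-scale convolutional features as one token sequence so that intra- and inter-scale self-attention happen jointly, plus learnable proxy embeddings that interact with the feature tokens via cross-attention — can be sketched as a toy NumPy example. This is an illustrative sketch, not the authors' implementation: the feature sizes, token dimension, number of proxy tokens, and single-head unprojected attention are all simplifying assumptions; see the linked repository for the real code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (Lq, d) x (Lk, d) -> (Lq, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 32  # shared token dimension (assumption)

# Multi-scale convolutional features, e.g. three UNet encoder stages
# (spatial sizes 8x8, 4x4, 2x2), already projected to d channels.
feats = [rng.standard_normal((s * s, d)) for s in (8, 4, 2)]

# Embed ALL scales as one token sequence (64 + 16 + 4 = 84 tokens), so a
# single self-attention pass mixes tokens both within and across scales.
tokens = np.concatenate(feats, axis=0)      # (84, d)
tokens = attention(tokens, tokens, tokens)  # intra- + inter-scale attention

# Learnable proxy embeddings, one per semantic class (assumption: 4 classes):
# self-attention among proxies models class relationships; cross-attention
# from feature tokens to proxies enhances the features.
proxies = rng.standard_normal((4, d))
proxies = attention(proxies, proxies, proxies)
enhanced = tokens + attention(tokens, proxies, proxies)  # (84, d)

# Split the enhanced sequence back into per-scale maps for the decoder.
sizes = [f.shape[0] for f in feats]
splits = np.split(enhanced, np.cumsum(sizes)[:-1])
print([s.shape for s in splits])  # [(64, 32), (16, 32), (4, 32)]
```

Because the multi-scale tokens and the per-scale outputs keep the same shapes as the original feature maps, a block like this can be dropped between a UNet encoder and decoder, which is the "easily plugged into a UNet-like network" property the abstract claims.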

 

DC Field: Value
dc.contributor.author: Ji, Y
dc.contributor.author: Zhang, R
dc.contributor.author: Wang, H
dc.contributor.author: Li, Z
dc.contributor.author: Wu, L
dc.contributor.author: Hu, Z
dc.contributor.author: Zhang, S
dc.contributor.author: Luo, P
dc.date.accessioned: 2021-07-27T08:09:18Z
dc.date.available: 2021-07-27T08:09:18Z
dc.date.issued: 2021
dc.identifier.citation: Ji, Y ... et al. Multi-Compound Transformer for Accurate Biomedical Image Segmentation. In de Bruijne, M ... et al. (eds), The 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Virtual Conference, Strasbourg, France, 27 September - 1 October 2021. Proceedings, Part I, p. 326-336. Cham: Springer, 2021
dc.identifier.isbn: 9783030871925
dc.identifier.uri: http://hdl.handle.net/10722/301315
dc.description: Poster Session Th-S2: Topics: Image Segmentation + Domain Adaptation - no. 786
dc.description.abstract: The recent vision transformer (i.e. for image classification) learns non-local attentive interactions of different patch tokens. However, prior arts miss learning the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle the above issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and to enhance features, using self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains significant improvements over state-of-the-art biomedical image segmentation methods on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%, 3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the PanNuke, CVC-Clinic, CVC-Colon, ETIS, Kvasir, and ISIC2018 datasets, respectively. Code is available at https://github.com/JiYuanFeng/MCTrans.
dc.language: eng
dc.publisher: Springer
dc.relation.ispartof: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021
dc.relation.ispartofseries: Lecture Notes in Computer Science; vol. 12901
dc.title: Multi-Compound Transformer for Accurate Biomedical Image Segmentation
dc.type: Conference_Paper
dc.identifier.email: Luo, P: pluo@hku.hk
dc.identifier.authority: Luo, P=rp02575
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1007/978-3-030-87193-2_31
dc.identifier.scopus: eid_2-s2.0-85116493310
dc.identifier.hkuros: 323754
dc.identifier.spage: 326
dc.identifier.epage: 336
dc.identifier.isi: WOS:000712019600031
dc.publisher.place: Cham
dc.identifier.eisbn: 9783030871932
