Links for fulltext
(May Require Subscription)
- Publisher Website (DOI): 10.1007/978-3-030-87193-2_31
- Scopus: eid_2-s2.0-85116493310
- WOS: WOS:000712019600031
Conference Paper: Multi-Compound Transformer for Accurate Biomedical Image Segmentation
Title | Multi-Compound Transformer for Accurate Biomedical Image Segmentation |
---|---|
Authors | Ji, Y; Zhang, R; Wang, H; Li, Z; Wu, L; Hu, Z; Zhang, S; Luo, P |
Issue Date | 2021 |
Publisher | Springer. |
Citation | Ji, Y ... et al. Multi-Compound Transformer for Accurate Biomedical Image Segmentation. In de Bruijne, M ... et al. (eds), The 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Virtual Conference, Strasbourg, France, 27 September - 1 October 2021. Proceedings, Part I, p. 326-336. Cham: Springer, 2021 |
Abstract | Recent vision transformers (e.g., for image classification) learn non-local attentive interactions among patch tokens. However, prior works miss the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle these issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and enhance features via self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains significant improvements over state-of-the-art biomedical image segmentation methods on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%, 3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the Pannuke, CVC-Clinic, CVC-Colon, Etis, Kvasir, and ISIC2018 datasets, respectively. Code is available at https://github.com/JiYuanFeng/MCTrans. |
Description | Poster Session Th-S2: Topics: Image Segmentation + Domain Adaptation - no. 786 |
Persistent Identifier | http://hdl.handle.net/10722/301315 |
ISBN | 9783030871925 |
ISI Accession Number ID | WOS:000712019600031 |
Series/Report no. | Lecture Notes in Computer Science ; vol. 12901 |
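The abstract describes three mechanisms: embedding multi-scale convolutional features as one token sequence, intra-/inter-scale self-attention over that sequence, and a learnable proxy embedding refined by self-attention and applied via cross-attention. A minimal NumPy sketch of this data flow, not the authors' implementation (single head, no projections or residuals; the feature-map sizes, embedding dimension, and proxy count below are hypothetical):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (Nq, d) x (Nk, d) -> (Nq, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 32  # embedding dimension (hypothetical)

# Three conv feature maps at different scales, already projected to d channels.
scales = [(16, 16), (8, 8), (4, 4)]
feats = [rng.standard_normal((h * w, d)) for h, w in scales]

# 1) Embed multi-scale features as a single token sequence (256+64+16 = 336 tokens).
tokens = np.concatenate(feats, axis=0)

# 2) Self-attention over the whole sequence, so every token can attend to
#    tokens of its own scale (intra-scale) and of other scales (inter-scale).
tokens = attention(tokens, tokens, tokens)

# 3) Learnable proxy embeddings (one per class; K = 4 is hypothetical):
#    self-attention models relations among labels, cross-attention from the
#    tokens to the proxies enhances the feature tokens.
proxies = rng.standard_normal((4, d))
proxies = attention(proxies, proxies, proxies)   # semantic relationships
enhanced = attention(tokens, proxies, proxies)   # feature enhancement
```

The enhanced tokens would then be reshaped back into per-scale feature maps and fed to a UNet-like decoder; the real model stacks such layers with learned query/key/value projections.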
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ji, Y | - |
dc.contributor.author | Zhang, R | - |
dc.contributor.author | Wang, H | - |
dc.contributor.author | Li, Z | - |
dc.contributor.author | Wu, L | - |
dc.contributor.author | Hu, Z | - |
dc.contributor.author | Zhang, S | - |
dc.contributor.author | Luo, P | - |
dc.date.accessioned | 2021-07-27T08:09:18Z | - |
dc.date.available | 2021-07-27T08:09:18Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Ji, Y ... et al. Multi-Compound Transformer for Accurate Biomedical Image Segmentation. In de Bruijne, M ... et al. (eds), The 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Virtual Conference, Strasbourg, France, 27 September - 1 October 2021. Proceedings, Part I, p. 326-336. Cham: Springer, 2021 | - |
dc.identifier.isbn | 9783030871925 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301315 | - |
dc.description | Poster Session Th-S2: Topics: Image Segmentation + Domain Adaptation - no. 786 | - |
dc.description.abstract | Recent vision transformers (e.g., for image classification) learn non-local attentive interactions among patch tokens. However, prior works miss the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle these issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and enhance features via self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains significant improvements over state-of-the-art biomedical image segmentation methods on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%, 3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the Pannuke, CVC-Clinic, CVC-Colon, Etis, Kvasir, and ISIC2018 datasets, respectively. Code is available at https://github.com/JiYuanFeng/MCTrans. | - |
dc.language | eng | - |
dc.publisher | Springer. | - |
dc.relation.ispartof | International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021 | - |
dc.relation.ispartofseries | Lecture Notes in Computer Science ; vol. 12901 | - |
dc.title | Multi-Compound Transformer for Accurate Biomedical Image Segmentation | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Luo, P: pluo@hku.hk | - |
dc.identifier.authority | Luo, P=rp02575 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-030-87193-2_31 | - |
dc.identifier.scopus | eid_2-s2.0-85116493310 | - |
dc.identifier.hkuros | 323754 | - |
dc.identifier.spage | 326 | - |
dc.identifier.epage | 336 | - |
dc.identifier.isi | WOS:000712019600031 | - |
dc.publisher.place | Cham | - |
dc.identifier.eisbn | 9783030871932 | - |