Colorectal polyp segmentation via Transformer-based adaptive feature selection

Liang Liming; Kang Ting; Wang Chengbin; Chen Kangquan; Li Yulin

doi:10.12086/oee.2025.240279


Liang L M, Kang T, Wang C B, et al. Colorectal polyp segmentation via Transformer-based adaptive feature selection[J]. Opto-Electron Eng, 2025, 52(3): 240279. doi: 10.12086/oee.2025.240279


College of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China

Fund Project: National Natural Science Foundation of China (51365017, 61463018), the Natural Science Foundation of Jiangxi Province (20192BAB205084), and the Youth Project of Science and Technology Research of the Jiangxi Provincial Department of Education (GJJ2200848)


*Corresponding author: 1833075267@qq.com
CSTR: 32245.14.oee.2025.240279

Received Date 29 November 2024

Revised Date 24 January 2025

Accepted Date 06 February 2025

Published Date 28 March 2025

Abstract

To address regional mis-segmentation and insufficient target localization accuracy in colorectal polyp segmentation, this paper proposes a Transformer-based colorectal polyp segmentation algorithm with adaptive feature selection. First, a Transformer encoder extracts multi-level feature representations, capturing multi-scale information from fine-grained detail to high-level semantics. Second, a dual-focus attention module integrates multi-scale information, spatial attention, and local detail features to strengthen feature representation and recognition, significantly improving the localization accuracy of lesion areas. Third, a hierarchical feature fusion module adopts a hierarchical aggregation strategy to strengthen the fusion of local and global features, improving the capture of complex regional features and reducing mis-segmentation. Finally, a dynamic feature selection module applies adaptive selection and weighting mechanisms to optimize multi-resolution feature representation, eliminate redundant information, and focus on key areas. In experiments on the Kvasir, CVC-ClinicDB, CVC-ColonDB, and ETIS datasets, the algorithm achieved Dice coefficients of 0.926, 0.941, 0.814, and 0.797, respectively. These results demonstrate the superior performance and application value of the proposed algorithm for colorectal polyp segmentation.
Keywords: colorectal polyps; Transformer; dual-focus attention module; dynamic feature selection module
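
The Dice coefficient reported in the abstract measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that standard definition for a single pair of binary masks, not the authors' evaluation code. The function name, the nested-list mask format, and the smoothing term `eps` are assumptions made for illustration.

```python
def dice_coefficient(pred, target, eps=1e-8):
    """Dice overlap for two binary masks given as nested lists of 0/1.

    eps is an assumed smoothing term that avoids division by zero
    when both masks are empty.
    """
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t   # counts pixels set in both masks
            pred_sum += p
            target_sum += t
    return (2.0 * intersection + eps) / (pred_sum + target_sum + eps)


# Toy 3 x 3 example: the prediction misses one foreground pixel.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
score = dice_coefficient(pred, target)  # 2 * 3 / (3 + 4), about 0.857
```

A dataset-level score is typically obtained by averaging per-image values; how the paper aggregates its scores is not stated in this excerpt.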



Overview

Colorectal cancer ranks among the most common and life-threatening diseases worldwide, with colorectal polyps identified as the primary precursors. Accurate detection and segmentation of polyps are essential for preventing cancer progression and improving patient outcomes. However, existing segmentation methods face persistent challenges, including regional mis-segmentation, low localization accuracy, and difficulties in capturing the complex features of polyps. To overcome these limitations, this study presents a novel colorectal polyp segmentation algorithm that integrates Transformer-based adaptive feature selection to improve segmentation accuracy and robustness.

The proposed approach utilizes a Transformer encoder to extract multi-level feature representations, capturing information from fine-grained details to high-level semantics. This enables a comprehensive understanding of the morphology of polyps and their surrounding tissues. To further improve feature representation, a dual-focus attention module is introduced, which integrates multi-scale information, spatial attention, and local detail features. This module enhances lesion localization accuracy and reduces errors arising from the complex structures of polyps.
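
As a rough illustration of the spatial attention ingredient mentioned above, the sketch below builds a per-pixel gate from the channel-wise mean and maximum of a feature map and uses it to rescale the features. This is a generic, simplified gate and not the paper's dual-focus attention module, whose internals are not described in this overview. The fixed weights `w_mean`, `w_max`, and `bias` are hypothetical stand-ins for what would normally be a learned convolution.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feature, w_mean=1.0, w_max=1.0, bias=0.0):
    """Apply a simple spatial gate to a C x H x W feature (nested lists).

    Returns (gated_feature, attention_map). The weights are assumed
    constants; a trained model would learn this mapping instead.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    attn = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [feature[c][i][j] for c in range(channels)]
            mean_v = sum(values) / channels   # average response across channels
            max_v = max(values)               # strongest response across channels
            attn[i][j] = sigmoid(w_mean * mean_v + w_max * max_v + bias)
    # Rescale every channel by the same per-pixel attention value.
    gated = [[[feature[c][i][j] * attn[i][j] for j in range(width)]
              for i in range(height)]
             for c in range(channels)]
    return gated, attn


# Toy feature with 2 channels of size 2 x 2; only pixel (0, 1) is active.
feature = [[[0.0, 2.0], [0.0, 0.0]],
           [[0.0, 4.0], [0.0, 0.0]]]
gated, attn = spatial_attention(feature)
```

In this toy input the active pixel receives an attention value close to 1, while the inactive pixels receive 0.5, so the gate emphasizes locations with strong responses.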

To address regional mis-segmentation, a hierarchical feature fusion module is developed. By employing a hierarchical aggregation strategy, this module strengthens the integration of local and global features, allowing the model to better capture intricate regional characteristics. Additionally, a dynamic feature selection module is incorporated to optimize multi-resolution feature representations. Through adaptive selection and weighting mechanisms, this module eliminates redundant information and focuses on critical regions, improving segmentation precision.
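
The idea of adaptive selection and weighting over multi-resolution features can be sketched as follows: a coarse map is upsampled to the fine resolution, each branch is scored, the scores are normalized with a softmax, and the branches are fused as a weighted sum. This is an illustrative assumption about how such a mechanism might look, not the paper's dynamic feature selection or hierarchical fusion module. All function names are hypothetical, and the global average used as the branch score stands in for a learned scoring layer.

```python
import math


def global_average(fmap):
    """Mean value of an H x W map (nested lists)."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def softmax(scores):
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_and_fuse(branches):
    """Weight same-size maps by a softmax over their scores, then sum.

    The score here is the global average (an assumed stand-in for a
    learned layer). Returns (fused_map, weights).
    """
    weights = softmax([global_average(b) for b in branches])
    height = len(branches[0])
    width = len(branches[0][0])
    fused = [[sum(w * b[i][j] for w, b in zip(weights, branches))
              for j in range(width)]
             for i in range(height)]
    return fused, weights


# Toy example: a 4 x 4 fine map and a 2 x 2 coarse map.
fine = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
coarse = [[2.0, 0.0],
          [0.0, 0.0]]
coarse_up = upsample_nearest(coarse, 2)
fused, weights = select_and_fuse([fine, coarse_up])
```

Because the weights sum to 1, the fused map stays on the same scale as its inputs, and the branch with the stronger overall response (here the upsampled coarse map) contributes more.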

In conclusion, this study demonstrates the effectiveness of combining a Transformer encoder with dual-focus attention, hierarchical feature fusion, and dynamic feature selection. The proposed algorithm addresses the mis-segmentation and localization challenges of colorectal polyp segmentation and offers significant potential for clinical applications in early cancer diagnosis and treatment.