Wu M J, Zhang Y A, Lin S L, et al. Real-time semantic segmentation algorithm based on BiLevelNet[J]. Opto-Electron Eng, 2024, 51(5): 240030. doi: 10.12086/oee.2024.240030

Real-time semantic segmentation algorithm based on BiLevelNet

    Fund Project: Project supported by the National Key R&D Program of China (2023YFB3609400), Fujian Province Natural Science Foundation of China (2020J01468), and Youth Science Foundation of the National Natural Science Foundation of China (62101132)
  • Semantic segmentation networks typically have large parameter counts, which makes them difficult to deploy on memory-constrained edge devices. To address this problem, a lightweight real-time semantic segmentation algorithm based on BiLevelNet is proposed. First, dilated convolutions are employed to enlarge the receptive field, and feature reuse strategies are integrated to enhance the network's region awareness. Next, a two-stage PBRA (Partial Bi-Level Route Attention) mechanism is incorporated to establish dependencies between distant objects, thereby strengthening the network's global perception capability. Finally, the FADE operator is introduced to combine shallow features and improve the effectiveness of image upsampling. Experimental results show that, at an input resolution of 512×1024, the proposed network achieves a mean Intersection over Union (mIoU) of 75.1% on the Cityscapes dataset at 121 frames per second, with a model size of only 0.7 M. At an input resolution of 360×480, the network achieves an mIoU of 68.2% on the CamVid dataset. Compared with other real-time semantic segmentation methods, this network achieves a balance between speed and accuracy, meeting the real-time requirements of applications such as autonomous driving.
  • The large parameter counts of semantic segmentation networks complicate deployment on memory-constrained edge devices. To address this challenge, a lightweight real-time semantic segmentation algorithm based on BiLevelNet is proposed. First, dilated convolutions are used to broaden the receptive field, and feature reuse strategies are integrated to strengthen the network's region awareness. Next, a two-stage PBRA (Partial Bi-Level Route Attention) mechanism is adopted to build dependencies between distant objects, enhancing the network's ability to perceive global context. Finally, the FADE operator is introduced to merge shallow features, improving the quality of image upsampling.
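
    As a rough illustration of why dilation broadens the receptive field, the sketch below computes the effective kernel size of a dilated convolution and the receptive field of a stack of layers. The dilation rates (1, 2, 4) and the helper names are illustrative assumptions, not the configuration used in the paper.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of stacked conv layers.

    `layers` is a list of (kernel_size, dilation, stride) tuples,
    ordered from the input side to the output side.
    """
    rf = 1    # a single output position sees one input pixel to start with
    jump = 1  # distance in input pixels between adjacent feature positions
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks of three 3x3 convolutions with stride 1.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]
print(receptive_field(plain), receptive_field(dilated))
```

    With the same number of weights per layer, the dilated stack in this toy setting covers roughly twice the span of the plain one.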

    Fig. 4 depicts the hierarchical feature maps within the AFR module and describes their characteristics and roles. It clarifies the distinctions and connections between the input feature map, the local feature map obtained through 3×3 depthwise convolution, and the context feature map obtained through dilated convolution. These features are combined in the final fused feature map, which shows strong activation across both local and global contexts. In addition, a gradually decreasing channel reduction factor is employed, as detailed in Table 3. Adjusting this factor step by step shows that, with a reduction factor of r = 1/4, the PBRA module improves mIoU by 1.5% and increases speed by 12 FPS compared with BRA.
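
    For context on the BRA baseline, the following minimal sketch shows the region-level routing step of bi-level routing attention in the general BiFormer style: tokens are averaged into region descriptors, and each region keeps only its top-k most similar regions for the later token-level attention. It is a simplified 1-D toy that reuses the same descriptors as queries and keys; the function names, toy features, and the value of k are assumptions, and the partial-channel design of PBRA is not reproduced here.

```python
def region_descriptors(tokens, region_size):
    """Average each consecutive group of `region_size` token vectors."""
    regions = []
    for start in range(0, len(tokens), region_size):
        group = tokens[start:start + region_size]
        dim = len(group[0])
        regions.append([sum(t[i] for t in group) / len(group) for i in range(dim)])
    return regions


def route_top_k(region_queries, region_keys, k):
    """For every query region, return indices of the k best-matching key regions."""
    routed = []
    for query in region_queries:
        # Region-to-region affinity via dot product.
        scores = [sum(a * b for a, b in zip(query, key)) for key in region_keys]
        # Stable sort: ties keep their original index order.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routed.append(order[:k])
    return routed


# Toy sequence: regions 0 and 2 share the same feature, region 1 differs.
tokens = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [1, 0]]
regions = region_descriptors(tokens, region_size=2)
routes = route_top_k(regions, regions, k=2)
print(routes)
```

    In this toy case, region 0 is routed to region 2 even though the two are not adjacent, which is the sense in which routing links distant but similar content while skipping unrelated regions.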

    Discontinuities and missing pixels also appear in the segmentation results when bilinear interpolation is used for upsampling. Inspection of the deep feature maps before bilinear upsampling shows that the features corresponding to roads and sidewalks are similar, which can lead to misclassification. To counteract this, shallow features that preserve edge information are introduced and merged in the FADE upsampling process, improving edge segmentation. This approach mitigates the loss of spatial information and yields smoother, better-defined segmentation edges.
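
    To see why position-only interpolation can blur class boundaries, the 1-D sketch below linearly upsamples a step edge. The end-point-aligned sampling convention, the function name, and the toy signal are assumptions chosen for illustration; this is not the paper's decoder and not the FADE operator.

```python
def linear_upsample(signal, out_len):
    """Linearly interpolate a 1-D list to `out_len` samples (end points aligned)."""
    n = len(signal)
    if out_len == 1 or n == 1:
        return [float(signal[0])] * out_len
    scale = (n - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * scale
        left = min(int(pos), n - 2)  # clamp so that left + 1 stays in range
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return out


# A sharp boundary between two classes, e.g. scores 0 on one side and 1 on the other.
edge = [0.0, 0.0, 1.0, 1.0]
upsampled = linear_upsample(edge, 7)
print(upsampled)
```

    The interpolated sample at the boundary takes an in-between value because the weights depend only on position, not on content. That ambiguity is what edge-preserving shallow features are meant to help resolve.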

    Experimental results show that, at an input resolution of 512×1024, the network attains a mean Intersection over Union (mIoU) of 75.1% on the Cityscapes dataset at 121 frames per second, with a model size of only 0.7 M. At an input resolution of 360×480, the network achieves an mIoU of 68.2% on the CamVid dataset. Compared with other real-time semantic segmentation methods, this network strikes a good balance between speed and accuracy, meeting the real-time requirements of applications such as autonomous driving.
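
    For reference, mIoU is conventionally computed from a confusion matrix as the per-class ratio TP / (TP + FP + FN), averaged over classes. The sketch below uses a toy two-class example; the function names and labels are illustrative, and benchmark-specific details such as ignored labels are not modeled.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU, skipping classes absent from both truth and prediction."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Flattened toy label maps with two classes.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
cm = confusion_matrix(truth, pred, num_classes=2)
score = mean_iou(cm)
print(cm, round(score, 4))
```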

