A multi-target semantic segmentation method for millimetre wave SAR images based on a dual-branch multi-scale fusion network

Ding Junhua; Yuan Minghui

doi:10.12086/oee.2023.230242

Ding J H, Yuan M H. A multi-target semantic segmentation method for millimetre wave SAR images based on a dual-branch multi-scale fusion network[J]. Opto-Electron Eng, 2023, 50(12): 230242. doi: 10.12086/oee.2023.230242

Ding Junhua^1,2
Yuan Minghui^1,2,*

1. Terahertz Technology Innovation Research Institute, University of Shanghai for Science and Technology, Shanghai 200093, China
2. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Fund Project: Supported by the National Natural Science Foundation of China (61601291) and the Shanghai Committee of Science and Technology (14dz1206602)

^*Corresponding author: yuanminghui@usst.edu.cn

Received Date 28 September 2023

Revised Date 30 November 2023

Accepted Date 30 November 2023

Published Date 19 January 2024

Abstract

Detecting and identifying contraband in millimetre-wave synthetic aperture radar (SAR) security imaging faces several major challenges: targets are small, are often partially occluded, and overlap one another, all of which hinder accurate identification. To address these problems, a contraband detection method based on a dual-branch multi-scale fusion network (DBMFnet) is proposed. The overall architecture of DBMFnet follows the encoder-decoder framework. In the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to enhance feature extraction. In the decoder stage, a multi-scale fusion module (MSFM) is proposed to enhance target detection. Experimental results show that the proposed method outperforms existing semantic segmentation methods in mean intersection over union (mIoU) and reduces the incidence of missed and false detections.

Keywords:
- millimetre-wave synthetic aperture radar
- contraband detection
- deep learning
- semantic segmentation
- dual-branch multi-scale fusion network

Overview

With advances in millimetre-wave technology, millimetre-wave security inspection systems have matured. Compared with traditional security inspection technologies such as X-ray, infrared, and metal detectors, millimetre-wave security imaging can detect not only metallic objects hidden under fabric but also dangerous items such as plastic firearms, knives, and explosives. Millimetre waves are also non-ionizing and do not harm the human body. Millimetre-wave inspection yields precise image information and significantly reduces false alarms, so millimetre-wave imaging equipment is widely used for security inspection of the human body.

Detecting and identifying contraband in millimetre-wave synthetic aperture radar (SAR) security imaging faces several major challenges: targets are small, are often partially occluded, and overlap one another, all of which hinder accurate identification. To address these problems, a contraband detection method based on a dual-branch multi-scale fusion network (DBMFnet) is proposed. The overall architecture of DBMFnet follows the encoder-decoder framework.

In the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to enhance feature extraction. During feature extraction, one branch preserves high resolution while the other extracts rich semantic information through multiple downsampling operations. Bilateral connections between the high-resolution and low-resolution branches allow repeated feature exchange, so that the feature maps of the high-resolution branch are integrated into those of the low-resolution branch across different scales. This combines rich semantic information with fine-grained detail and improves the detection of small and interfering targets in images.

In the decoder stage, a multi-scale fusion module (MSFM) is proposed to enhance target detection. The module is built on the Feature Alignment Module (FAM), which allows multiple low-resolution feature maps to be merged into high-resolution maps. The FAM is inspired by the optical flow used for motion alignment between adjacent video frames. It takes feature maps F^h and F^l of different resolutions as input and first maps each to the same number of channels with a 1×1 convolutional layer. The low-resolution feature map F^l is then upsampled by a bilinear interpolation layer and concatenated with the high-resolution feature map F^h.
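
The input stage of the FAM described above (1×1 channel projection, bilinear up-sampling, concatenation) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: feature maps are plain nested lists indexed [channel][row][column], the weight matrices `w_h` and `w_l` and all function names are hypothetical, the corner-aligned sampling convention is an assumption, and whatever the FAM does after the concatenation is omitted because the overview does not describe it.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][column]


def conv1x1(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """1x1 convolution without bias: a per-pixel linear map over channels.

    weight has shape [C_out][C_in].
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wk * x[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weight
    ]


def bilinear_upsample(x: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Bilinear resize; corner pixels of input and output coincide (assumed)."""
    in_h, in_w = len(x[0]), len(x[0][0])

    def src(o: int, n_out: int, n_in: int) -> float:
        # Map an output index to a fractional input coordinate.
        return o * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0

    out = []
    for ch in x:
        plane = []
        for i in range(out_h):
            y = src(i, out_h, in_h)
            y0 = int(y)
            y1 = min(y0 + 1, in_h - 1)
            dy = y - y0
            row = []
            for j in range(out_w):
                xs = src(j, out_w, in_w)
                x0 = int(xs)
                x1 = min(x0 + 1, in_w - 1)
                dx = xs - x0
                top = ch[y0][x0] * (1 - dx) + ch[y0][x1] * dx
                bot = ch[y1][x0] * (1 - dx) + ch[y1][x1] * dx
                row.append(top * (1 - dy) + bot * dy)
            plane.append(row)
        out.append(plane)
    return out


def fam_input_stage(f_h: FeatureMap, f_l: FeatureMap,
                    w_h: List[List[float]], w_l: List[List[float]]) -> FeatureMap:
    """Project both maps to a common channel count, upsample F^l, concatenate."""
    p_h = conv1x1(f_h, w_h)            # high-resolution map, C channels
    p_l = conv1x1(f_l, w_l)            # low-resolution map, C channels
    up_l = bilinear_upsample(p_l, len(p_h[0]), len(p_h[0][0]))
    return p_h + up_l                  # concatenation along the channel axis
```

With a 3×3 two-channel F^h and a 2×2 one-channel F^l, both projected to one channel, `fam_input_stage` returns a 3×3 map with two channels: the projected F^h followed by the upsampled, projected F^l.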

Experimental results on the HM-SAR dataset show that the proposed model improves mIoU by 2.54% over the best-performing existing semantic segmentation model. The ablation experiment shows that the proposed MSFM effectively improves mIoU.
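
For readers unfamiliar with the metric, mIoU averages, over the classes, the ratio of correctly labelled pixels to the union of predicted and ground-truth pixels of each class. The sketch below shows one common way to compute it for a single label map. It is an illustration only: the function name `mean_iou` is hypothetical, and skipping classes that appear in neither map is an assumption; the evaluation protocol used in the paper may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean intersection over union between two integer label maps."""
    inter = [0] * num_classes   # pixels where prediction and truth agree on class c
    union = [0] * num_classes   # pixels labelled c in prediction or truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    # Assumption: classes absent from both maps are left out of the average.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with prediction `[[0, 0], [1, 1]]` and ground truth `[[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.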