Citation: Ding J H, Yuan M H. A multi-target semantic segmentation method for millimetre wave SAR images based on a dual-branch multi-scale fusion network[J]. Opto-Electron Eng, 2023, 50(12): 230242. doi: 10.12086/oee.2023.230242
With advances in millimeter-wave technology, millimeter-wave security inspection systems have matured considerably. Compared with traditional security inspection technologies such as X-ray, infrared, and metal detectors, millimeter-wave security imaging not only detects metallic objects hidden under fabric but also identifies dangerous items such as plastic firearms, knives, and explosives. Importantly, millimeter waves are non-ionizing and do not harm the human body. Millimeter-wave security inspection provides precise image information and significantly reduces false alarms, which is why millimeter-wave imaging equipment is widely used for security inspection of the human body.
Detecting and identifying contraband in millimeter-wave synthetic aperture radar (SAR) security images faces several major challenges: targets are small, some are partially occluded, and multiple targets may overlap, all of which hinder accurate identification. To address these problems, a contraband detection method based on a Dual Branch Multiscale Fusion Network (DBMFnet) is proposed. The overall architecture of DBMFnet follows the encoder-decoder framework. In the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to strengthen feature extraction. One branch preserves high resolution, while the other extracts rich semantic information through multiple downsampling operations. Bilateral connections between the high-resolution and low-resolution branches allow repeated feature exchange, so that feature maps from the high-resolution branch are fused with those of the low-resolution branch at different scales. Combining rich semantic information with fine-grained detail in this way improves the detection of small and interfering targets. In the decoder stage, a multi-scale fusion module (MSFM) is proposed to improve target detection. The module includes a Feature Alignment Module (FAM), which allows multiple low-resolution feature maps to be merged into high-resolution maps. The FAM is inspired by the optical flow used for motion alignment between adjacent video frames. It takes feature maps Fh and Fl of different resolutions as input and passes each through its own 1×1 convolutional layer so that both have the same number of channels. The low-resolution feature map Fl is then upsampled by a bilinear interpolation layer and concatenated with the high-resolution feature map Fh.
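The first steps of the FAM described above (channel matching with 1×1 convolutions, bilinear upsampling of the low-resolution map, and concatenation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, random weights, and the align-corners style of interpolation are all assumptions made for illustration, and whatever the module does with the concatenated map afterwards is left out because it is not described here.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def bilinear_upsample(x, out_h, out_w):
    """Bilinearly resize x of shape (C, H, W) to (C, out_h, out_w).

    Uses an align-corners sampling grid (an assumption of this sketch).
    """
    _, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)          # source row coordinates
    xs = np.linspace(0, w - 1, out_w)          # source column coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]              # vertical blend weights
    wx = (xs - x0)[None, None, :]              # horizontal blend weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def fam_concat(f_h, f_l, w_h, w_l):
    """Hypothetical helper: project both maps, upsample Fl, concatenate."""
    f_h_proj = conv1x1(f_h, w_h)               # match channel counts
    f_l_proj = conv1x1(f_l, w_l)
    _, height, width = f_h_proj.shape
    f_l_up = bilinear_upsample(f_l_proj, height, width)
    return np.concatenate([f_h_proj, f_l_up], axis=0)

# Illustrative shapes only; the real channel counts are not given in the text.
rng = np.random.default_rng(0)
f_h = rng.standard_normal((64, 32, 32))        # high-resolution map Fh
f_l = rng.standard_normal((128, 16, 16))       # low-resolution map Fl
w_h = rng.standard_normal((32, 64))            # 1x1 conv weights for Fh
w_l = rng.standard_normal((32, 128))           # 1x1 conv weights for Fl
fused = fam_concat(f_h, f_l, w_h, w_l)
print(fused.shape)
```

In a trained network the 1×1 weights would be learned rather than random; the sketch only shows how the two inputs are brought to a common channel count and spatial size before concatenation.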
The experimental results show that, when tested on the HM-SAR dataset, the proposed model improves mIoU by 2.54% over the best-performing existing semantic segmentation models. The ablation experiment shows that the proposed MSFM effectively improves mIoU.
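For readers unfamiliar with the metric, mean intersection over union (mIoU) averages, over classes, the ratio of correctly labelled pixels to all pixels assigned to that class in either the prediction or the ground truth. The sketch below is a generic illustration, not the evaluation code used in the experiments. The function name, the integer label ids, and the choice to skip classes that appear in neither mask are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over flat sequences of per-pixel integer class labels.

    Classes absent from both pred and target are skipped (an assumption;
    other conventions exist).
    """
    inter = [0] * num_classes                  # pixels where both agree on class c
    union = [0] * num_classes                  # pixels labelled c in pred or target
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)

# Toy example with three hypothetical class ids (0, 1, 2).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.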
DBMFnet network structure diagram
Feature fusion process
Different feature fusion methods. (a) FCM; (b) FDM; (c) MSFM
HM-SAR security images. (a) Back scanning image of the human body; (b) Frontal scanning image of the human body
DBMFnet heat map
Test results of each model. Each row represents the test results of the same picture, and each column represents the test results of the same model. Black denotes the background, green denotes the wrench, yellow denotes the pistol, red denotes the hammer, and blue denotes the knife
Baseline model