Citation: Li S B, Xiao Z J, Qu H C, et al. Multi-granularity feature and shape-position similarity metric method for ship detection in SAR images[J]. Opto-Electron Eng, 2025, 52(2): 240254. doi: 10.12086/oee.2025.240254
Synthetic aperture radar (SAR) offers all-weather, all-day observation: it penetrates clouds, rain, and fog, yields high-resolution imagery, and covers large areas with fast scanning. Owing to these advantages, SAR imagery is widely used for ship detection in tasks such as maritime patrol, search and rescue, fishery supervision, traffic management, and military surveillance. In practice, however, complex background interference, large variations in target scale, and densely distributed small targets make missed and false detections common. This paper proposes a multi-granularity feature and shape-position similarity method for ship detection in SAR images to address these challenges and improve detection performance. In the feature extraction stage, a dual-branch multi-granularity feature aggregation (MGFA) structure is designed to cope with the complex backgrounds and multi-scale targets in SAR images. MGFA extracts features bidirectionally through a coarse-grained branch and a fine-grained branch. The coarse-grained branch is implemented with the Haar wavelet transform (HWT), which decomposes the input features at multiple levels, expanding the model's receptive field and strengthening its ability to capture global context. The fine-grained branch uses spatial and channel reconstruction convolution (SCConv), which concentrates on details and local information: it efficiently captures the fine texture of small targets and suppresses clutter and background noise, improving the accuracy of small-target detection. In the detection regression stage, a shape-position similarity (SPS) metric based on Euclidean distance is proposed to replace the traditional IoU.
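As a concrete illustration of the coarse-grained branch, the snippet below sketches one level of the 2D Haar wavelet transform in NumPy; re-applying it to the low-frequency band yields the multi-level decomposition described above. The function name and the orthonormal scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2D Haar transform on a single channel.

    Splits the input into a low-frequency band (LL) and three detail
    bands (LH, HL, HH), each at half the spatial resolution -- which is
    why stacking levels enlarges the effective receptive field of any
    convolution applied afterwards.
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # detail across columns
    hl = (a + b - c - d) / 2.0  # detail across rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

# Two-level decomposition: re-apply the transform to the LL band.
x = np.random.rand(32, 32)
ll1, lh1, hl1, hh1 = haar_dwt2(x)    # 16x16 bands
ll2, lh2, hl2, hh2 = haar_dwt2(ll1)  # 8x8 bands
```

Because the transform is orthonormal, the four half-resolution bands jointly preserve the input's energy, so no information is lost when the branches are later aggregated.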
For small ships, SPS mitigates the sensitivity of IoU to position deviation through the center distance, reflecting that deviation accurately even when the bounding boxes do not overlap. For large ships, SPS introduces a shape constraint that sharpens the focus on the shape of the predicted box, compensating for IoU's exclusive attention to positional overlap. By jointly considering position and shape similarity, SPS effectively addresses ship detection in SAR images with dense small targets and significant scale variation. In a comprehensive comparison with 11 detectors from the one-stage, two-stage, and DETR families on the SSDD and HRSID datasets, our method achieves mAP50 scores of 98.3% and 93.8% and mAP scores of 68.8% and 70.8%, respectively. The model is also highly efficient, with only 2.4 M parameters and a computational cost of 6.4 GFLOPs, outperforming the comparison methods.
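The exact SPS formula is not reproduced in this abstract, so the sketch below is a hypothetical reading of the idea: a position term from the Euclidean center distance, normalized by the enclosing-box diagonal, and a shape term from relative width/height agreement, combined into one similarity. The function name and both normalization choices are assumptions for illustration, not the paper's definition.

```python
import math

def sps(box_p, box_g):
    """Hypothetical shape-position similarity for (cx, cy, w, h) boxes.

    Unlike IoU, the position term stays informative even when the two
    boxes do not overlap, and the shape term penalizes a predicted box
    whose width/height differ from the ground truth.
    """
    cxp, cyp, wp, hp = box_p
    cxg, cyg, wg, hg = box_g
    # Diagonal of the smallest box enclosing both boxes, used to
    # normalize the center distance into [0, 1).
    x1 = min(cxp - wp / 2, cxg - wg / 2)
    y1 = min(cyp - hp / 2, cyg - hg / 2)
    x2 = max(cxp + wp / 2, cxg + wg / 2)
    y2 = max(cyp + hp / 2, cyg + hg / 2)
    diag = math.hypot(x2 - x1, y2 - y1)
    position = 1.0 - math.hypot(cxp - cxg, cyp - cyg) / diag
    # Relative width/height agreement in [0, 1].
    shape = 1.0 - 0.5 * (abs(wp - wg) / max(wp, wg)
                         + abs(hp - hg) / max(hp, hg))
    return position * shape

# Identical boxes score 1; disjoint boxes (where IoU = 0) still yield
# a nonzero similarity that decreases with center distance.
same = sps((0, 0, 4, 2), (0, 0, 4, 2))
near = sps((0, 0, 2, 2), (10, 0, 2, 2))
far = sps((0, 0, 2, 2), (20, 0, 2, 2))
```

The key property this sketch demonstrates is the one claimed above: for two disjoint boxes IoU is flat at zero, whereas an SPS-style metric still provides a gradient toward the target.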
Diagram of the overall network structure of the proposed method
MGFA network structure diagram
Expanding the receptive field with HWT
Two-level HWT decomposition process in a single channel
Changes in receptive field of the model before and after using HWT. Receptive fields of (a) Stage2, (b) Stage3, (c) Stage4, and (d) Stage5 before HWT; Receptive fields of (e) Stage2, (f) Stage3, (g) Stage4, and (h) Stage5 after HWT
IoU changes of ships with different sizes. (a) Changes in IoU for small ship; (b) Changes in IoU for large ship
Simulation comparison of different metrics under different deviations. (a) Deviation; (b) IoU deviation curves; (c) SPS deviation curves
SSDD and HRSID ship target distributions. (a) SSDD; (b) HRSID
Changes in regression loss, precision, and recall of the model before and after using SPS. (a) Regression loss; (b) Precision; (c) Recall
Comparison of PR curves of different methods
Comparison of performances of different methods
Visual comparison of different methods on SSDD dataset. (a) True labeling; (b) Dynamic R-CNN; (c) YOLOv8n; (d) YOLO11n; (e) Mamba YOLO; (f) Ours; (g) DINO; (h) RT-DETR
Visual comparison of different methods on HRSID dataset. (a) True labeling; (b) Dynamic R-CNN; (c) YOLOv8n; (d) YOLO11n; (e) Mamba YOLO; (f) Ours; (g) DINO; (h) RT-DETR