Citation: Liang L M, Chen K Q, Wang C B, et al. Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception[J]. Opto-Electron Eng, 2024, 51(7): 240099. doi: 10.12086/oee.2024.240099
In response to the challenges of complex background interference, multi-scale target variation, and difficult small-target extraction in remote sensing images, this paper proposes a remote sensing image detection algorithm based on the YOLOv7-tiny model that integrates a visual center mechanism and parallel patch perception. The algorithm makes three main contributions. First, it introduces an explicit visual center mechanism that uses a lightweight multi-layer perceptron to establish long-range dependencies between pixels, capturing the central features of contextual information and enriching the global semantic information of the image, including scene structure and contextual detail. At the same time, a trainable visual center aggregates local regional information within layers to capture locally representative features, further improving the extraction of target textures. Together, these components extract and exploit the global semantics of the image and capture global target features, improving recognition of target textures and shapes during detection. Second, the algorithm improves the parallel patch perception module by dynamically adjusting the receptive field used for feature extraction, so that it adapts to different target scales, captures features at diverse scales, and handles varied backgrounds. In practice, targets in remote sensing images often differ widely in scale and appear against complex backgrounds, and traditional methods may fail to distinguish these differences or may ignore them altogether. By adjusting the receptive field dynamically, the algorithm perceives targets of different scales while maintaining high accuracy and a low error rate against complex backgrounds.
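The idea of dynamically adjusting the receptive field can be illustrated with a toy selective-kernel-style blend. This is a minimal sketch under stated assumptions, not the paper's module: it works on a 1D signal in plain Python, uses box filters in place of learned convolutions, and the names `box_filter`, `select_receptive_field`, and the `gate` parameter are hypothetical. In particular, the default gate (peak response) is a hand-written stand-in for the learned gating layers a real network would train.

```python
import math


def box_filter(signal, k):
    """Average each sample over a window of odd width k (edges are clamped)."""
    half = k // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)]
        out.append(sum(window) / k)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_receptive_field(signal, kernel_sizes=(3, 7), gate=None):
    """Blend branches with different receptive fields using softmax weights.

    `gate` maps one branch output to a scalar score. The default (peak
    response) is an assumption made for illustration only; real
    selective-kernel modules learn this scoring from data.
    """
    if gate is None:
        gate = max
    branches = [box_filter(signal, k) for k in kernel_sizes]
    weights = softmax([gate(b) for b in branches])
    fused = [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(len(signal))
    ]
    return fused, weights


# A narrow, small-target-like spike: the small kernel preserves the peak
# better, so this toy gate gives the small-kernel branch the larger weight.
spike = [0.0] * 4 + [10.0] + [0.0] * 4
fused, weights = select_receptive_field(spike)
```

The fused output is a convex combination of the branch outputs, so the weights always sum to one; which branch dominates depends entirely on the gate, which is the part a trained network would learn.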
Finally, the algorithm designs a multi-scale feature fusion module that efficiently integrates multi-level, multi-scale feature information, capturing diverse representations of targets and improving model inference speed while meeting high-precision detection requirements. This fusion method significantly enhances the algorithm's effectiveness in static image detection tasks. Experimental results on the RSOD dataset show that accuracy, recall, and mean average precision improve by 1.5%, 2.4%, and 2.4%, respectively, over YOLOv7-tiny. Generalization experiments on the NWPU VHR-10 and DOTA datasets show mean average precision gains of 3.0% and 1.3%, respectively, over the baseline models. These results indicate that the algorithm performs well not only on RSOD but also on datasets covering diverse target types and scenes, demonstrating strong generalization. Comparative analysis with other algorithms further confirms the advantage of the proposed algorithm.
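For readers less familiar with the reported metrics, the sketch below shows how precision, recall, and mean average precision are commonly computed for a detector. It rests on several assumptions: the abstract's "accuracy" is read as detection precision, detections are taken as already matched to ground truth (the IoU threshold is not stated in this section), AP uses all-point interpolation, and the numbers in the example are invented for illustration rather than taken from the paper. The function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(scored_detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scored_detections: list of (confidence, is_true_positive) pairs, where
    matching against ground truth is assumed to have been done already.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each point becomes the best precision
    # achievable at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Invented example: three detections for one class with two ground-truth boxes.
detections = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(detections, num_ground_truth=2)
```

Under this definition, a gain in mAP reflects better ranking of correct detections across all classes, which is why it is reported alongside the single-threshold precision and recall figures.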
Remote sensing image detection model integrating visual center mechanism and parallel patch perception
Explicit visual center mechanism
Parallel multi-branch feature extraction module
Large selective kernel module
Multi-scale feature fusion module
Remote sensing target detection results of different algorithms