Citation: Liang L M, Chen K Q, Wang C B, et al. Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception[J]. Opto-Electron Eng, 2024, 51(7): 240099. doi: 10.12086/oee.2024.240099

Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception

    Fund Project: Supported by the National Natural Science Foundation of China (51365017, 61463018), the Natural Science Foundation of Jiangxi Province (20192BAB205084), and the Science and Technology Research Youth Project of the Jiangxi Provincial Department of Education (GJJ2200848)
  • To address the challenges of complex background interference, multi-scale differences among targets, and the difficulty of extracting small targets from remote sensing images, this paper proposes a remote sensing image detection algorithm based on the YOLOv7-tiny model that integrates a visual center mechanism and parallel patch perception. First, the algorithm introduces an explicit visual center mechanism to establish long-distance dependencies between pixels, enriching the overall semantic information of the image and improving the extraction of target textures. Second, it improves the parallel patch perception module by adjusting the feature extraction receptive fields to suit different target scales. Third, a multi-scale feature fusion module is designed to efficiently fuse multi-layer features, improving the model's inference speed. Experimental results on the RSOD dataset show that the proposed algorithm improves precision, recall, and mean average precision by 1.5%, 2.4%, and 2.4%, respectively, compared with YOLOv7-tiny. Validation on the NWPU VHR-10 and DOTA datasets further confirms the strong generalization performance of the proposed algorithm, and comparative analysis with other algorithms demonstrates its superiority.
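As a concrete illustration of the explicit visual center idea summarized above, the PyTorch sketch below pairs a lightweight MLP branch (long-range pixel dependencies) with a learnable visual center branch that soft-assigns pixels to a small codebook and gates the feature map accordingly. This is a minimal sketch under assumed design choices; the class names, codebook size, and gating scheme are illustrative and not the authors' implementation.

# A minimal PyTorch sketch of an explicit-visual-center-style block, assuming a
# simplified two-branch design: a lightweight MLP that models long-range pixel
# dependencies and a learnable visual center that aggregates local features
# against a small codebook. Class names, the codebook size, and the gating
# scheme are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightMLP(nn.Module):
    """Per-pixel channel MLP; a depthwise conv provides cheap spatial mixing."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.fc1 = nn.Conv2d(channels, channels * expansion, 1)
        self.fc2 = nn.Conv2d(channels * expansion, channels, 1)

    def forward(self, x):
        x = x + self.dw(x)                        # local spatial mixing (residual)
        return x + self.fc2(F.gelu(self.fc1(x)))  # channel mixing (residual)

class LearnableVisualCenter(nn.Module):
    """Soft-assigns every pixel to K learnable codewords and gates the feature map."""
    def __init__(self, channels, num_codes=32):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, channels))
        self.scale = nn.Parameter(torch.ones(num_codes))
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        feat = x.flatten(2).transpose(1, 2)                         # (B, HW, C)
        codes = self.codes.unsqueeze(0).expand(b, -1, -1)           # (B, K, C)
        dist = torch.cdist(feat, codes) ** 2                        # (B, HW, K)
        assign = F.softmax(-self.scale * dist, dim=-1)              # soft assignment
        residual = feat.unsqueeze(2) - self.codes                   # (B, HW, K, C)
        encoded = (assign.unsqueeze(-1) * residual).sum(1).mean(1)  # (B, C)
        weight = self.gate(encoded).view(b, c, 1, 1)                # channel gate
        return x * weight                                           # highlight "central" features

class ExplicitVisualCenter(nn.Module):
    """Fuses the long-range (MLP) and local (visual center) branches."""
    def __init__(self, channels):
        super().__init__()
        self.mlp = LightMLP(channels)
        self.lvc = LearnableVisualCenter(channels)
        self.fuse = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.mlp(x), self.lvc(x)], dim=1))

if __name__ == "__main__":
    out = ExplicitVisualCenter(64)(torch.randn(1, 64, 40, 40))
    print(out.shape)  # torch.Size([1, 64, 40, 40])

A block like this is typically applied to the deepest backbone feature map before multi-scale fusion; the exact placement within the proposed network is not detailed in the abstract.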
    [1] Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. doi: 10.12086/oee.2022.210363

    [2] Yuan J H, Zhang N F, Ruan J S, et al. Detection of prohibited items in X-ray images based on modified YOLOX algorithm[J]. Laser Technol, 2023, 47(4): 547−552. doi: 10.7510/jgjs.issn.1001-3806.2023.04.016

    [3] Ming Q, Miao L J, Zhou Z Q, et al. CFC-Net: a critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[J]. IEEE Trans Geosci Remote Sens, 2022, 60: 5605814. doi: 10.1109/TGRS.2021.3095186

    [4] Cong R M, Zhang Y M, Fang L Y, et al. RRNet: relational reasoning network with parallel multiscale attention for salient object detection in optical remote sensing images[J]. IEEE Trans Geosci Remote Sens, 2022, 60: 5613311. doi: 10.1109/TGRS.2021.3123984

    [5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014: 580–587. https://doi.org/10.1109/CVPR.2014.81.

    [6] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, Santiago, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169.

    [7] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031

    [8] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, 2017: 2961–2969. https://doi.org/10.1109/ICCV.2017.322.

    [9] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision, Amsterdam, 2016: 21–37. https://doi.org/10.1007/978-3-319-46448-0_2.

    [10] Zhao L Q, Li S Y. Object detection algorithm based on improved YOLOv3[J]. Electronics, 2020, 9(3): 537. doi: 10.3390/electronics9030537

    [11] Gai R L, Chen N, Yuan H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model[J]. Neural Comput Appl, 2023, 35(19): 13895−13906. doi: 10.1007/s00521-021-06029-z

    [12] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, 2023: 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721.

    [13] Salehi A W, Khan S, Gupta G, et al. A study of CNN and transfer learning in medical imaging: advantages, challenges, future scope[J]. Sustainability, 2023, 15(7): 5930. doi: 10.3390/su15075930

    [14] Gao S H, Li Z Y, Han Q, et al. RF-Next: efficient receptive field search for convolutional neural networks[J]. IEEE Trans Pattern Anal Mach Intell, 2023, 45(3): 2984−3002. doi: 10.1109/TPAMI.2022.3183829

    [15] Gao T, Niu Q Q, Zhang J, et al. Global to local: a scale-aware network for remote sensing object detection[J]. IEEE Trans Geosci Remote Sens, 2023, 61: 5615614. doi: 10.1109/TGRS.2023.3294241

    [16] Zhang J Q, Lei J, Xie W Y, et al. SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery[J]. IEEE Trans Geosci Remote Sens, 2023, 61: 5605415. doi: 10.1109/TGRS.2023.3258666

    [17] Wang L, Liu X B, Ma J T, et al. Real-time steel surface defect detection with improved multi-scale YOLO-v5[J]. Processes, 2023, 11(5): 1357. doi: 10.3390/pr11051357

    [18] Quan Y, Zhang D, Zhang L Y, et al. Centralized feature pyramid for object detection[J]. IEEE Trans Image Process, 2023, 32: 4341−4354. doi: 10.1109/TIP.2023.3297408

    [19] Xu S B, Zheng S C, Xu W H, et al. HCF-Net: hierarchical context fusion network for infrared small object detection[Z]. arXiv: 2403.10778, 2024. https://arxiv.org/abs/2403.10778.

    [20] Li Y X, Li X, Dai Y M, et al. LSKNet: a foundation lightweight backbone for remote sensing[Z]. arXiv: 2403.11735, 2024. https://arxiv.org/abs/2403.11735.

    [21] Li X, Wang W H, Hu X L, et al. Selective kernel networks[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, 2019: 510–519. https://doi.org/10.1109/CVPR.2019.00060.

    [22] Liang L M, Zhan T, Lei K, et al. Multi-resolution fusion input U-shaped retinal vessel segmentation algorithm[J]. J Electron Inf Technol, 2023, 45(5): 1795−1806. doi: 10.11999/JEIT220470

    [23] Chen Y X, Lin M W, He Z, et al. Consistency-and dependence-guided knowledge distillation for object detection in remote sensing images[J]. Expert Syst Appl, 2023, 229: 120519. doi: 10.1016/j.eswa.2023.120519

    [24] Zhao D W, Shao F M, Liu Q, et al. A small object detection method for drone-captured images based on improved YOLOv7[J]. Remote Sens, 2024, 16(6): 1002. doi: 10.3390/rs16061002

    [25] Xia G S, Bai X, Ding J, et al. DOTA: a large-scale dataset for object detection in aerial images[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 3974–3983. https://doi.org/10.1109/CVPR.2018.00418.

  • To address the challenges posed by complex background interference, multi-scale variation of targets, and the difficulty of extracting small targets from remote sensing images, this paper proposes a remote sensing image detection algorithm built on the YOLOv7-tiny model that integrates a visual center mechanism and parallel patch perception to enhance detection performance. The algorithm introduces three main innovations. First, it introduces an explicit visual center mechanism that uses a lightweight multi-layer perceptron to establish long-distance dependencies between pixels, capturing the central features of contextual information and enriching the overall semantic information of the image, including scene structure and contextual details. At the same time, a learnable visual center aggregates local information within each layer to capture locally representative features, further improving the extraction of target textures. This approach effectively exploits the global semantic information of the image and accurately captures global target features, strengthening the recognition of target textures and shapes during detection. Second, the algorithm improves the parallel patch perception module by dynamically adjusting the feature extraction receptive field to match different target scales and capture feature information at diverse scales, helping it cope with varied backgrounds. In practice, targets in remote sensing images often appear at different scales against complex backgrounds, differences that traditional methods may fail to distinguish or simply ignore. By dynamically adjusting the receptive field, the algorithm flexibly perceives targets of different scales while maintaining high accuracy and low error rates in complex background scenarios. Finally, the algorithm designs a multi-scale feature fusion module that efficiently integrates multi-level, multi-scale feature information, comprehensively capturing diverse representations of targets and further improving model inference speed while meeting high-precision detection requirements. This fusion strategy also makes the algorithm notably more effective in static image detection tasks. Experimental results on the RSOD dataset show improvements in precision, recall, and mean average precision of 1.5%, 2.4%, and 2.4%, respectively, over YOLOv7-tiny. Generalization experiments on the NWPU VHR-10 and DOTA datasets also yield strong results, with mean average precision increasing by 3.0% and 1.3%, respectively, over the baseline model. These findings show that the algorithm performs well not only on the RSOD dataset but also on datasets covering diverse target types and scenes, highlighting its robust generalization capability. Comparative analysis with other algorithms further underscores the superiority of the proposed approach.
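As a rough sketch of the parallel patch perception idea described in the extended abstract, the block below runs several depthwise convolution branches with different kernel sizes in parallel and uses a selective-kernel-style gate to weight them per channel, so the effective receptive field adapts to targets of different scales. The kernel sizes, the gating design, and all names here are assumptions for illustration rather than the paper's exact module.

# A rough PyTorch sketch of a parallel-patch-perception-style block, assuming the
# receptive field is adjusted by running parallel depthwise branches with
# different kernel sizes and letting a selective-kernel-style gate weight them
# per channel. Kernel sizes, the gate, and all names are illustrative
# assumptions, not the paper's exact module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelPatchPerception(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7), reduction=4):
        super().__init__()
        # parallel depthwise branches, one receptive field per kernel size
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        ])
        hidden = max(channels // reduction, 8)
        self.squeeze = nn.Sequential(          # global context summary
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
        )
        # one score per branch and per channel
        self.select = nn.Conv2d(hidden, channels * len(kernel_sizes), 1)
        self.num_branches = len(kernel_sizes)

    def forward(self, x):
        feats = torch.stack([branch(x) for branch in self.branches], dim=1)  # (B, N, C, H, W)
        summary = self.squeeze(feats.sum(dim=1))                             # (B, hidden, 1, 1)
        b, n, c, h, w = feats.shape
        weights = self.select(summary).view(b, n, c, 1, 1)
        weights = F.softmax(weights, dim=1)       # how much of each receptive field to keep
        return (weights * feats).sum(dim=1) + x   # weighted fusion plus a residual path

if __name__ == "__main__":
    block = ParallelPatchPerception(64)
    print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])

Using a softmax over the branches rather than a plain sum lets the network emphasize whichever receptive field best matches a channel's content, which is the intuition behind adapting to multi-scale targets in cluttered remote sensing scenes.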
