Citation: Liang L M, Chen K Q, Wang C B, et al. Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception[J]. Opto-Electron Eng, 2024, 51(7): 240099. doi: 10.12086/oee.2024.240099

Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception

    Fund Project: Supported by the National Natural Science Foundation of China (51365017, 61463018), the Natural Science Foundation of Jiangxi Province (20192BAB205084), and the Science and Technology Research Youth Project of the Jiangxi Provincial Department of Education (GJJ2200848)
  • To address the challenges of complex background interference, multi-scale differences among targets, and the difficulty of extracting small targets from remote sensing images, this paper proposes a remote sensing image detection algorithm based on the YOLOv7-tiny model that integrates a visual center mechanism and parallel patch perception. First, the algorithm introduces an explicit visual center mechanism to establish long-distance dependencies between pixels, enriching the overall semantic information of the image and improving the extraction of target textures. Second, it improves the parallel patch perception module by adjusting the feature extraction receptive fields to suit different target scales. Third, a multi-scale feature fusion module is designed to efficiently fuse multi-layer features, improving the model's inference speed. Experimental results on the RSOD dataset show that the proposed algorithm improves precision, recall, and mean average precision by 1.5%, 2.4%, and 2.4%, respectively, compared with YOLOv7-tiny. Validation on the NWPU VHR-10 and DOTA datasets further confirms the strong generalization performance of the proposed algorithm, and comparative analysis with other algorithms demonstrates its superiority.
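As a concrete illustration of the explicit visual center idea summarized above, the PyTorch sketch below pairs a lightweight MLP branch (long-range pixel dependencies) with a learnable visual center branch that soft-assigns pixels to a small codebook and gates the feature map accordingly. This is a minimal sketch under assumed design choices; the class names, codebook size, and gating scheme are illustrative and not the authors' implementation.

# A minimal PyTorch sketch of an explicit-visual-center-style block, assuming a
# simplified two-branch design: a lightweight MLP that models long-range pixel
# dependencies and a learnable visual center that aggregates local features
# against a small codebook. Class names, the codebook size, and the gating
# scheme are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightMLP(nn.Module):
    """Per-pixel channel MLP; a depthwise conv provides cheap spatial mixing."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.fc1 = nn.Conv2d(channels, channels * expansion, 1)
        self.fc2 = nn.Conv2d(channels * expansion, channels, 1)

    def forward(self, x):
        x = x + self.dw(x)                        # local spatial mixing (residual)
        return x + self.fc2(F.gelu(self.fc1(x)))  # channel mixing (residual)

class LearnableVisualCenter(nn.Module):
    """Soft-assigns every pixel to K learnable codewords and gates the feature map."""
    def __init__(self, channels, num_codes=32):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, channels))
        self.scale = nn.Parameter(torch.ones(num_codes))
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        feat = x.flatten(2).transpose(1, 2)                         # (B, HW, C)
        codes = self.codes.unsqueeze(0).expand(b, -1, -1)           # (B, K, C)
        dist = torch.cdist(feat, codes) ** 2                        # (B, HW, K)
        assign = F.softmax(-self.scale * dist, dim=-1)              # soft assignment
        residual = feat.unsqueeze(2) - self.codes                   # (B, HW, K, C)
        encoded = (assign.unsqueeze(-1) * residual).sum(1).mean(1)  # (B, C)
        weight = self.gate(encoded).view(b, c, 1, 1)                # channel gate
        return x * weight                                           # highlight "central" features

class ExplicitVisualCenter(nn.Module):
    """Fuses the long-range (MLP) and local (visual center) branches."""
    def __init__(self, channels):
        super().__init__()
        self.mlp = LightMLP(channels)
        self.lvc = LearnableVisualCenter(channels)
        self.fuse = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.mlp(x), self.lvc(x)], dim=1))

if __name__ == "__main__":
    out = ExplicitVisualCenter(64)(torch.randn(1, 64, 40, 40))
    print(out.shape)  # torch.Size([1, 64, 40, 40])

A block like this is typically applied to the deepest backbone feature map before multi-scale fusion; the exact placement within the proposed network is not detailed in the abstract.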
    [1] Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. doi: 10.12086/oee.2022.210363

    [2] Yuan J H, Zhang N F, Ruan J S, et al. Detection of prohibited items in X-ray images based on modified YOLOX algorithm[J]. Laser Technol, 2023, 47(4): 547−552. doi: 10.7510/jgjs.issn.1001-3806.2023.04.016

    [3] Ming Q, Miao L J, Zhou Z Q, et al. CFC-Net: a critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[J]. IEEE Trans Geosci Remote Sens, 2022, 60: 5605814. doi: 10.1109/TGRS.2021.3095186

    [4] Cong R M, Zhang Y M, Fang L Y, et al. RRNet: relational reasoning network with parallel multiscale attention for salient object detection in optical remote sensing images[J]. IEEE Trans Geosci Remote Sens, 2022, 60: 5613311. doi: 10.1109/TGRS.2021.3123984

    [5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014: 580–587. https://doi.org/10.1109/CVPR.2014.81.

    [6] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, Santiago, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169.

    [7] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031

    [8] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, 2017: 2961–2969. https://doi.org/10.1109/ICCV.2017.322.

    [9] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision, Amsterdam, 2016: 21–37. https://doi.org/10.1007/978-3-319-46448-0_2.

    [10] Zhao L Q, Li S Y. Object detection algorithm based on improved YOLOv3[J]. Electronics, 2020, 9(3): 537. doi: 10.3390/electronics9030537

    [11] Gai R L, Chen N, Yuan H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model[J]. Neural Comput Appl, 2023, 35(19): 13895−13906. doi: 10.1007/s00521-021-06029-z

    [12] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, 2023: 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721.

    [13] Salehi A W, Khan S, Gupta G, et al. A study of CNN and transfer learning in medical imaging: advantages, challenges, future scope[J]. Sustainability, 2023, 15(7): 5930. doi: 10.3390/su15075930

    [14] Gao S H, Li Z Y, Han Q, et al. RF-Next: efficient receptive field search for convolutional neural networks[J]. IEEE Trans Pattern Anal Mach Intell, 2023, 45(3): 2984−3002. doi: 10.1109/TPAMI.2022.3183829

    [15] Gao T, Niu Q Q, Zhang J, et al. Global to local: a scale-aware network for remote sensing object detection[J]. IEEE Trans Geosci Remote Sens, 2023, 61: 5615614. doi: 10.1109/TGRS.2023.3294241

    [16] Zhang J Q, Lei J, Xie W Y, et al. SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery[J]. IEEE Trans Geosci Remote Sens, 2023, 61: 5605415. doi: 10.1109/TGRS.2023.3258666

    [17] Wang L, Liu X B, Ma J T, et al. Real-time steel surface defect detection with improved multi-scale YOLO-v5[J]. Processes, 2023, 11(5): 1357. doi: 10.3390/pr11051357

    [18] Quan Y, Zhang D, Zhang L Y, et al. Centralized feature pyramid for object detection[J]. IEEE Trans Image Process, 2023, 32: 4341−4354. doi: 10.1109/TIP.2023.3297408

    [19] Xu S B, Zheng S C, Xu W H, et al. HCF-Net: hierarchical context fusion network for infrared small object detection[Z]. arXiv: 2403.10778, 2024. https://arxiv.org/abs/2403.10778.

    [20] Li Y X, Li X, Dai Y M, et al. LSKNet: a foundation lightweight backbone for remote sensing[Z]. arXiv: 2403.11735, 2024. https://arxiv.org/abs/2403.11735.

    [21] Li X, Wang W H, Hu X L, et al. Selective kernel networks[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, 2019: 510–519. https://doi.org/10.1109/CVPR.2019.00060.

    [22] Liang L M, Zhan T, Lei K, et al. Multi-resolution fusion input U-shaped retinal vessel segmentation algorithm[J]. J Electron Inf Technol, 2023, 45(5): 1795−1806. doi: 10.11999/JEIT220470

    [23] Chen Y X, Lin M W, He Z, et al. Consistency-and dependence-guided knowledge distillation for object detection in remote sensing images[J]. Expert Syst Appl, 2023, 229: 120519. doi: 10.1016/j.eswa.2023.120519

    [24] Zhao D W, Shao F M, Liu Q, et al. A small object detection method for drone-captured images based on improved YOLOv7[J]. Remote Sens, 2024, 16(6): 1002. doi: 10.3390/rs16061002

    [25] Xia G S, Bai X, Ding J, et al. DOTA: a large-scale dataset for object detection in aerial images[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 3974–3983. https://doi.org/10.1109/CVPR.2018.00418.

  • To address the challenges posed by complex background interference, multi-scale variation of targets, and the difficulty of extracting small targets from remote sensing images, this paper proposes a remote sensing image detection algorithm built on the YOLOv7-tiny model that integrates a visual center mechanism and parallel patch perception to enhance detection performance. The algorithm introduces three main innovations. First, it introduces an explicit visual center mechanism that uses a lightweight multi-layer perceptron to establish long-distance dependencies between pixels, capturing the central features of contextual information and enriching the overall semantic information of the image, including scene structure and contextual details. At the same time, a learnable visual center aggregates local information within each layer to capture locally representative features, further improving the extraction of target textures. This approach effectively exploits the global semantic information of the image and accurately captures global target features, strengthening the recognition of target textures and shapes during detection. Second, the algorithm improves the parallel patch perception module by dynamically adjusting the feature extraction receptive field to match different target scales and capture feature information at diverse scales, helping it cope with varied backgrounds. In practice, targets in remote sensing images often appear at different scales against complex backgrounds, differences that traditional methods may fail to distinguish or simply ignore. By dynamically adjusting the receptive field, the algorithm flexibly perceives targets of different scales while maintaining high accuracy and low error rates in complex background scenarios. Finally, the algorithm designs a multi-scale feature fusion module that efficiently integrates multi-level, multi-scale feature information, comprehensively capturing diverse representations of targets and further improving model inference speed while meeting high-precision detection requirements. This fusion strategy also makes the algorithm notably more effective in static image detection tasks. Experimental results on the RSOD dataset show improvements in precision, recall, and mean average precision of 1.5%, 2.4%, and 2.4%, respectively, over YOLOv7-tiny. Generalization experiments on the NWPU VHR-10 and DOTA datasets also yield strong results, with mean average precision increasing by 3.0% and 1.3%, respectively, over the baseline model. These findings show that the algorithm performs well not only on the RSOD dataset but also on datasets covering diverse target types and scenes, highlighting its robust generalization capability. Comparative analysis with other algorithms further underscores the superiority of the proposed approach.
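As a rough sketch of the parallel patch perception idea described in the extended abstract, the block below runs several depthwise convolution branches with different kernel sizes in parallel and uses a selective-kernel-style gate to weight them per channel, so the effective receptive field adapts to targets of different scales. The kernel sizes, the gating design, and all names here are assumptions for illustration rather than the paper's exact module.

# A rough PyTorch sketch of a parallel-patch-perception-style block, assuming the
# receptive field is adjusted by running parallel depthwise branches with
# different kernel sizes and letting a selective-kernel-style gate weight them
# per channel. Kernel sizes, the gate, and all names are illustrative
# assumptions, not the paper's exact module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelPatchPerception(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7), reduction=4):
        super().__init__()
        # parallel depthwise branches, one receptive field per kernel size
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        ])
        hidden = max(channels // reduction, 8)
        self.squeeze = nn.Sequential(          # global context summary
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
        )
        # one score per branch and per channel
        self.select = nn.Conv2d(hidden, channels * len(kernel_sizes), 1)
        self.num_branches = len(kernel_sizes)

    def forward(self, x):
        feats = torch.stack([branch(x) for branch in self.branches], dim=1)  # (B, N, C, H, W)
        summary = self.squeeze(feats.sum(dim=1))                             # (B, hidden, 1, 1)
        b, n, c, h, w = feats.shape
        weights = self.select(summary).view(b, n, c, 1, 1)
        weights = F.softmax(weights, dim=1)       # how much of each receptive field to keep
        return (weights * feats).sum(dim=1) + x   # weighted fusion plus a residual path

if __name__ == "__main__":
    block = ParallelPatchPerception(64)
    print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])

Using a softmax over the branches rather than a plain sum lets the network emphasize whichever receptive field best matches a channel's content, which is the intuition behind adapting to multi-scale targets in cluttered remote sensing scenes.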
