Improved YOLOv7 algorithm for target detection in complex environments from UAV perspective

Zhang Runmei; Xiao Yufei; Jia Zhennan; Chen Zhong; Chen Zihua; Yuan Bin; Cao Weiwei; Song Weiwei

doi:10.12086/oee.2024.240051

Opto-Electronic Engineering, 2024, Vol. 51, No. 5: 240051

Zhang R M, Xiao Y F, Jia Z N, et al. Improved YOLOv7 algorithm for target detection in complex environments from UAV perspective[J]. Opto-Electron Eng, 2024, 51(5): 240051. doi: 10.12086/oee.2024.240051

1. School of Mechanical and Electrical Engineering, Anhui Jianzhu University, Hefei, Anhui 230601, China
2. Key Laboratory of Intelligent Manufacturing of Construction Machinery, Hefei, Anhui 230601, China
3. Anhui Simulation Design and Modern Manufacturing Engineering Technology Research Center, Huangshan, Anhui 242700, China
4. Key Laboratory of Civil Aviation Flight Technology and Flight Safety, Guanghan, Sichuan 618300, China

Fund Project: Supported by the Open Fund of the Anhui Simulation Design and Modern Manufacturing Engineering Technology Research Center (SGCZXZD2101) and the project "Construction of UAV Safety Knowledge Base Based on Knowledge Graph" (FZ2021KF10)

*Corresponding author: swwahjzu11@163.com

Received Date 06 March 2024

Revised Date 21 April 2024

Accepted Date 24 April 2024

Published Date 25 May 2024

Abstract

UAV (unmanned aerial vehicle) photography in adverse conditions suffers from poor image recognizability, occlusion by obstacles, and severe feature loss. To address these challenges, a novel algorithm named SSG-YOLOv7 was proposed to improve object detection from the UAV perspective in complex environments. First, 12803 augmented images were generated from the VisDrone2019 dataset and 1320 from the RSOD dataset to simulate five different environments. Next, anchor box sizes suited to the datasets were obtained by clustering. The parameter-free 3D attention mechanism SimAM was then integrated into the backbone network and the feature extraction module to strengthen the model's learning capability. Furthermore, the feature extraction module SPPCSPC was restructured to fuse information from branches with different pooling sizes, and the lightweight convolution module GhostConv was introduced, improving the precision of dense multi-scale object detection without increasing the model's parameter count. Finally, Soft NMS was employed to rescore the confidence of overlapping predicted boxes, reducing false positives and missed detections. Experimental results show that SSG-YOLOv7 achieves superior detection performance in complex environments: compared with YOLOv7, mAP@0.5 improves by 10.45% on VisDrone and by 2.67% on RSOD.
- UAV
- complex environment
- YOLOv7
- SimAM attention mechanism
- SPPCSPC
- data augmentation
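
SimAM (Yang et al., 2021) weights every neuron of a feature map without any learnable parameters, which is why adding it does not increase the parameter count. For a neuron t in a channel with mean μ and variance σ², the closed-form inverse minimal energy is 1/e* = (t − μ)² / (4(σ² + λ)) + 1/2, and the output is x · sigmoid(1/e*). The sketch below is a minimal, dependency-free illustration of that formula on nested lists; the function names are hypothetical, and the default λ = 1e-4 is taken from the public SimAM reference code and is an assumption here, since the value used in SSG-YOLOv7 is not stated.

```python
import math
from typing import List

Channel = List[List[float]]  # one H x W feature channel
FeatureMap = List[Channel]   # C channels

def simam_channel(channel: Channel, e_lambda: float = 1e-4) -> Channel:
    """Reweight every neuron of one channel with SimAM's closed-form energy."""
    values = [v for row in channel for v in row]
    if len(values) < 2:
        raise ValueError("SimAM needs at least two neurons per channel")
    mu = sum(values) / len(values)
    # Squared deviation (t - mu)^2 of each neuron from the channel mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    # Sum of squared deviations divided by M - 1 (n = h*w - 1 in the reference code).
    var = sum(d for row in sq_dev for d in row) / (len(values) - 1)
    out: Channel = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # 1 / e_t* = (t - mu)^2 / (4 (var + lambda)) + 1/2
            inv_energy = d / (4.0 * (var + e_lambda)) + 0.5
            # Scale the neuron by sigmoid(1 / e_t*); no learnable weights involved.
            out_row.append(v / (1.0 + math.exp(-inv_energy)))
        out.append(out_row)
    return out

def simam(feature_map: FeatureMap, e_lambda: float = 1e-4) -> FeatureMap:
    """Apply SimAM independently to each channel of a C x H x W map."""
    return [simam_channel(ch, e_lambda) for ch in feature_map]
```

Neurons that deviate strongly from their channel mean (for example the 10.0 in `[[1.0, 2.0], [3.0, 10.0]]`) receive the largest sigmoid weight, while a perfectly uniform channel is scaled uniformly by sigmoid(0.5).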

Overview

Combining low-cost unmanned aerial vehicle (UAV) photography with deep learning can create significant value in many fields. However, targets captured from a UAV perspective often exhibit drastic scale variations, uneven distribution, and frequent occlusion by obstacles. Moreover, UAVs typically fly at low altitude and high speed while capturing images, so aerial images may have low resolution and be degraded by weather conditions or by the drone's own vibration. Maintaining high detection accuracy in such complex environments is therefore a crucial challenge for UAV-based target detection.

To address this, this paper proposes SSG-YOLOv7, a new target detection algorithm based on YOLOv7. First, the K-means++ clustering algorithm is used to generate anchor boxes of four different scales suited to the target datasets, addressing the large scale variation of targets seen from the UAV perspective. Next, the SimAM attention mechanism is introduced into the neck network and the feature extraction module, improving detection accuracy without increasing the model's parameter count. Then, the pooling layers of different scales in the feature extraction module are fused so that the model learns richer target features in complex environments, and GhostConv replaces conventional convolution modules to reduce the parameter count of the feature extraction module. Finally, Soft NMS is employed to lower the false-detection and missed-detection rates for small targets, further improving detection from the UAV perspective.

In the experiments, five complex environments are simulated on the original VisDrone and RSOD datasets using transformation functions from the imgaug library, and SSG-YOLOv7 is compared against the original YOLOv7.
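
Anchor clustering groups the (width, height) sizes of ground-truth boxes so that each anchor matches a typical object scale. The sketch below combines k-means++ seeding (later centers drawn with probability proportional to the squared distance to the nearest existing center) with ordinary Lloyd iterations. The distance 1 − IoU, as popularized by YOLOv2, is an assumption, since this summary does not state the distance the authors used; the mean update, the number of clusters `k`, and all function names are likewise hypothetical.

```python
import random
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of a ground-truth box

def wh_iou(a: Size, b: Size) -> float:
    """IoU of two boxes aligned at a common corner, so only their sizes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def iou_distance(a: Size, b: Size) -> float:
    """Clustering distance d = 1 - IoU (assumed, YOLOv2-style)."""
    return 1.0 - wh_iou(a, b)

def kmeanspp_init(boxes: List[Size], k: int, rng: random.Random) -> List[Size]:
    """k-means++ seeding: later centers are drawn with probability proportional to D^2."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(iou_distance(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box already coincides with a center
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2)[0])
    return centers

def kmeans_anchors(boxes: List[Size], k: int, iters: int = 100, seed: int = 0) -> List[Size]:
    """Cluster box sizes into k anchors (k-means++ seeding, then Lloyd iterations)."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        clusters: List[List[Size]] = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: iou_distance(b, centers[i]))
            clusters[nearest].append(b)
        # Move each center to the mean size of its members (keep it if empty).
        new_centers = [
            (sum(m[0] for m in ms) / len(ms), sum(m[1] for m in ms) / len(ms)) if ms else c
            for c, ms in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])  # small to large
```

In practice `k` would be set to the total number of anchors across all detection heads, and the sorted result would be split from small to large among the heads.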
Compared with the original algorithm, SSG-YOLOv7 improves mean average precision (mAP@0.5) by 10.45% on the VisDrone dataset and by 2.67% on the RSOD dataset, while reducing the parameter count by 24.2%. These results indicate that SSG-YOLOv7 is better suited to target detection in complex environments from the UAV perspective. The experiments also compare the detection accuracy of YOLOv7 and SSG-YOLOv7 before and after data augmentation on both datasets: on VisDrone, YOLOv7 improves by 4.13% and SSG-YOLOv7 by 8.71%; on RSOD, YOLOv7 improves by 3.59% and SSG-YOLOv7 by 4.45%. Because SSG-YOLOv7 gains more from the augmented samples, it appears able to learn more target features from images of complex environments, locate targets accurately, and handle multi-target detection in complex environments from the UAV perspective.
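
Soft-NMS (Bodla et al., 2017) differs from standard non-maximum suppression in that boxes overlapping the current best detection M are not deleted; their scores are decayed instead. The Gaussian variant rescales each remaining score as s_i ← s_i · exp(−IoU(M, b_i)² / σ). Which variant and which σ SSG-YOLOv7 uses is not stated here, so the sketch below assumes the Gaussian form with σ = 0.5 and a small final score threshold; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms_gaussian(boxes: List[Box], scores: List[float],
                      sigma: float = 0.5, score_thresh: float = 0.001) -> List[Tuple[Box, float]]:
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding boxes."""
    remaining = list(zip(boxes, scores))
    kept: List[Tuple[Box, float]] = []
    while remaining:
        # Select the detection M with the highest current score.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        m_box, m_score = remaining.pop(best)
        kept.append((m_box, m_score))
        # Rescore the rest: s_i <- s_i * exp(-IoU(M, b_i)^2 / sigma).
        rescored = []
        for box, score in remaining:
            new_score = score * math.exp(-box_iou(m_box, box) ** 2 / sigma)
            if new_score >= score_thresh:  # drop only near-zero scores
                rescored.append((box, new_score))
        remaining = rescored
    return kept
```

For two heavily overlapping boxes such as `(0, 0, 10, 10)` and `(1, 0, 11, 10)` (IoU ≈ 0.82), hard NMS at an IoU threshold of 0.5 would discard the weaker one outright, whereas Soft-NMS keeps it with a reduced score of about 0.21. This is the behavior that helps with densely packed or partially occluded targets.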