Citation: Zhang R M, Xiao Y F, Jia Z N, et al. Improved YOLOv7 algorithm for target detection in complex environments from UAV perspective[J]. Opto-Electron Eng, 2024, 51(5): 240051. doi: 10.12086/oee.2024.240051
Combining low-cost unmanned aerial vehicle (UAV) photography with deep learning can create significant value in many fields. Targets captured from a UAV perspective often vary drastically in scale, are unevenly distributed, and are easily occluded by obstacles. Moreover, UAVs typically fly at low altitudes and high speeds during capture, so aerial images can be low in resolution and degraded by weather conditions or by the drone's own vibrations. Maintaining high detection accuracy in such complex environments is a key challenge for UAV-based target detection. This paper therefore proposes SSG-YOLOv7, a new target detection algorithm based on YOLOv7. First, the K-means++ clustering algorithm generates anchor boxes at four different scales suited to the target dataset, which addresses the large scale variation of targets seen from the UAV perspective. Next, the SimAM attention mechanism is introduced into the neck network and the feature extraction module, improving detection accuracy without increasing the model's parameter count. The pooling layers at different scales in the feature extraction module are then fused so that the model learns richer target features in complex environments. In addition, GhostConv replaces the conventional convolution modules to reduce the parameter count of the feature extraction module. Finally, Soft NMS is used to reduce false detections and missed detections of small-scale targets, improving detection from the UAV perspective. In the experiments, five complex environments are simulated on the original VisDrone and RSOD datasets using transformation functions from the Imgaug library, and SSG-YOLOv7 is validated against the original algorithm.
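The Soft NMS step can be illustrated with a minimal sketch. It assumes the Gaussian decay variant and boxes given as (x1, y1, x2, y2) tuples; the function names and the `sigma` and `score_threshold` values are illustrative choices, not settings reported in the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft NMS (assumed variant): overlapping boxes are
    down-weighted instead of being discarded outright."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the highest-scoring remaining box.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay the scores of the remaining boxes by their overlap with it.
        rescored = []
        for box, score in candidates:
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_threshold:  # drop only near-zero scores
                rescored.append((box, decayed))
        candidates = rescored
    return kept


# Hypothetical detections: two heavily overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

With hard NMS and a typical IoU threshold, the second box would be removed entirely. Here it survives with a reduced score, which is how Soft NMS can lower the missed-detection rate for densely packed or partly occluded targets.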
Compared with the original algorithm, the proposed algorithm improves the mean average precision (mAP@0.5) by 10.45% on the VisDrone dataset and by 2.67% on the RSOD dataset, while reducing the model's parameter count by 24.2%. This demonstrates that SSG-YOLOv7 is better suited to target detection in complex environments from the UAV perspective. The experiments also compare the detection accuracy of YOLOv7 and SSG-YOLOv7 before and after data augmentation on both datasets. On the VisDrone dataset, YOLOv7 improves by 4.13% and SSG-YOLOv7 by 8.71%; on the RSOD dataset, YOLOv7 improves by 3.59% and SSG-YOLOv7 by 4.45%. These results indicate that SSG-YOLOv7 learns more target features from samples in complex environments, locates targets accurately, and is suitable for multi-target detection in complex environments from the UAV perspective.
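To give a sense of why replacing conventional convolutions with GhostConv lowers the parameter count, the sketch below counts the weights of a single layer. It assumes a Ghost module that produces half of the output channels with an ordinary convolution and the other half with a 5×5 depthwise convolution, and it ignores biases and batch normalization. The layer sizes are hypothetical, and the result is not meant to reproduce the 24.2% whole-model figure.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """Weights in a Ghost-style convolution (assumed layout).

    Half of the output channels come from an ordinary k x k convolution;
    the other half are generated from them by a cheap depthwise
    cheap_k x cheap_k convolution (one filter per channel).
    """
    primary_out = c_out // 2
    primary = c_in * primary_out * k * k
    cheap = primary_out * cheap_k * cheap_k
    return primary + cheap


# Hypothetical layer: 256 input channels, 256 output channels, 3 x 3 kernel.
standard = standard_conv_params(256, 256, 3)
ghost = ghost_conv_params(256, 256, 3)
print(standard, ghost, f"{ghost / standard:.3f}")
```

Under these assumptions the Ghost-style layer needs roughly half the weights of the standard layer. The saving for a full network is smaller because only some modules are replaced.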
SimAM attention module
SSG-YOLOv7 overall structure
GhostConv structure
Sample detection results of (a) NMS and (b) Soft NMS
Comparison of data augmentation on the two datasets
Visual comparison of YOLOv7 and SSG-YOLOv7 detection results