Target tracking algorithm based on YOLOv3 and ASMS

Lv Chen; Cheng Deqiang; Kou Qiqi; Zhuang Huandong; Li Haixiang

doi:10.12086/oee.2021.200175

Lv C, Cheng D Q, Kou Q Q, et al. Target tracking algorithm based on YOLOv3 and ASMS[J]. Opto-Electron Eng, 2021, 48(2): 200175. doi: 10.12086/oee.2021.200175

1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China
2. School of Computer Science & Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China

Fund Project: National Natural Science Foundation of China (51774281)

Corresponding author: Lv Chen, E-mail: 286562685@qq.com

Received Date 18 May 2020

Revised Date 24 September 2020

Published Date 15 February 2021

Abstract

To address target loss during automatic tracking when the target is occluded or moves too fast, a target tracking algorithm based on YOLOv3 and ASMS is proposed. First, the target is detected by the YOLOv3 algorithm, which determines the initial target area to be tracked. The ASMS algorithm is then used for tracking, and the tracking result is monitored and evaluated in real time. When the target is lost, it is relocated by quadratic fitting positioning together with the YOLOv3 algorithm. Finally, to further improve efficiency, an incremental pruning method is used to compress the model. Experimental results show that, compared with mainstream algorithms, the proposed algorithm solves the target loss problem under occlusion and improves the accuracy of target detection and tracking. It also offers low computational complexity, low time consumption, and high real-time performance.
- target tracking
- target loss
- you only look once v3 (YOLOv3)
- model pruning
- robust scale-adaptive mean-shift (ASMS)
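
The relocation step described in the abstract can be illustrated with a small sketch. It assumes that "quadratic fitting positioning" means fitting a second-order polynomial to the recent target centres over the frame index and extrapolating to the current frame, and that the detection nearest to the predicted centre is taken as the relocated target. Both the matching rule and all names (`fit_quadratic`, `predict_center`, `relocate`) are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the function names and the nearest-centre
# matching rule are assumptions, not the authors' implementation.

def fit_quadratic(ts, vs):
    """Least-squares fit of v = a*t^2 + b*t + c; returns (a, b, c).

    Needs at least three distinct values in ts, otherwise the normal
    equations are singular and a ZeroDivisionError is raised.
    """
    # Normal equations (A^T A) p = A^T v for design rows [t^2, t, 1].
    s = [sum(t ** k for t in ts) for k in range(5)]            # power sums of t
    r = [sum(v * t ** k for t, v in zip(ts, vs)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def predict_center(history, t_next):
    """Extrapolate the target centre to frame t_next.

    history is a list of (frame_index, center_x, center_y) tuples taken
    from frames in which tracking was still judged reliable.
    """
    ts = [t for t, _, _ in history]
    ax, bx, cx = fit_quadratic(ts, [x for _, x, _ in history])
    ay, by, cy = fit_quadratic(ts, [y for _, _, y in history])
    return (ax * t_next ** 2 + bx * t_next + cx,
            ay * t_next ** 2 + by * t_next + cy)


def relocate(history, t_next, detections):
    """Pick the detector box (x, y, w, h) closest to the predicted centre.

    Returns None when the detector produced no boxes for this frame.
    """
    px, py = predict_center(history, t_next)

    def squared_distance(box):
        x, y, w, h = box
        return (x + w / 2 - px) ** 2 + (y + h / 2 - py) ** 2

    return min(detections, key=squared_distance) if detections else None
```

In such a setup, `relocate` would be called only when the tracker output is judged unreliable, and the returned box would be used to reinitialise the tracker.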

Overview

Tracking moving objects has long been a challenging task and an active research direction. With the continuous improvement of hardware and the rapid development of artificial intelligence, the technology is becoming increasingly important. To address target loss during automatic tracking when the target is occluded or moves too fast, this paper combines a traditional algorithm with a machine learning algorithm and proposes a target tracking algorithm based on YOLOv3 and ASMS. Pruning YOLOv3 before combining it with ASMS makes the proposed algorithm run faster. The method first performs foreground detection with YOLOv3 to find the initial target area for tracking, which eliminates the need to mark the region of interest manually, and then tracks the target with the ASMS algorithm. The combined algorithm monitors and evaluates the tracking result in real time. When the ASMS tracking box is significantly offset from the detected target, or is too large and contains too much background, tracking accuracy decreases. When the target is occluded or moves too fast, it is lost. In both cases, YOLOv3 and quadratic fitting positioning are used to relocate the target, which improves accuracy and solves the problem of target loss.

To further improve efficiency, an incremental pruning method is applied to compress YOLOv3. First, a regularization term on the scaling factors is introduced for sparse training of the convolutional layer channels of the YOLOv3 network. A global threshold is then used to remove the components that contribute little to model inference, that is, the lower-scoring channels. Pruning is applied incrementally to prevent the network degradation that excessive pruning would cause. Finally, the pruned model is fine-tuned to compensate for the accuracy loss caused by channel pruning.

On the COCO dataset, the experimental results show that, compared with YOLOv3, the best pruned model is 49.9% faster, has 92.0% fewer parameters, and is 91.9% smaller in model size. After the pruned YOLOv3 is combined with the ASMS algorithm, the proposed joint algorithm runs 32.5% faster than the unpruned joint algorithm when the target is occluded, and its accuracy is much better than that of ASMS alone. The proposed algorithm solves the target loss problem under occlusion and improves the accuracy of target detection and tracking. It also offers low computational complexity, low time consumption, and high real-time performance.
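
The pruning procedure outlined above can be sketched in a few lines. The sketch assumes that each channel is scored by the absolute value of its scaling factor after sparse training, that the global threshold is chosen so that a given fraction of all channels falls below it, and that at least one channel is kept per layer. The function names, the per-step ratios, and the `fine_tune` callback are hypothetical placeholders and do not reproduce the paper's code or settings.

```python
# Illustrative sketch only: names, ratios and the fine_tune callback are
# assumptions introduced for this example.

def global_threshold(gammas_per_layer, prune_ratio):
    """Return the |gamma| value below which channels are pruned.

    gammas_per_layer holds one list of scaling factors per layer;
    prune_ratio is the fraction of all channels to remove (0 to 1).
    """
    all_scores = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    k = int(len(all_scores) * prune_ratio)   # channels to drop, counted globally
    return all_scores[k] if k < len(all_scores) else float("inf")


def channel_masks(gammas_per_layer, prune_ratio):
    """Build one keep/drop mask per layer from a single global threshold."""
    threshold = global_threshold(gammas_per_layer, prune_ratio)
    masks = []
    for layer in gammas_per_layer:
        mask = [abs(g) >= threshold for g in layer]
        if not any(mask):
            # Keep the strongest channel so the layer is never removed entirely.
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[best] = True
        masks.append(mask)
    return masks


def incremental_prune(gammas_per_layer, ratios, fine_tune):
    """Prune in several small steps, calling fine_tune after each step.

    Each ratio applies to the channels that remain at that step. fine_tune
    stands in for retraining and must return the updated scaling factors.
    """
    for ratio in ratios:
        masks = channel_masks(gammas_per_layer, ratio)
        gammas_per_layer = [
            [g for g, keep in zip(layer, mask) if keep]
            for layer, mask in zip(gammas_per_layer, masks)
        ]
        gammas_per_layer = fine_tune(gammas_per_layer)
    return gammas_per_layer
```

Pruning in steps rather than once means each fine-tuning pass only has to recover from a small change, which is one plausible reading of why an incremental strategy guards against degradation. In a real network the masks would also have to be applied to the convolution weights feeding into and out of each pruned channel, which this sketch leaves out.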