Overview: As a foundation for applications such as human behavior recognition, semantic segmentation, and autonomous driving, multi-target tracking is one of the research hotspots in computer vision. To track multiple targets stably and accurately in complex scenarios, many difficulties must be addressed, such as camera motion, interaction between targets, missed detections, and false detections. In recent years, with the rapid development of deep learning, many excellent detection-based multi-target tracking algorithms have emerged; they fall mainly into online and offline methods. The detection-based tracking pipeline works as follows: targets are first located by an offline-trained detector, a similarity-matching step then associates the detections across frames, and the generated trajectories are continuously matched against new detections to produce more reliable trajectories. Online multi-target tracking methods mainly include SORT, Deep SORT, and SDMT, while offline methods mainly include the network flow model, the conditional random field model, and the generalized association graph model. Offline methods exploit information from multiple frames when associating trajectories with detections and can therefore achieve better tracking performance, but they are not suited to real-time applications. Online methods use only single-frame information to associate trajectories with new targets; because single-frame detections are often unreliable, data association for lost targets fails and the ideal tracking result cannot be obtained. To address the reliability of the detection results, an online multi-target tracking method based on the R-FCN framework is proposed. First, a candidate model that combines Kalman-filter predictions with detection results is devised, so that candidate targets no longer come solely from the detector, which improves the robustness of the algorithm. Second, a Siamese network is used to measure appearance similarity, and multiple target features are fused to complete the data association between targets, which improves the ability to discriminate targets in complex tracking scenes. In addition, because target trajectories may suffer missed and false detections in complex scenes, the RANSAC algorithm is used to optimize the existing trajectories, yielding more complete and accurate trajectory information and making the trajectories more continuous and smoother. Finally, compared with several excellent existing algorithms, the experimental results show that the proposed method performs well in tracking accuracy, the number of lost tracks, and the number of missed detections.
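The candidate-selection step described above, in which detector outputs are merged with Kalman-filter predictions of the existing tracks so that an occluded or missed target can still enter data association, can be sketched roughly as follows. This is a minimal illustration, not the paper's exact implementation: the box format, the confidence assigned to predicted boxes, and the score/NMS thresholds are assumptions made for the example.

```python
# Minimal sketch of candidate selection: candidates come from both the detector
# and the Kalman-filter predictions of existing tracks, so a single missed
# detection does not immediately break a trajectory. Thresholds are illustrative.

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def select_candidates(detections, predictions, score_thresh=0.4, nms_thresh=0.7):
    """Merge detector boxes and Kalman-predicted track boxes into one candidate set.

    detections  : list of (box, confidence) from the offline-trained detector
    predictions : list of (box, confidence) predicted by each track's Kalman filter;
                  how the prediction confidence is scored is an assumption here
    Returns the candidates kept after score filtering and greedy NMS.
    """
    candidates = [(box, s, "det") for box, s in detections if s >= score_thresh]
    candidates += [(box, s, "trk") for box, s in predictions if s >= score_thresh]
    # Greedy non-maximum suppression so an overlapping detection/prediction pair
    # collapses to a single candidate for data association.
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept = []
    for box, score, src in candidates:
        if all(iou(box, k[0]) < nms_thresh for k in kept):
            kept.append((box, score, src))
    return kept


if __name__ == "__main__":
    dets = [([100, 100, 150, 200], 0.9)]
    preds = [([102, 101, 152, 203], 0.6),   # same target, predicted by its track
             ([300, 120, 340, 210], 0.5)]   # target missed by the detector this frame
    for box, score, src in select_candidates(dets, preds):
        print(src, box, round(score, 2))
```

In this sketch the second predicted box survives selection even though the detector produced nothing there, which is the behavior the candidate model is designed to provide before appearance-based (Siamese) association is applied.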
The general flow of the algorithm
Candidate selection flowchart
R-FCN network architecture
Siamese network structure diagram
Missed detection in a target trajectory
Multi-target tracking results. (a) Tracking results on the MOT16-01 sequence; (b) tracking results on the MOT16-03 sequence; (c) tracking results on the MOT16-06 sequence