Citation: | Yuan Z A, Gu Y, Ma G. Improved CSTrack algorithm for multi-class ship multi-object tracking[J]. Opto-Electron Eng, 2023, 50(12): 230218. doi: 10.12086/oee.2023.230218 |
Ship multi-object tracking is an important application of multi-object tracking (MOT), with wide uses in both military and civilian fields. The objective of MOT is to locate multiple ship objects, maintain a unique identity (ID) for each ship, and record its continuous trajectory. The difficulty of MOT lies in false positives, false negatives, ID switches, and the uncertainty of the number of objects. In the CSTrack multi-object tracking algorithm, the feature maps produced by the neck of the network are decoupled into two different feature vectors, which serve as the inputs of the object detection and re-identification (Re-ID) networks respectively, alleviating the contradiction between the two tasks and improving tracking performance. However, such forced decoupling causes a loss of object features, which degrades tracking performance under object occlusion, small objects, or dense objects. To solve this issue, this paper proposes an improved cross-correlation network (CCN), named RES_CCN, that extracts fine-grained features. The network combines an improved Res2Net module, coordinate attention, and the CCN network, and is inserted between the neck and head of the network, so that finer-grained features are obtained before feature decoupling by enlarging the receptive field and adding more hierarchical residual connections within the residual unit. To meet the requirements of multi-class ship multi-object tracking and improve detection performance, a decoupled detection head is used to predict the class, confidence, and position of objects separately, and binary cross-entropy is adopted as the classification loss and added to the total loss function.
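The two building blocks named above — Res2Net-style hierarchical residual connections and coordinate attention — can be illustrated with a minimal, dependency-free sketch. This is not the paper's implementation: the learned 3×3 convolutions and the shared transform inside coordinate attention are replaced by fixed stand-ins (`conv3x3`, plain mean pooling) purely to show the data flow.

```python
import numpy as np

def res2net_split(x, scales=4):
    """Res2Net-style hierarchical residual connections (simplified sketch).

    x: feature map of shape (C, H, W); C must be divisible by `scales`.
    The channels are split into `scales` subsets; each subset after the
    first receives the previous subset's output before its own transform,
    which enlarges the receptive field at a granular level.
    """
    def conv3x3(s):          # stand-in for a learned 3x3 convolution
        return s * 0.5       # hypothetical fixed weighting, for illustration
    subsets = np.split(x, scales, axis=0)
    outputs = [subsets[0]]   # first subset passes through unchanged
    prev = None
    for s in subsets[1:]:
        inp = s if prev is None else s + prev  # hierarchical residual add
        prev = conv3x3(inp)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)

def coordinate_attention(x):
    """Coordinate attention (simplified): factorize global pooling into two
    1D poolings along H and W, then reweight the feature map with both,
    so position information along each axis is preserved."""
    pool_h = x.mean(axis=2, keepdims=True)   # (C, H, 1): pool over width
    pool_w = x.mean(axis=1, keepdims=True)   # (C, 1, W): pool over height
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    return x * sigmoid(pool_h) * sigmoid(pool_w)
```

In the actual RES_CCN design these blocks sit between the neck and head, ahead of the feature decoupling step, so both the detection and Re-ID branches receive the finer-grained features.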
Ablation experiments on the MOT16 dataset show that the multiple object tracking accuracy (MOTA) of the proposed algorithm improves by 4.6 over the original algorithm, and the identification F1 score (IDF1) increases by 3.4. On the Singapore Maritime Dataset (SMD), the MOTA of the proposed algorithm improves by 8.4 over the original CSTrack and IDF1 increases by 3.1, outperforming ByteTrack and other algorithms. Qualitative results show that the proposed algorithm effectively detects small objects and maintains object IDs in sea-surface scenarios. With its high tracking accuracy and low false detection rate, the proposed algorithm is well suited to ship multi-object tracking in sea-surface scenarios.
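The binary cross-entropy classification loss mentioned above, applied per class on sigmoid outputs as is typical for decoupled detection heads, can be sketched as follows (a minimal NumPy version; the function name and the clipping constant are illustrative, not from the paper):

```python
import numpy as np

def binary_cross_entropy(logits, targets, eps=1e-7):
    """Per-class binary cross-entropy on raw logits (sketch).

    Each class is treated as an independent binary decision, so the head
    can score multiple ship classes without a softmax over classes.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    probs = np.clip(probs, eps, 1.0 - eps)     # guard against log(0)
    loss = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    return loss.mean()
```

In training, this term would be weighted and summed with the confidence, box-regression, and Re-ID losses to form the total loss.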
Flowchart of the JDE and CSTrack algorithms. (a) JDE; (b) CSTrack
Network architecture of the CSTrack. (a) Overall framework; (b) CCN and Res_CCN networks; (c) SAAN network; (d) SAM network; (e) CAM network
Overall framework and feature extraction network architecture of the proposed method
Network architecture of the improved Res2net
Network architecture of CA
Network architecture of decoupled head
Flowchart of matching cascade
Comparison of visualization results between our method and baseline on SMD validation set. (a) FN and FP; (b) ID switch and FN
Comparison of visualization results between our method and baseline on MOT validation set. (a) FP and FN; (b) ID switch and special FP