Citation: Zuo H R, Xu Z Y, Zhang J L, Jia G. Visual tracking based on transfer learning of deep salience information. Opto-Electron Adv 3, 190018 (2020). doi: 10.29026/oea.2020.190018

Original Article Open Access

Visual tracking based on transfer learning of deep salience information

Abstract

In this paper, we propose a new visual tracking method based on salience information and deep learning. Salience detection is used to extract image features that carry salient information. Each layer of a convolutional neural network (CNN) builds progressively more complex representations of image features, and the attention-driven salience mechanism of biological vision resembles the hierarchical feature extraction performed by a CNN. This motivates us to improve the representational ability of the CNN with salience detection. We adopt fully convolutional networks (FCNs) to perform salience detection and reuse part of the network structure for salience extraction, which improves the classification ability of the model. Guided by salient information, the proposed network tracks targets well: compared with other state-of-the-art algorithms, our tracker performs better on public tracking datasets. It achieves an accuracy of 0.5592 on the Visual Object Tracking 2015 (VOT15) benchmark, and a precision of 0.710 and a success rate of 0.429 on the Unmanned Aerial Vehicle 123 (UAV123) dataset.
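To make the architecture described above more concrete, the sketch below shows one way a shared convolutional backbone, a fully convolutional salience branch, and a target/background classifier that re-weights features with the predicted salience map could be wired together. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name SalienceGuidedTracker, the layer widths, and the 107×107 patch size are hypothetical; only the general idea of using salience to strengthen the classification features follows the abstract.

```python
# Minimal PyTorch sketch (illustrative only): shared backbone, FCN-style
# salience branch, and a binary target/background classifier whose features
# are modulated by the predicted salience map.
import torch
import torch.nn as nn


class SalienceGuidedTracker(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared convolutional feature extractor (VGG-like, heavily truncated).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # Fully convolutional salience branch producing a 1-channel map.
        self.salience_head = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),
        )
        # Binary classifier (target vs. background) on salience-weighted features.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        feats = self.backbone(x)                         # (N, 64, H/4, W/4)
        sal = torch.sigmoid(self.salience_head(feats))   # (N, 1, H/4, W/4)
        weighted = feats * sal                           # emphasize salient regions
        logits = self.classifier(weighted)               # (N, 2)
        return logits, sal


if __name__ == "__main__":
    model = SalienceGuidedTracker()
    patches = torch.randn(4, 3, 107, 107)                # candidate patches (assumed size)
    logits, sal_map = model(patches)
    print(logits.shape, sal_map.shape)
```

Running the script simply prints the shapes of the classification logits and the salience map for a batch of four candidate patches; in a real tracker the salience branch would be trained jointly with, or transferred to, the classification branch.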

