Soft multilabel learning and deep feature fusion for unsupervised person re-identification

Citation: Zhang B H, Zhu S Y, Lv X Q, et al. Soft multilabel learning and deep feature fusion for unsupervised person re-identification[J]. Opto-Electron Eng, 2020, 47(12): 190636. doi: 10.12086/oee.2020.190636


  • Corresponding author: Zhang Baohua, E-mail: zbh_wj2004@imust.edu.cn
  • CLC number: TP391.4

Soft multilabel learning and deep feature fusion for unsupervised person re-identification

  • Fund Project: Supported by the National Natural Science Foundation of China (61962046, 61663036, 61841204), the Inner Mongolia Jieqing Cultivation Project (2018JQ02), Inner Mongolia Grassland Talents, the Inner Mongolia Youth Science and Technology Innovation Talent Project (Level 1), the Inner Mongolia Autonomous Region Natural Science Fund (2015MS0604, 2018MS06018), and the Inner Mongolia Autonomous Region Higher Education Science and Technology Research Project (NJZY145)
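  • Abstract: In cross-camera scenarios, person re-identification relies on learning label mapping relationships to improve recognition accuracy. Supervised person re-identification models achieve good accuracy but scale poorly: their accuracy depends heavily on effective supervision, and their real-time performance is poor. To address these problems, an unsupervised person re-identification algorithm based on soft multilabels is proposed. To improve label matching accuracy, soft multilabels are first used to approximate the true labels; the reference dataset is pre-trained by computing the loss between the reference dataset and the reference agents under the soft multilabel function, and a mapping model between the pre-training and training results is built. Next, the simplified 2-Wasserstein distance, i.e., the expected minimum distance between the generated and real data distributions, is computed from the means and standard deviations of the soft multilabels in each camera view to obtain a loss function that addresses cross-view label consistency. To make the soft multilabels more effective on the unlabeled target dataset, a joint embedding loss is computed to mine similar pairs across different classes and correct cross-domain distribution misalignment. To address the long training time of residual networks and the low accuracy of unsupervised learning, the residual network structure is improved by combining squeeze-and-excitation networks (SENet) with multi-level deep feature fusion, which improves training speed and accuracy. Experimental results show that the rank-1 accuracy and mean average precision (mAP) of the method on standard datasets exceed those of state-of-the-art related algorithms.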
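The idea of describing an unlabeled person by comparison with reference agents can be sketched as follows. This is a minimal illustration, assuming the soft multilabel is a softmax over dot-product similarities between a feature vector and each agent; the function name `soft_multilabel` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def soft_multilabel(feature, agents):
    """Softmax over dot-product similarities between one feature and each reference agent.

    Assumption: similarity is a plain dot product; the paper may use a
    normalized or scaled variant.
    """
    scores = [sum(f * a for f, a in zip(feature, agent)) for agent in agents]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data: one 3-d feature and three reference agents.
feature = [0.6, 0.8, 0.0]
agents = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
label = soft_multilabel(feature, agents)
```

Each entry of `label` can be read as how strongly the unlabeled person resembles the corresponding reference person; the entries are positive and sum to one.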

  • Overview: Person re-identification retrieves a pedestrian of interest from images captured by cameras by finding targets similar to a query image of that person. The technology can save considerable time and manpower when searching a pedestrian database for images of a suspect, and it has good application prospects in intelligent security, criminal investigation, and image retrieval. Supervised person re-identification models achieve good recognition accuracy but scale poorly. For example, their accuracy relies heavily on effective supervision, and when a small amount of data is added during classification, all data must be reprocessed, which results in poor real-time performance. To address these problems, an unsupervised person re-identification algorithm based on soft multilabels is proposed. The features of each unlabeled target are learned and compared with a labeled reference dataset, so that each unlabeled target receives a soft multilabel. To obtain more accurate soft multilabels, we introduce reference agents, and to reduce the difference between the reference agents and the labeled reference dataset, we pre-train on the reference dataset. The reference agents are then compared with the unlabeled targets in place of the labeled reference dataset. We also use three loss functions, which mine hard negative pair information, make the cross-camera labels of the same target consistent, and correct cross-domain distribution misalignment. Mining hard negative pairs identifies negative pairs more accurately and pushes them farther apart. Cross-camera label consistency reduces the gap between the soft multilabel distributions of the same target under different cameras: the mean and standard deviation vectors of the soft multilabels in each camera view are computed and compared using the simplified 2-Wasserstein distance. To further improve the effectiveness of the reference agents and correct cross-domain distribution misalignment, a loss function is designed that, for each reference agent, finds the unlabeled people close to it. During feature extraction, we use multi-level deep feature fusion to complement deep features with shallow features, which improves feature robustness and thereby recognition accuracy. We also experiment with integrating squeeze-and-excitation networks (SENet) into the residual network to provide a function similar to an attention mechanism and improve learning speed. Experimental results show that the rank-1 accuracy and mAP of the proposed method are superior to those of state-of-the-art related algorithms.
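The cross-camera consistency term described above can be illustrated with a short sketch. It assumes that the soft multilabels of each camera view are summarized by a per-dimension mean and standard deviation, and that the simplified squared 2-Wasserstein distance between two such summaries is the sum of the squared differences of the means and of the standard deviations (the closed form for Gaussians with diagonal covariance). Comparing each camera with the statistics of all cameras pooled together is also an assumption; the helper names and toy data are hypothetical.

```python
import math

def mean_and_std(vectors):
    """Per-dimension mean and population standard deviation of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    std = [math.sqrt(sum((v[d] - mean[d]) ** 2 for v in vectors) / n)
           for d in range(dim)]
    return mean, std

def simplified_w2_squared(mean_a, std_a, mean_b, std_b):
    """Squared 2-Wasserstein distance between two diagonal-covariance Gaussians."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    std_term = sum((a - b) ** 2 for a, b in zip(std_a, std_b))
    return mean_term + std_term

def consistency_loss(labels_by_camera):
    """Sum of distances between each camera's statistics and the pooled statistics."""
    pooled = [v for labels in labels_by_camera.values() for v in labels]
    pooled_mean, pooled_std = mean_and_std(pooled)
    loss = 0.0
    for labels in labels_by_camera.values():
        cam_mean, cam_std = mean_and_std(labels)
        loss += simplified_w2_squared(cam_mean, cam_std, pooled_mean, pooled_std)
    return loss

# Hypothetical soft multilabels (2 reference agents) seen by two cameras.
shifted = {"cam_a": [[0.7, 0.3], [0.5, 0.5]], "cam_b": [[0.2, 0.8], [0.4, 0.6]]}
aligned = {"cam_a": [[0.7, 0.3], [0.5, 0.5]], "cam_b": [[0.7, 0.3], [0.5, 0.5]]}
shifted_loss = consistency_loss(shifted)
aligned_loss = consistency_loss(aligned)
```

In this toy setup `aligned_loss` is zero up to rounding because both cameras have the same label statistics, while `shifted_loss` is positive, which is the behavior a consistency penalty needs.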

  • Figure 1.  Illustration of the soft multilabel learning loss functions

    Figure 2.  ResNet-50 architecture

    Figure 3.  SE_ResNet architecture

    Figure 4.  Architecture of the experimental model

    Figure 6.  Hyperparameter tuning results for SE_ResNet-50. (a) Results for varying the number of epochs; (b) Results for varying the learning rate; (c) Results for varying the weight decay

    Table 1.  Effect of pre-training on training results

    Train        Pre-train                   Market
                                             R1      R5      R10     mAP
    ResNet50     ResNet50(loss_total=0.622)  66.627  81.977  86.609  39.361
    ResNet50     ResNet50, imageNet=None     61.401  77.316  82.245  33.930
    ResNet50     None                        42.548  60.778  69.032  22.139
    SE_ResNet50  SE_ResNet50                 51.989  69.893  77.078  28.330
    SE_ResNet50  ResNet50(loss_total=0.622)  64.371  81.621  86.461  38.236
    SE_ResNet50  None                        35.362  51.425  58.462  17.403

    Table 2.  Feature fusion experimental results

    Methods                        Market
                                   R1      R5      R10     mAP
    ResNet50+layer1+layer3         52.049  69.567  76.395  28.279
    ResNet50+layer1+layer4         66.330  81.473  86.520  39.581
    ResNet50+layer2+layer3         51.306  69.240  76.306  28.113
    ResNet50+layer2+layer4         62.500  77.316  83.254  35.884
    ResNet50+layer3+layer4         67.102  81.977  86.876  40.036
    ResNet50+layer1+layer3+layer4  68.973  82.601  86.995  41.188
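The layer combinations in Table 2 can be pictured with a small sketch of multi-level feature fusion. The fusion operator is not specified here, so the sketch assumes one common choice: global average pooling of each selected layer's feature map followed by concatenation. The function names and the toy feature maps are hypothetical.

```python
def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a C-dimensional vector."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

def fuse_features(feature_maps):
    """Concatenate the pooled vectors of several layers into one descriptor.

    Assumption: fusion by concatenation; other operators (sum, learned
    projection) are equally plausible.
    """
    fused = []
    for fmap in feature_maps:
        fused.extend(global_average_pool(fmap))
    return fused

# Hypothetical toy maps: a shallow 2-channel 2x2 map and a deep 1-channel 1x1 map.
shallow = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
deep = [[[0.5]]]
descriptor = fuse_features([shallow, deep])
```

The fused descriptor has one entry per channel of every selected layer, so shallow layers contribute fine-grained cues alongside the deep semantic features.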

    Table 3.  Ablation study

    Methods          Market-1501                     DukeMTMC-reID
                     R1      R5      R10     mAP     R1    R5    R10   mAP
    w/o LCML         60.0    75.9    81.9    34.6    63.2  77.2  82.5  44.9
    w/o LRAL         59.2    76.4    82.3    30.8    57.9  72.6  77.8  37.1
    w/o LCML & LRAL  53.9    71.5    77.7    28.2    60.1  73.0  78.4  40.4
    ResNet50+LMAR    66.627  81.977  86.609  39.361  67.1  79.8  84.2  48.0

    Table 4.  Comparison of unsupervised person re-identification accuracy with related methods

    Methods              Reference  Market              Duke
                                    R1     R5    mAP    R1    R5    mAP
    CAMEL[5]             ICCV’17    54.5   73.1  26.3   40.3  57.6  19.8
    PUL[6]               ToMM’18    45.5   60.7  20.5   30.0  43.4  16.4
    TJ-AIDL[7]           CVPR’18    58.2   74.8  26.5   44.3  59.6  23.0
    PTGAN[8]             CVPR’18    38.6   57.3  15.7   27.4  43.6  13.5
    SPGAN[9]             CVPR’18    51.5   70.1  27.1   41.1  56.6  22.3
    HHL[10]              ECCV’18    62.2   78.8  31.4   46.9  61.0  27.2
    MMFA[22]             BMVC’18    45.3   -     24.7   56.7  -     27.4
    DECAMEL[23]          SCI’18     60.24  -     32.44  -     -     -
    ARN[24]              CVPRW’19   70.3   80.4  39.4   60.2  73.9  33.4
    BUC[25]              AAAI’19    66.2   79.6  38.3   47.4  62.6  27.5
    MAR[11]              CVPR’19    67.7   81.9  40.0   67.1  79.8  48.0
    The proposed method  This work  68.97  82.6  41.2   68.6  80.6  50.1
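The R1, R5, and mAP columns in the tables follow the usual retrieval metrics, which can be sketched as below. This is a simplified illustration: it assumes each query already has a ranked list of gallery matches marked correct or incorrect, and it does not implement dataset-specific filtering such as discarding gallery images from the query's own camera. The function names and toy rankings are hypothetical.

```python
def average_precision(ranked_matches):
    """AP for one query; ranked_matches[i] is True when the i-th ranked
    gallery image has the same identity as the query."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0

def rank_k_accuracy(all_ranked_matches, k):
    """Fraction of queries with at least one correct match in the top k results."""
    found = sum(1 for matches in all_ranked_matches if any(matches[:k]))
    return found / len(all_ranked_matches)

def mean_average_precision(all_ranked_matches):
    """Mean of the per-query average precision values."""
    total = sum(average_precision(m) for m in all_ranked_matches)
    return total / len(all_ranked_matches)

# Hypothetical rankings for two queries over a four-image gallery.
queries = [
    [True, False, True, False],
    [False, True, False, False],
]
r1 = rank_k_accuracy(queries, 1)
r2 = rank_k_accuracy(queries, 2)
m_ap = mean_average_precision(queries)
```

Rank-1 rewards only the top result, whereas mAP also reflects how early all correct gallery images of an identity are retrieved, which is why the two columns can rank methods differently.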
  • [1] Xiong F, Xiao Y, Cao Z G, et al. Good practices on building effective CNN baseline model for person re-identification[J]. Proceedings of SPIE, 2019, 11069: 110690I.
    [2] Wang S Q, Xu X, Liu L, et al. Multi-level feature fusion model-based real-time person re-identification for forensics[J]. Journal of Real-Time Image Processing, 2020, 17(1): 73-81. doi: 10.1007/s11554-019-00908-4
    [3] Bak S, Carr P, Lalonde J F. Domain adaptation through synthesis for unsupervised person re-identification[J]. ECCV, 2018: 189-205. http://link.springer.com/chapter/10.1007/978-3-030-01261-8_12
    [4] Ye M, Li J W, Ma A J, et al. Dynamic graph co-matching for unsupervised video-based person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2976-2990. doi: 10.1109/TIP.2019.2893066
    [5] Yu H X, Wu A C, Zheng W S. Cross-view asymmetric metric learning for unsupervised person re-identification[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, 2017: 994-1002.
    [6] Fan H H, Zheng L, Yan C G, et al. Unsupervised person re-identification: clustering and fine-tuning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018, 14(4): 83. http://arxiv.org/abs/1705.10444
    [7] Wang J Y, Zhu X T, Gong S G, et al. Transferable joint attribute-identity deep learning for unsupervised person re-identification[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 2275-2284.
    [8] Wei L G, Zhang S L, Gao W, et al. Person transfer GAN to bridge domain gap for person re-identification[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 79-88.
    [9] Deng W J, Zheng L, Ye Q X, et al. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 994-1003.
    [10] Zhong Z, Zheng L, Li S Z, et al. Generalizing a person retrieval model hetero-and homogeneously[C]//Proceedings of the European Conference on Computer Vision, Glasgow, 2018: 172-188.
    [11] Yu H X, Zheng W S, Wu A C, et al. Unsupervised person re-identification by soft multilabel learning[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, 2019: 2148-2157.
    [12] He R, Wu X, Sun Z N, et al. Wasserstein CNN: learning invariant features for NIR-VIS face recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(7): 1761-1773. http://ieeexplore.ieee.org/document/8370677/
    [13] Wang F, Xiang X, Cheng J, et al. NormFace: L2 hypersphere embedding for face verification[C]//Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, California, 2017: 1041-1049.
    [14] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 7132-7141.
    [15] Wang C, Zhang Q, Huang C, et al. Mancs: a multi-task attentional network with curriculum sampling for person re-identification[C]//Proceedings of the 15th European Conference on Computer Vision, Munich, 2018: 365-381.
    [16] Fan H, Zheng L, Yan C, et al. Unsupervised person re-identification by deep learning tracklet association[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018, 14(4): 1-18. http://arxiv.org/abs/1809.02874
    [17] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016: 770-778.
    [18] Wang Y, Wang L Q, You Y R, et al. Resource aware person re-identification across multiple resolutions[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 8042-8051.
    [19] Hu Y, Wen G H, Luo M N, et al. Competitive inner-imaging squeeze and excitation for residual network[Z]. arXiv: 1807.08920 [cs.CV], 2018.
    [20] Zheng L, Shen L Y, Tian L, et al. Scalable person re-identification: a benchmark[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, Santiago, 2015: 1116-1124.
    [21] Zheng Z D, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, 2017: 3754-3762.
    [22] Lin S, Li H L, Li C T, et al. Multi-task mid-level feature alignment network for unsupervised cross-dataset person re-identification[Z]. arXiv: 1807.01440 [cs.CV], 2018.
    [23] Yu H X, Wu A C, Zheng W S. Unsupervised person re-identification by deep asymmetric metric embedding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 42(4): 956-973.
    [24] Li Y J, Yang F E, Liu Y C, et al. Adaptation and re-identification network: an unsupervised deep transfer learning approach to person re-identification[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, 2018: 172-178.
    [25] Lin Y T, Dong X Y, Zheng L, et al. A bottom-up clustering approach to unsupervised person re-identification[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 8738-8745.

Publication history
Received:  2019-10-24
Revised:  2020-03-02
Published:  2020-12-15
