Citation: Zheng H J, Ge B, Xia C X, et al. Infrared-visible person re-identification based on multi feature aggregation[J]. Opto-Electron Eng, 2023, 50(7): 230136. doi: 10.12086/oee.2023.230136
Infrared-visible person re-identification is a prominent research topic in computer vision, spanning multi-modal perception technology, the core challenges of person re-identification, practical application demands, and the development of datasets and evaluation metrics. With the emergence of multi-modal perception technology, the primary objective of infrared-visible person re-identification is to fuse information from different modalities effectively, thereby improving the accuracy and robustness of re-identification. Person re-identification already faces challenges such as variations in viewpoint, pose, occlusion, and lighting; as a cross-modal task, infrared-visible re-identification introduces additional difficulties. The technology has broad application prospects in video surveillance, security, intelligent transportation, and related fields, and is particularly well suited to re-identification in low-light or nighttime environments. The development of relevant datasets and evaluation metrics has driven continuous innovation and improvement in infrared-visible re-identification algorithms and systems. Through the sustained efforts of researchers, the accuracy of infrared-visible person re-identification has steadily improved; however, the large discrepancy between the two image modalities remains a major challenge. Existing methods focus mainly on mitigating this modality discrepancy to obtain more discriminative features, but they ignore the relationship between adjacent-level features and the influence of multi-scale information on global features.
To address these shortcomings, an infrared-visible person re-identification method based on multi-feature aggregation (MFANet) is proposed. First, adjacent-level features are fused during feature extraction, and the integration of low-level feature information is guided so as to strengthen the high-level features and make them more robust. Then, multi-scale features from different receptive fields are aggregated to obtain rich contextual information. Finally, the multi-scale features serve as a guide to strengthen the features further and yield more discriminative representations. Experimental results on the SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method: on SYSU-MM01, the average accuracy reaches 71.77% in the all-search single-shot mode and 78.24% in the indoor-search single-shot mode.
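The two aggregation steps described above can be illustrated with a minimal NumPy sketch. This is not the paper's actual MFANet implementation (which operates inside a deep backbone); the function names, the additive fusion, and the sigmoid gating below are illustrative assumptions that merely show the pattern: pool a lower-level map to the resolution of the adjacent higher-level map, gather context from several receptive fields, and use that context to reweight (strengthen) the features.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a (C, H, W) map with k*k windows and stride k (H, W divisible by k)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def adjacent_fusion(low, high):
    """Fuse an adjacent lower-level map (C, 2H, 2W) into the higher-level
    map (C, H, W): the low-level map is pooled to the high-level resolution
    and used as an additive guide (an illustrative choice)."""
    return high + avg_pool(low, 2)

def multi_scale_aggregate(x, scales=(1, 2, 4)):
    """Aggregate context from several receptive fields, then use the
    aggregated map as a gate that strengthens the input features."""
    ctx = np.zeros_like(x)
    for k in scales:
        ctx += upsample(avg_pool(x, k), k)   # context at receptive field k
    ctx /= len(scales)
    gate = 1.0 / (1.0 + np.exp(-ctx))        # sigmoid attention weights in (0, 1)
    return x * gate                          # guided, strengthened features

# Toy feature maps: low-level (8, 16, 16), adjacent high-level (8, 8, 8).
rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))
high = rng.standard_normal((8, 8, 8))
fused = adjacent_fusion(low, high)
out = multi_scale_aggregate(fused)
print(out.shape)  # prints (8, 8, 8)
```

Because the gate is bounded in (0, 1), strongly activated context regions are preserved while weakly supported responses are suppressed, which is one common way a multi-scale map can "guide" feature strengthening.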
MFANet structure diagram
Adjacent feature aggregation module
Multi-scale aggregation module
Multi-scale feature aggregation module
Inter-class and intra-class distances and feature distribution diagram
Heat maps under different receptive fields
Visualized ranking results on SYSU-MM01