Multi-task learning for thermal pedestrian detection

Gou Yutao; Ma Liang; Song Yixuan; Jin Lei; Lei Tao

doi:10.12086/oee.2021.210358

Gou Y T, Ma L, Song Y X, et al. Multi-task learning for thermal pedestrian detection[J]. Opto-Electron Eng, 2021, 48(12): 210358. doi: 10.12086/oee.2021.210358

1. Photoelectric Detection Technology Laboratory, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
2. Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
3. University of Chinese Academy of Sciences, Beijing 100049, China

Corresponding author: Lei Tao, E-mail: taoleiyan@ioe.ac.cn

Received Date 12 November 2021

Revised Date 30 November 2021

Published Date 30 December 2021

Abstract

Compared with high-quality RGB images, thermal images tend to produce a higher false alarm rate in pedestrian detection. The main reason is that thermal images are limited by imaging resolution and spectral characteristics and therefore lack clear texture features; in addition, some samples have poor feature quality, which interferes with network training. We propose a thermal pedestrian detection algorithm based on a multi-task learning framework, which makes the following improvements to a multiscale detection framework. First, a saliency detection task is introduced as an auxiliary branch alongside the object detection network to form a multi-task learning framework, which indirectly guides the detector's attention toward salient regions and their edge information through co-learning. Second, the learning weight of noisy samples is suppressed by introducing saliency strength into the classification loss function. Detection results on the publicly available KAIST dataset confirm that our learning method reduces the log-average miss rate by 4.43% compared with the baseline, RetinaNet.
Keywords: thermal pedestrian detection; multi-task learning; saliency detection

Overview

In recent years, pedestrian detection techniques based on visible images have developed rapidly. However, interference from light, smoke, and occlusion makes it difficult to achieve robust round-the-clock detection with these images alone. Thermal images, by contrast, capture the thermal radiation that the target emits in a specific wavelength band. They are highly robust to interference and to changes in ambient lighting, and they are widely used in security and transportation. At present, detection performance on thermal images still needs improvement: it suffers from poor image quality and from noisy samples that interfere with network learning.
To improve the performance of thermal pedestrian detection, we first introduce saliency detection maps as supervisory information and adopt a multi-task learning framework, in which the main network performs pedestrian detection and the auxiliary network performs saliency detection. Because the two tasks share the feature extraction modules, the network gains saliency detection capability and is guided to focus on salient regions. To find the most suitable design for the auxiliary network, we test four designs, ranging from an independent-learning model to a guided-attentive model. Second, by visualizing pedestrian samples, we observe that noisy samples have weaker saliency responses in thermal images. We therefore introduce the saliency strength of each sample into the classification loss function through a hand-designed mapping function, which reduces the interference of noisy samples with network learning. As the mapping function, we adopt a suitably transformed sigmoid function that maps the saliency area percentage to a saliency score. Finally, we introduce the saliency score into the Focal Loss to design the Smooth Focal Loss, which, with suitable settings, decreases the loss of low-saliency samples.
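The weighting idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulation used in the paper: the sigmoid parameters (`steepness`, `midpoint`), the function names, and the choice to multiply the positive-sample focal loss by the saliency score are all hypothetical. Only the focal loss itself and its common defaults (alpha = 0.25, gamma = 2) follow the original Focal Loss definition.

```python
import math


def saliency_score(area_ratio, steepness=10.0, midpoint=0.5):
    """Map the salient-area fraction of a sample (0..1) to a score in (0, 1).

    A shifted and scaled sigmoid; steepness and midpoint are assumed values.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (area_ratio - midpoint)))


def focal_loss(p, is_positive, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single predicted probability p."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    if is_positive:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def saliency_weighted_focal_loss(p, area_ratio, alpha=0.25, gamma=2.0):
    """Focal loss of a positive sample scaled by its saliency score.

    Hypothetical stand-in for a saliency-aware loss: samples with a small
    salient area contribute less to the classification loss.
    """
    return saliency_score(area_ratio) * focal_loss(p, True, alpha, gamma)


if __name__ == "__main__":
    # Same prediction confidence, different saliency strength.
    for ratio in (0.1, 0.5, 0.9):
        print(ratio, saliency_weighted_focal_loss(0.3, ratio))
```

With these assumed settings, a pedestrian sample whose box contains few salient pixels receives a much smaller loss than an equally misclassified sample with a large salient area, which is the suppression effect the text describes.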
Extensive experiments on KAIST thermal images support the following conclusions. First, compared with the other auxiliary frameworks, our cascaded model achieves strong performance with its independent design. Second, compared with RetinaNet, our method decreases the log-average miss rate by 4.43%, a competitive result among popular thermal pedestrian detection methods. Finally, because it is purely a network training strategy, our method adds no computational cost at inference. Although the method is effective, its hyperparameters still have to be set manually. Enabling the network to adapt to various detection conditions will be the focus of our future research.
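For readers unfamiliar with the metric, the sketch below shows how a log-average miss rate is commonly computed. It assumes the widely used Caltech-style definition (the geometric mean of the miss rate sampled at nine false-positives-per-image values spaced evenly in log space between 10^-2 and 10^0); the exact evaluation protocol used in the paper is not given here, and the function name and input format are hypothetical.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at log-spaced FPPI reference points.

    fppi and miss_rate describe one detection curve, with fppi sorted in
    ascending order. For each reference point, the miss rate at the largest
    FPPI not exceeding it is used; if the curve has no such point, the miss
    rate is taken as 1.0.
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    samples = []
    for ref in refs:
        current = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                current = m
            else:
                break
        samples.append(max(current, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


if __name__ == "__main__":
    # Toy curve: miss rate falls as more false positives per image are allowed.
    curve_fppi = [0.001, 0.09]
    curve_miss = [0.8, 0.2]
    print(log_average_miss_rate(curve_fppi, curve_miss))
```

Because the average is taken in log space, improvements at low false-positive rates count as much as improvements at high ones.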