光电工程  2019, Vol. 46 Issue (9): 180606      DOI: 10.12086/oee.2019.180606

Multi-occluded pedestrian real-time detection algorithm based on preprocessing R-FCN
Liu Hui, Peng Li, Wen Jiwei
Engineering Research Center of Internet of Things Technology Applications of the Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Abstract: One of the main challenges for driver assistance systems is detecting multi-occluded pedestrians in real time in complicated scenes, so as to reduce the number of traffic accidents. To improve both the accuracy and the speed of the detection system, we propose a real-time multi-occluded pedestrian detection algorithm based on R-FCN. An RoI Align layer is introduced to remove the misalignment between the feature map and the RoIs of the original image. A separable convolution is used to reduce the dimensions of the position-sensitive score maps and thereby improve detection speed. For occluded pedestrians, a multi-scale context algorithm is proposed that adopts a local competition mechanism for adaptive context scale selection. For body parts with low visibility due to occlusion, deformable RoI pooling layers are introduced to expand the pooled area of the body model. Finally, to reduce redundant information in the video sequence, the Seq-NMS algorithm replaces the traditional NMS algorithm. Experiments show a low detection error on the Caltech and ETH datasets; the accuracy of our algorithm exceeds that of the compared detection algorithms, and it works particularly well on occluded pedestrians.
Keywords: multi-occluded pedestrian    separable convolution layer    multi-scale context    deformable RoI pooling layer
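The abstract's claim that a separable convolution reduces the cost of the position-sensitive score maps can be illustrated with a simple parameter count. The factorization below (an Inception-style k×1 followed by 1×k split, with c_out intermediate channels) is an assumption for illustration only; the paper's exact layer sizes are not reproduced in this excerpt.

```python
def standard_conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def spatially_separable_params(k, c_in, c_out):
    """Parameter count when the k x k kernel is factorized into a
    k x 1 convolution followed by a 1 x k convolution
    (intermediate channel count assumed equal to c_out)."""
    return k * 1 * c_in * c_out + 1 * k * c_out * c_out

# A 3x3 layer mapping 512 -> 512 channels: the factorized form
# needs roughly 2/k of the parameters of the standard form.
std = standard_conv_params(3, 512, 512)
sep = spatially_separable_params(3, 512, 512)
```

For k = 3 the ratio is 2/3; the saving grows with kernel size, which is why such factorizations are attractive for wide score-map heads.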

1 Introduction

 Fig. 1 Schematic of the overall network structure
2 R-FCN network

 Fig. 2 Schematic of the R-FCN structure
 $r_c(i, j \mid \Theta) = \frac{1}{n}\sum\limits_{(x, y) \in \mathrm{bin}(i, j)} z_{i, j, c}(x + x_0, y + y_0 \mid \Theta),$ (1)
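Eq. (1) averages, for each class c and each bin (i, j) of an RoI, the position-sensitive score map z_{i,j,c} dedicated to that bin. A minimal NumPy sketch of this pooling step (the function name and channel layout are illustrative, not the paper's implementation):

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """Position-sensitive RoI pooling in the spirit of Eq. (1).

    score_maps: (k*k*C, H, W) array -- k^2 score maps per class, C classes.
    roi: (x0, y0, w, h) in feature-map coordinates.
    Returns: (C, k, k) array of pooled scores r_c(i, j).
    """
    x0, y0, w, h = roi
    C = score_maps.shape[0] // (k * k)
    out = np.zeros((C, k, k))
    for i in range(k):
        for j in range(k):
            # bin(i, j): the (i, j)-th sub-region of the RoI
            xs = x0 + int(np.floor(j * w / k))
            xe = x0 + int(np.ceil((j + 1) * w / k))
            ys = y0 + int(np.floor(i * h / k))
            ye = y0 + int(np.ceil((i + 1) * h / k))
            for c in range(C):
                # channel dedicated to class c and bin (i, j)
                ch = (i * k + j) * C + c
                # average over the n pixels falling in the bin
                out[c, i, j] = score_maps[ch, ys:ye, xs:xe].mean()
    return out
```

Because each bin reads only its own dedicated channel, no learned layer follows the pooling, which is what makes R-FCN's per-RoI computation cheap.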

3.3 Deformable RoI pooling layer

 Fig. 4 Illustration of 3×3 deformable RoI pooling
 $\boldsymbol{y}(i, j) = \sum\nolimits_{p \in \mathrm{bin}(i, j)} \boldsymbol{x}(p_0 + p)/n_{i, j},$ (4)

 $\boldsymbol{y}(i, j) = \sum\nolimits_{p \in \mathrm{bin}(i, j)} \boldsymbol{x}(p_0 + p + \Delta p_{i, j})/n_{i, j}.$ (5)

 $\boldsymbol{x}(p) = \sum\nolimits_q G(q, p) \cdot \boldsymbol{x}(q),$ (6)
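Eqs. (4)–(6) fit together as follows: standard RoI average pooling (Eq. (4)) becomes deformable when each bin is shifted by its learned offset Δp_{i,j} (Eq. (5)), and since shifted positions are fractional, the feature value is read off with the bilinear kernel G (Eq. (6)). A minimal NumPy sketch under these assumptions (the function names and the per-bin sampling grid are illustrative):

```python
import numpy as np

def bilinear_sample(x, py, px):
    """Eq. (6): x(p) = sum_q G(q, p) * x(q), G the 2D bilinear kernel."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < H and 0 <= qx < W:
                g = max(0.0, 1 - abs(py - qy)) * max(0.0, 1 - abs(px - qx))
                val += g * x[qy, qx]
    return val

def deformable_roi_pool(x, roi, offsets, k=3):
    """Eqs. (4)-(5): average-pool each RoI bin after shifting it by
    Delta p_{ij}; offsets of zero recover standard RoI pooling (Eq. (4)).

    x: (H, W) feature map; roi: (y0, x0, h, w); offsets: (k, k, 2).
    """
    y0, x0, h, w = roi
    out = np.zeros((k, k))
    bh, bw = h / k, w / k
    for i in range(k):
        for j in range(k):
            dy, dx = offsets[i, j]
            n = 0  # n_{ij}: number of samples in bin (i, j)
            for sy in np.arange(y0 + i * bh, y0 + (i + 1) * bh):
                for sx in np.arange(x0 + j * bw, x0 + (j + 1) * bw):
                    out[i, j] += bilinear_sample(x, sy + dy, sx + dx)
                    n += 1
            out[i, j] /= max(n, 1)
    return out
```

In the paper's setting, the offsets are predicted by a small branch from the features, letting the pooled region expand toward the visible parts of an occluded body.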

3.5 Training and detection procedure

4 Experimental results and analysis

 $\mathrm{IoU} = \frac{\mathrm{area}(B_{\mathrm{dt}} \cap B_{\mathrm{gt}})}{\mathrm{area}(B_{\mathrm{dt}} \cup B_{\mathrm{gt}})} > 0.5,$ (10)
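A detection B_dt counts as a true positive when Eq. (10) holds against a ground-truth box B_gt. A straightforward implementation for axis-aligned boxes:

```python
def iou(box_dt, box_gt):
    """IoU of Eq. (10); boxes are (x1, y1, x2, y2).
    A detection is a true positive when this exceeds 0.5."""
    ix1 = max(box_dt[0], box_gt[0])
    iy1 = max(box_dt[1], box_gt[1])
    ix2 = min(box_dt[2], box_gt[2])
    iy2 = min(box_dt[3], box_gt[3])
    # clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_dt = (box_dt[2] - box_dt[0]) * (box_dt[3] - box_dt[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / (area_dt + area_gt - inter)
```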

 $\mathrm{FPPI} = \frac{FP}{TN + FP} \times 100\%,$ (11)
 $\mathrm{MR} = \frac{FN}{FN + TP} \times 100\%,$ (12)
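Given the accumulated counts of true/false positives and negatives, Eqs. (11) and (12) reduce to two ratios. A direct transcription (variable names are illustrative; the counts are assumed already tallied against Eq. (10)):

```python
def fppi_mr(tp, fp, tn, fn):
    """FPPI and MR as percentages, following Eqs. (11)-(12)."""
    fppi = fp / (tn + fp) * 100.0  # Eq. (11)
    mr = fn / (fn + tp) * 100.0    # Eq. (12): miss rate
    return fppi, mr
```

Lower values of both metrics are better; pedestrian benchmarks such as Caltech typically report MR over a sweep of score thresholds.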

4.1 Comparison of results on the Caltech dataset

 Fig. 5 Comparison of results on the Caltech benchmark. (a) Partial occlusion; (b) heavy occlusion

 Algorithm     Test size   Base-model   Part-occlusion (MR)/%   Heavy-occlusion (MR)/%   Speed/(f/s)
 Fast D-FCN    640×480     ResNet-50    14.86                   42.36                    48.71
 SSD           512×512     ResNet-50    20.49                   57.64                    35.42
 R-FCN         640×480     ResNet-50    16.09                   55.81                    11.24

4.2 Comparison of results on the ETH dataset

 Fig. 6 Detection results on the ETH benchmark

Detection results on the Caltech data are shown in Figs. 7(a) and 7(b): 7(a) shows partial occlusion and 7(b) heavy occlusion. Detection results on the ETH data are shown in Figs. 7(c) and 7(d): 7(c) shows partial occlusion and 7(d) heavy occlusion.

 Fig. 7 Detection results of the proposed algorithm
5 Conclusion
