Improved algorithm of Faster R-CNN based on double threshold-non-maximum suppression

Hou Zhiqiang; Liu Xiaoyi; Yu Wangsheng; Ma Sugang

doi:10.12086/oee.2019.190159

Citation:

Hou Zhiqiang, Liu Xiaoyi, Yu Wangsheng, et al. Improved algorithm of Faster R-CNN based on double threshold-non-maximum suppression[J]. Opto-Electronic Engineering, 2019, 46(12): 190159. doi: 10.12086/oee.2019.190159

1. College of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
3. Information and Navigation Institute of Air Force Engineering University, Xi'an, Shaanxi 710077, China

Fund Project: Supported by National Natural Science Foundation of China (61703423, 61473309) and Xi'an University of Posts and Telecommunications Graduate Innovation Fund (CXJJ2017019)

*Corresponding author: Liu Xiaoyi, E-mail: 18829290763@163.com

Received Date 08 April 2019

Revised Date 30 May 2019

Published Date 01 December 2019

Abstract

To address missed detections and repeated detections in object detection, this paper proposes an improved Faster R-CNN algorithm based on double threshold-non-maximum suppression (DT-NMS). The algorithm first uses a deep convolutional network architecture to extract multi-layer convolutional features of the targets. It then applies the DT-NMS algorithm in the region proposal network (RPN) stage to extract deep information from the target candidate regions. Finally, it replaces the nearest-neighbor interpolation in the original RoI pooling layer with bilinear interpolation, so that targets are located more accurately on the detection dataset. The experimental results show that the DT-NMS algorithm effectively balances the trade-off between single-threshold suppression and missed detections, and reduces the probability of repeated detection. Compared with the soft-NMS algorithm, DT-NMS reduces the repeated detection rate on PASCAL VOC2007 by 2.4% and the target error rate of multiple detections by 2%. Compared with the Faster R-CNN algorithm, the proposed algorithm reaches a detection accuracy of 74.7% on PASCAL VOC2007, an improvement of 1.5%, and improves performance on the MSCOCO dataset by 1.4%. The algorithm is also fast, reaching 16 FPS.
Keywords:
- computer vision
- object detection
- non-maximum suppression
- convolutional neural network

Overview

The Faster R-CNN algorithm filters proposals with the non-maximum suppression (NMS) algorithm. NMS takes an all-or-nothing (“either one or zero”) approach, keeping only the highest-scoring candidate box for each classified target, which greatly increases the risk of missing a target that heavily overlaps another. The soft-NMS algorithm addresses this with a “weight penalty” strategy, which reduces missed detections to some extent. However, tests show that soft-NMS greatly increases the number of proposals, which introduces a new problem: the same target is detected repeatedly, and the repeated detections are assigned to the wrong targets, especially when the image contains multiple targets with a high degree of overlap. To address both missed detections and repeated detections, this paper proposes an improved Faster R-CNN algorithm based on double threshold-non-maximum suppression (DT-NMS). The algorithm first uses the VGG-Net-16 deep convolutional network architecture to extract multi-layer convolutional features of the targets. It then applies the DT-NMS algorithm in the region proposal network (RPN) stage to extract deep information from the target candidate regions. Finally, it replaces the nearest-neighbor interpolation in the original RoI pooling layer with bilinear interpolation, so that the algorithm locates targets more accurately on the detection dataset. To highlight how DT-NMS performs on the repeated detection problem, this paper introduces the repeated detection rate and the object mis-distribution rate of multiple detections as evaluation metrics.
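The difference between the three suppression strategies described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the overview does not give the exact DT-NMS rule, so the code assumes that a box is removed when its IoU with the selected box reaches an upper threshold, has its score linearly decayed when the IoU falls between the two thresholds, and is left unchanged below the lower threshold. The names `dt_nms`, `low_thr`, `high_thr`, and `score_thr`, and their default values, are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dt_nms(boxes, scores, low_thr=0.3, high_thr=0.7, score_thr=0.05):
    """Hypothetical double-threshold NMS (assumed rule, see lead-in).

    Returns a list of (box_index, final_score) in selection order.
    """
    remaining = [(s, i) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the highest-scoring box that is still alive.
        remaining.sort(reverse=True)
        best_score, best = remaining.pop(0)
        kept.append((best, best_score))
        survivors = []
        for s, i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap >= high_thr:
                continue  # assumed hard suppression: treated as a duplicate
            if overlap >= low_thr:
                s *= 1.0 - overlap  # assumed soft penalty: linear score decay
            if s >= score_thr:
                survivors.append((s, i))  # drop boxes whose score decayed away
        remaining = survivors
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (0, 0, 10, 5), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7, 0.6]
    for index, score in dt_nms(boxes, scores):
        print(index, round(score, 3))
```

In this sketch, setting `low_thr` equal to `high_thr` reproduces classic single-threshold NMS, and setting `high_thr` above 1 reduces it to a linear soft-NMS, which is one way to see how two thresholds sit between the two existing strategies.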
By simply setting the thresholds in the DT-NMS algorithm, the trade-off between single-threshold suppression and missed detections is effectively balanced, and the probability that the same target is detected multiple times is reduced. The improved Faster R-CNN algorithm retrains the network and re-tunes its parameters on the VGG-Net-16 network structure, and is verified through extensive experiments on the PASCAL VOC dataset. The experimental results show that, compared with the soft-NMS algorithm, the proposed algorithm reduces the repeated detection rate on PASCAL VOC2007 by 2.4% and the target error rate of multiple detections by 2%, indicating that the improved algorithm mitigates the missed detection and repeated detection problems of the traditional algorithms. Compared with the Faster R-CNN algorithm, the proposed algorithm reaches a detection accuracy of 74.7% on PASCAL VOC2007, an improvement of 1.5%. The algorithm is also fast, reaching 16 FPS.
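The effect of replacing nearest-neighbor interpolation with bilinear interpolation in RoI pooling can be illustrated by sampling a feature map at a fractional coordinate. The sketch below is not the paper's pooling layer. It only shows the two sampling rules on a small 2D grid, and the function names `sample_nearest` and `sample_bilinear` are hypothetical.

```python
import math


def sample_nearest(fmap, y, x):
    """Nearest-neighbor sampling: snap (y, x) to the closest grid cell."""
    h, w = len(fmap), len(fmap[0])
    yi = min(max(int(math.floor(y + 0.5)), 0), h - 1)
    xi = min(max(int(math.floor(x + 0.5)), 0), w - 1)
    return fmap[yi][xi]


def sample_bilinear(fmap, y, x):
    """Bilinear sampling: blend the four cells surrounding (y, x)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp the coordinate to the valid range of the grid.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two rows, then along y between them.
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


if __name__ == "__main__":
    fmap = [[0.0, 10.0],
            [20.0, 30.0]]
    print(sample_nearest(fmap, 0.25, 0.75))   # value of the snapped cell
    print(sample_bilinear(fmap, 0.25, 0.75))  # weighted blend of four cells
```

Nearest-neighbor sampling discards the fractional part of the coordinate, so a proposal boundary that falls between cells is shifted to the grid. Bilinear sampling keeps that sub-cell position, which is consistent with the more accurate localization the overview reports.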