An open-pit mine roadway obstacle warning method integrating the object detection and distance threshold model

Lu Caiwu; Qi Fan; Ruan Shunling

doi:10.12086/oee.2020.190161

Lu C W, Qi F, Ruan S L. An open-pit mine roadway obstacle warning method integrating the object detection and distance threshold model[J]. Opto-Electron Eng, 2020, 47(1): 190161. doi: 10.12086/oee.2020.190161

School of Management, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China

Fund Project: Supported by Technological Projects for Prevention and Control of Severe and Extraordinary Accidents in National Safety Production (0020-2018AQ) and Special Project of Shaanxi Education Department (17JK0425)

Corresponding author: Qi Fan, E-mail: XUATqifan@126.com

Received Date 07 April 2019

Revised Date 23 July 2019

Published Date 01 January 2020

Abstract

To address the inability of current driving warning methods to adapt to unstructured roads in open-pit mines, this paper proposes an early warning method that integrates target detection with an obstacle distance threshold. First, the original Mask R-CNN detection framework is improved according to the characteristics of open-pit mine obstacles: dilated convolution is introduced into the framework's network to expand the receptive field without shrinking the feature map, which preserves detection accuracy for larger targets. Then, a linear distance factor is constructed from the target detection results to represent the depth information of obstacles in the input image, and an SVM warning model is established. Finally, to ensure the generalization ability of the warning model, transfer learning is adopted: the network is pre-trained on the COCO dataset, and the C₅ stage and detection layer are then trained on data collected in the field. The experimental results show that the proposed method reaches an accuracy of 98.47% and a recall of 97.56% on the field data, and that the manually designed linear distance factor adapts well to the SVM warning model.
Keywords: obstacle warning; target detection; distance threshold model; dilated convolution; transfer learning

Overview

Most research on machine-vision-based traffic obstacle warning focuses on urban roads. Open-pit mine roads differ considerably from urban roads: they have no clear road surface, boundary, or standard road width, and they contain many steep curves. Target detection and early warning methods designed for urban roads therefore cannot be applied directly to unstructured open-pit roads. With the emergence of convolutional neural networks, deep-learning-based target detection and depth estimation have gradually surpassed traditional computer vision methods in accuracy and applicability. However, it is difficult for target detection and per-pixel depth estimation to share the underlying convolutional layers. In practice, a driver usually makes the warning judgment from a distance threshold based on experience and other factors. This paper therefore proposes an early warning method that combines target detection with a distance threshold model. First, the original Mask R-CNN detection framework was improved according to the characteristics of open-pit mine obstacles by introducing dilated convolution into ResNet. With the subnet structure and residual connections left unchanged, the 3 by 3 convolution in the C₅ subnet was replaced by a dilated convolution with a dilation rate of 2, so that the original 3 by 3 receptive field was extended to 7 by 7. Expanding the receptive field in this way preserves detection accuracy for larger targets without shrinking the feature map. Then, based on the detection and classification results, the normalized length, width, and area of the detection box, the mask area, and the category were used to represent the depth information of the obstacle in the two-dimensional image. A radial basis function (RBF) SVM warning model was established to judge whether a detected target is dangerous.
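The receptive-field figures quoted above can be checked with a little arithmetic. The sketch below is not taken from the paper; the function names are illustrative, and it assumes stride-1 convolutions. It shows that a single 3 by 3 kernel with dilation rate 2 spans 5 by 5 input positions, that the 7 by 7 figure is consistent with such a layer stacked on a standard 3 by 3 layer, and that padding equal to the dilation rate keeps the feature map size unchanged.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Input span covered by one dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def stacked_receptive_field(layers) -> int:
    """Receptive field of stacked stride-1 convolutions.

    layers: iterable of (kernel, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


def output_size(n: int, kernel: int, dilation: int, padding: int, stride: int = 1) -> int:
    """Spatial output size of a convolution along one axis."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


if __name__ == "__main__":
    # A single 3x3 kernel with dilation rate 2.
    print(effective_kernel(3, 2))
    # A standard 3x3 layer followed by a 3x3 layer with dilation rate 2.
    print(stacked_receptive_field([(3, 1), (3, 2)]))
    # Padding equal to the dilation rate preserves a 32-wide feature map.
    print(output_size(32, kernel=3, dilation=2, padding=2))
```

Under these assumptions the dilated layer enlarges the receptive field without any extra downsampling, which is the property the overview relies on for larger targets.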
Finally, to ensure the generalization ability of the warning model, transfer learning was adopted: the network was pre-trained on the COCO dataset so that the first four stages learned sufficient low-level feature information, and the C₅ stage and detection layer were then trained on the data collected for this paper. The experimental results show that the proposed linear distance factor effectively represents the depth information of obstacles, and that both Mask R-CNN and YOLOv3 can be combined with the warning model. The improved Mask R-CNN pays more attention to the regression and classification of boxes for larger targets, reaching an accuracy of 98.47% and a recall of 97.56%, both better than the other frameworks.
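As a rough illustration of the distance factor described in the overview, the sketch below assembles the five quantities named there (normalized box width, height, and area, mask area, and category) into one feature vector and evaluates an RBF kernel between two such vectors. It is a sketch under stated assumptions, not the authors' implementation: the `Detection` fields, the normalization by image size, the scalar category encoding, and the `gamma` value are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected obstacle (hypothetical structure, pixel units)."""
    x1: float
    y1: float
    x2: float
    y2: float
    mask_area: float  # number of mask pixels
    category: int     # class index, 0 .. num_classes - 1


def distance_factor(det: Detection, img_w: int, img_h: int, num_classes: int):
    """Build a 5-element feature vector, each entry scaled to [0, 1]."""
    width = (det.x2 - det.x1) / img_w
    height = (det.y2 - det.y1) / img_h
    box_area = width * height
    mask_area = det.mask_area / (img_w * img_h)
    # Assumption: the category is encoded as a single scaled scalar.
    category = det.category / max(num_classes - 1, 1)
    return [width, height, box_area, mask_area, category]


def rbf_kernel(a, b, gamma: float = 1.0) -> float:
    """RBF kernel exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    near = Detection(100, 200, 700, 760, mask_area=250000, category=1)
    far = Detection(480, 380, 520, 410, mask_area=900, category=1)
    f_near = distance_factor(near, img_w=1000, img_h=800, num_classes=3)
    f_far = distance_factor(far, img_w=1000, img_h=800, num_classes=3)
    print(f_near, f_far, rbf_kernel(f_near, f_far))
```

A larger box and mask generally indicate a closer obstacle, so vectors like these give a classifier a proxy for depth. In practice they would be passed with danger labels to an RBF SVM trainer, for example scikit-learn's `SVC(kernel="rbf")`.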