Overview: Road detection is a key component of environment perception in autonomous driving and an essential prerequisite for automated vehicles. Deep-learning-based multi-source data fusion has become a hot topic in this field: RGB data provide dense texture and color information, LiDAR data provide accurate spatial information, and fusing the two sensors improves the robustness and accuracy of detection. Recent fusion methods use a convolutional neural network (CNN) to fuse LiDAR data with RGB image data and perform road detection and segmentation via semantic segmentation. In this paper, different fusion methods for LiDAR point clouds and image data are implemented within an encoder-decoder architecture to segment roads in traffic scenes. For the fusion of point clouds and image data, we propose several schemes at the pixel, feature, and decision levels; in particular, four cross-fusion schemes are designed for feature-level fusion. The schemes are compared to identify the best one. The base network uses a residual-network encoder and a decoder with dense connections and skip connections. The input is RGB-D: the LiDAR depth map is converted into a normal map by a surface normal estimator, and the normal-map features are fused with the RGB image features at different levels of the network. Features are extracted from the two input streams by two encoders, restored by a decoder, and the final road detection result is obtained with a sigmoid activation function. The KITTI dataset is used to evaluate the performance of the fusion schemes.
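The conversion of the LiDAR depth map into a normal map can be sketched as below. This is a minimal, simplified stand-in for the paper's surface normal estimator: it derives normals from local depth gradients and normalizes them to unit length; the function name and the focal-length parameters `fx`/`fy` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def depth_to_normals(depth, fx=1.0, fy=1.0):
    """Estimate per-pixel surface normals from a dense depth map.

    Simplified gradient-based sketch: the normal at each pixel is
    built from the local depth slopes and normalized to unit length.
    `fx`/`fy` stand in for camera focal lengths (assumed values).
    """
    dz_dv, dz_du = np.gradient(depth)          # vertical, horizontal depth gradients
    nx = -dz_du * fx                           # x component from horizontal slope
    ny = -dz_dv * fy                           # y component from vertical slope
    nz = np.ones_like(depth)                   # z component before normalization
    n = np.stack([nx, ny, nz], axis=-1)        # H x W x 3
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n                                   # unit normals per pixel

# Usage: a flat (constant-depth) plane yields normals pointing along +z.
normals = depth_to_normals(np.full((4, 4), 5.0))
```

The resulting three-channel normal map is then fed to the second encoder in place of the raw depth map.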
The comparison experiments show that the proposed fusion scheme E best learns the LiDAR point cloud information, the camera image information, and the correlations between the cross-fused point cloud and image features; it also reduces the loss of feature information, and therefore achieves the best road segmentation. Quantitative analysis of the average precision (AP) of different road detection methods shows that the optimal fusion method proposed in this paper has an advantage in average detection accuracy and good overall performance. Qualitative analysis of the detection methods in different scenes shows that fusion scheme E performs well in the boundary regions between vehicles and roads and can effectively reduce the false detection rate of road detection.
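The feature-level cross fusion described above can be sketched as follows. This is a hedged illustration, not the authors' network: it assumes an additive fusion in which, at each encoder stage, the normal-map features are scaled by a per-stage weight and added into the RGB stream; the function name, the stage count, and the additive form are assumptions for illustration.

```python
import numpy as np

def cross_fuse(rgb_feats, normal_feats, weights):
    """Illustrative feature-level cross fusion of two encoder streams.

    At each encoder stage the normal-map features are scaled by a
    per-stage weight and added element-wise to the RGB features, so
    the fused stream carries both sensors' information at every depth.
    """
    fused = []
    for f_rgb, f_norm, w in zip(rgb_feats, normal_feats, weights):
        fused.append(f_rgb + w * f_norm)   # element-wise weighted sum
    return fused

# Usage: three stages of 2x2 single-channel feature maps.
rgb = [np.full((2, 2), float(i)) for i in (1, 2, 3)]
nrm = [np.ones((2, 2)) for _ in range(3)]
out = cross_fuse(rgb, nrm, weights=[0.5, 0.5, 0.5])
```

In the full network the weights would be learned parameters and the fused features at each stage would feed both the next encoder stage and the decoder's skip connections.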
Figures: network infrastructure diagram; decoder structure diagram; surface normal estimator; network structures with different fusion strategies; cross-fusion method; examples of experimental results of different fusion methods; example of KITTI dataset experimental results.