Deep contour fragment matching algorithm for real-time instance segmentation

Cao Chunlin; Tao Chongben; Li Huayi; Gao Hanwen

doi:10.12086/oee.2021.210245

Opto-Electronic Engineering, 2021, Vol. 48, No. 11, 210245

Cao C L, Tao C B, Li H Y, et al. Deep contour fragment matching algorithm for real-time instance segmentation[J]. Opto-Electron Eng, 2021, 48(11): 210245. doi: 10.12086/oee.2021.210245

1. School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
2. Tsinghua University Suzhou Automotive Research Institute, Suzhou, Jiangsu 215134, China

Fund Project: National Natural Science Foundation of China (61801323, 61972454), China Postdoctoral Science Foundation (2021M691848), Science and Technology Projects Fund of Suzhou (SS2019029), and Natural Science Foundation of Jiangsu Province (BK20201405, 19KJB110021, 20KJB520018)

*Corresponding author: Tao Chongben, E-mail: tom1tao@163.com

Received Date: 20 July 2021

Revised Date: 10 November 2021

Published Date: 15 November 2021

Abstract

In contour-based instance segmentation, object occlusion commonly lengthens contour processing and lowers the accuracy of the detection box. This paper proposes a real-time instance segmentation algorithm that adds fragment matching, a target aggregation loss function, and a boundary coefficient module to the contour processing pipeline. First, fragment matching is performed on the initial contour, which is formed by evenly spaced points, and local ground-truth points are allocated within each fragment to give a more natural, faster, and smoother deformation path. Second, the target aggregation loss function and the boundary coefficient module are used to predict objects under occlusion and to produce an accurate detection box. Finally, circular convolution and the Snake model are used to converge the matched contours, and the vertices are then updated iteratively to obtain the segmentation result. The method is evaluated on several datasets, including Cityscapes, KINS, and COCO; on COCO it obtains 30.7 mAP at 33.1 f/s, a trade-off between accuracy and speed.
Keywords: instance segmentation; object detection; snake model; object occlusion; initial contour
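
The circular convolution mentioned in the abstract treats the contour as a closed loop, so the first and last vertices are neighbours. The following is a minimal sketch of that idea only, not the paper's network: it assumes a fixed, hand-picked smoothing kernel applied directly to vertex coordinates, whereas a learned model would apply trained kernels to vertex features and predict offsets. The function names `circular_conv1d` and `smooth_contour` are hypothetical.

```python
def circular_conv1d(values, kernel):
    """Apply an odd-length kernel to a closed sequence, wrapping at both ends.

    Written in the cross-correlation form common in deep learning
    (the kernel is not flipped); for symmetric kernels the two forms agree.
    """
    n, k = len(values), len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = k // 2
    return [
        sum(kernel[j] * values[(i + j - half) % n] for j in range(k))
        for i in range(n)
    ]


def smooth_contour(points, kernel=(0.25, 0.5, 0.25), steps=1):
    """Iteratively update contour vertices with a circular kernel.

    Assumption: a fixed averaging kernel stands in for learned weights.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    for _ in range(steps):
        xs = circular_conv1d(xs, kernel)
        ys = circular_conv1d(ys, kernel)
    return list(zip(xs, ys))


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    print(smooth_contour(square))
```

Because indices wrap with the modulo operator, every vertex sees the same number of neighbours, which is the property that makes this operation a natural fit for closed contours.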

Overview

Instance segmentation helps a system understand scene information and can effectively improve the perception system of an autonomous vehicle. However, problems such as object occlusion and object blur during detection greatly reduce segmentation accuracy. Deep neural networks are a common way to handle occlusion and blur. Where computing resources and real-time operation are a concern, contour-based algorithms are an alternative. The Active Contour Model (ACM), also called the Snake model, is a classic contour algorithm. It has fewer parameters than dense-pixel methods, which speeds up segmentation. This paper proposes a segmentation algorithm that combines ACM with circular convolution. The algorithm uses CenterNet as the object detector, updates the contour vertices by iterating circular convolution and vertex offset calculation, and finally fits the real shape of the object.

The algorithm makes three main contributions. First, to handle object occlusion and blur, a loss function (the target aggregation loss) is introduced; it improves the localization accuracy of the detection box through attraction and repulsion between the target and surrounding objects. Second, processing the initial contour is an important step in contour-based algorithms, because it affects the accuracy and speed of the subsequent instance segmentation. This paper proposes fragment matching as a method for processing the initial contour. The initial contour is formed by evenly spaced points. The detection box is adaptively divided into multiple segments that correspond to the initial contour, and each segment is matched point by point and assigned vertices. These vertices are the key to the subsequent deformation. Third, in dense scenes it is easy to lose the information of adjacent objects within the same detection box. This paper proposes a boundary coefficient module that corrects misjudged boundary information by dividing the area and aligning the features, to ensure accurate boundary segmentation.

The algorithm is compared with several advanced algorithms on several datasets. On the Cityscapes dataset it obtains an AP_vol of 37.7% and an AP of 31.8%, an improvement of 1.2% AP_vol over PANet. On the SBD dataset it obtains 62.1% AP50 and 48.5% AP70, which indicates that the AP changes little as the IoU threshold changes and suggests that the method is stable. Compared with other real-time algorithms on the COCO dataset, it achieves a trade-off between accuracy and speed, reaching 33.1 f/s with 30.7% mAP on COCO test-dev. These results indicate that the algorithm performs well in both accuracy and speed.
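
The initial-contour and fragment-matching steps described above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: the box is split into equal-sized fragments rather than adaptively, ground-truth points are allocated to fragments by index, and each initial point is matched to the nearest ground-truth point in its own fragment. All function names are hypothetical, and the ground-truth contour is assumed to have at least `k` points.

```python
import math


def sample_box_contour(x0, y0, x1, y1, n):
    """Return n points evenly spaced along the box perimeter, starting at (x0, y0)."""
    w, h = x1 - x0, y1 - y0
    perimeter = 2 * (w + h)
    points = []
    for i in range(n):
        d = perimeter * i / n  # distance travelled along the perimeter
        if d < w:                      # edge from (x0, y0) to (x1, y0)
            points.append((x0 + d, y0))
        elif d < w + h:                # edge from (x1, y0) to (x1, y1)
            points.append((x1, y0 + (d - w)))
        elif d < 2 * w + h:            # edge from (x1, y1) to (x0, y1)
            points.append((x1 - (d - w - h), y1))
        else:                          # edge from (x0, y1) back to (x0, y0)
            points.append((x0, y1 - (d - 2 * w - h)))
    return points


def split_fragments(points, k):
    """Split a point list into k contiguous fragments of near-equal size."""
    n = len(points)
    bounds = [(i * n) // k for i in range(k + 1)]
    return [points[bounds[i]:bounds[i + 1]] for i in range(k)]


def match_fragments(initial, ground_truth, k):
    """Pair each initial point with the nearest ground-truth point of its fragment.

    Assumption: fragment i of the initial contour corresponds to fragment i
    of the ground-truth contour (both contours share start point and direction).
    """
    pairs = []
    init_frags = split_fragments(initial, k)
    gt_frags = split_fragments(ground_truth, k)
    for init_frag, gt_frag in zip(init_frags, gt_frags):
        for p in init_frag:
            target = min(gt_frag, key=lambda q: math.dist(p, q))
            pairs.append((p, target))
    return pairs


if __name__ == "__main__":
    box_points = sample_box_contour(0, 0, 4, 2, 6)
    gt = [(1, 0.5), (3, 0.5), (3.5, 1), (3, 1.5), (1, 1.5), (0.5, 1)]
    for p, q in match_fragments(box_points, gt, k=3):
        print(p, "->", q)
```

Restricting each search to one fragment keeps the matching local, so a vertex on one side of the box cannot be assigned a target on the opposite side of the object.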