Citation: Xu S J, Jing Y, Li H T, et al. Progressive multi-granularity ResNet vehicle recognition network[J]. Opto-Electron Eng, 2023, 50(7): 230052. doi: 10.12086/oee.2023.230052
Vehicle model recognition aims to identify fine-grained information such as the make, model, and year of a vehicle, which helps verify the accuracy of tracked vehicle information. Research on model recognition follows two strategies. Strongly supervised methods use not only image-level labels but also additional annotations such as bounding boxes and part information. Weakly supervised methods accomplish fine-grained model classification with image-level labels alone, and most of them rely on techniques such as attention mechanisms, bilinear convolutional neural networks, and metric learning. These methods tend to focus on large, salient vehicle parts such as the grille and tires, while ignoring small but discriminative features such as the vehicle logo and door handles. To address the difficulty of recognizing vehicle models under imaging differences in pose and viewpoint, this paper proposes a progressive multi-granularity ResNet vehicle recognition network. First, with ResNet as the backbone, a progressive multi-granularity local convolution module is proposed to perform local convolution on vehicle images at different granularities, so that the network attends to local vehicle features at different granularity levels during feature reconstruction. Then, a random channel drop module discards channels of the multi-scale local feature maps at random, suppressing the network's attention to salient vehicle regions and increasing its attention to non-salient features. A classification loss is added at every training stage.
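The random channel drop described above can be viewed as channel-wise dropout on a feature map. Below is a minimal numpy sketch, assuming the ratio β (the quantity swept in the RCDB ablation figures) gives the fraction of channels zeroed; the paper's exact masking scheme may differ:

```python
import numpy as np

def random_channel_drop(feat, beta, rng=None):
    """Zero a random fraction `beta` of the channels in a (C, H, W) feature map.

    Zeroing whole channels suppresses reliance on a few salient responses,
    pushing the network to also use non-salient features.
    """
    rng = np.random.default_rng() if rng is None else rng
    channels = feat.shape[0]
    n_drop = int(round(beta * channels))
    drop_idx = rng.choice(channels, size=n_drop, replace=False)
    out = feat.copy()
    out[drop_idx] = 0.0  # dropped channels contribute nothing downstream
    return out
```

As with standard dropout, such a module would be active only during training and disabled at inference time.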
By dividing the training process into stages, the network effectively fuses the multi-scale vehicle features it extracts and is guided to extract multi-scale vehicle features that are more discriminative and diverse. Experimental results show that the proposed network reaches recognition accuracies of 95.7%, 98.8%, and 97.4% on the Stanford Cars dataset, the web-nature CompCars dataset, and a vehicle model dataset collected in real road scenes (VMRURS), respectively. Compared with the baseline networks, the proposed network not only achieves higher recognition accuracy but also shows better robustness, performing well in challenging real-world conditions such as low illumination and vehicle deformation, which demonstrates its effectiveness for vehicle model recognition on the road.
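The progressive multi-granularity training block operates on jigsaw-style shuffled patches: each stage sees the image split into an n×n grid, with finer grids forcing the network to rely on local cues. A minimal numpy sketch, assuming a square input whose side is divisible by n (the paper's exact patch scheme is an assumption here):

```python
import numpy as np

def jigsaw_shuffle(img, n, rng=None):
    """Split an (H, W, C) image into an n x n grid of patches and shuffle them.

    Smaller patches (larger n) destroy global layout and force the network to
    learn fine-grained local features; n = 1 leaves the image intact.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    ph, pw = h // n, w // n
    patches = [img[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(n) for j in range(n)]
    order = rng.permutation(len(patches))
    rows = [np.concatenate([patches[order[i*n + j]] for j in range(n)], axis=1)
            for i in range(n)]
    return np.concatenate(rows, axis=0)
```

A stage schedule such as n = 8, 4, 2, 1 would then pair each granularity with its own classification loss, with the final stage trained on the intact image.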
Overall structure of the proposed network
Progressive multi-granularity local convolution block
Random channel drop block schematic diagram
Progressive multi-granularity training block schematic diagram
Top-1 accuracy (%) curves. (a) Effect of β values on RCDB on Stanford Cars; (b) Effect of β values on RCDB on CompCars; (c) Effect of β values on RCDB on VMRURS
Network training and testing process
Visual comparison of vehicle recognition in each stage
Visual comparison after adding each module
Visual comparison of vehicle recognition by different networks