Citation: Peng H, Wang W Q, Chen L, et al. Few-shot object detection via online inferential calibration[J]. Opto-Electron Eng, 2023, 50(1): 220180. doi: 10.12086/oee.2023.220180
Faster R-CNN network architecture
FSOIC network architecture
Detection results based on TFA
Attention-FPN network architecture
Channel attention module
FSOIC algorithm class template generation module
Feature metric space
Performance comparison of the detection results
Detection results under occlusion in the 10-shot task
10-shot task detection results. (a) Detection results of the Faster R-CNN network based on TFA; (b) Detection results of the Faster R-CNN network using the online inference calibration module; (c) Detection results of the Faster R-CNN network using the online inference calibration module with the Attention-FPN network added