Citation: Jiang W T, Chen C, Zhang S C. Sparse feature image classification network with spatial position correction[J]. Opto-Electron Eng, 2024, 51(5): 240050. doi: 10.12086/oee.2024.240050
To sparsify semantic features, strengthen attention to key features, enhance the correlation between spatial and local features, and constrain the spatial positions of features, this paper proposes SSCNet, a sparse feature image classification network with spatial position correction. First, a sparse semantic enhanced feature (SSEF) module is proposed, which combines depthwise separable convolution (DSC) with the squeeze-and-excitation (SE) module to enhance feature extraction ability while preserving the integrity of spatial information. Then, a spatial position correction symmetric attention mechanism (SPCS) is proposed: SPCS places symmetric coordinate attention at specific positions in the network, which strengthens the spatial relationships among features, constrains and corrects their spatial positions, and enhances the network's perception of global detailed features. Finally, an average pooling module (APM) is proposed and applied to each residual branch of the network, enabling the network to capture global feature information more effectively, improve the translation invariance of features, delay overfitting, and improve generalization.

Experiments were conducted on the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof datasets. CIFAR-10 contains 60,000 color images of 32×32 pixels across 10 categories and is commonly used to test and compare image classification algorithms. CIFAR-100 is more challenging and is used to evaluate performance on finer-grained classification tasks. SVHN contains real-world street-view house number images taken from Google Street View, used to recognize the digits on house signs; it is divided into training, test, and extra training sets, each image contains one or more digits, and its resolution is higher than that of the CIFAR datasets. Imagenette and Imagewoof are small-scale subsets extracted and adapted from ImageNet.

SSCNet is compared with 12 other network models on these five datasets, achieving classification accuracies of 96.72%, 80.63%, 97.43%, 88.75%, and 82.09% on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof, respectively. Compared with other methods, SSCNet better extracts local detail information while balancing global information, and exhibits higher classification accuracy and strong generalization.
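To make the three building blocks concrete, the following is a minimal PyTorch sketch of how they could be composed. This is not the authors' released code: all channel widths, reduction ratios, kernel sizes, and module placements are illustrative assumptions, and SPCS is represented only by the standard coordinate attention it builds on.

```python
# Minimal sketches of the modules named in the abstract: SSEF (DSC + SE),
# the coordinate-attention ingredient of SPCS, and an APM-style residual
# block. All sizes, ratios, and placements are illustrative assumptions.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (Hu et al., CVPR 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excitation: reweight channels


class SSEF(nn.Module):
    """Sketch of the SSEF idea: a depthwise separable convolution (DSC)
    followed by SE channel attention."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
        self.se = SEBlock(out_ch)

    def forward(self, x):
        return self.se(self.act(self.bn(self.pointwise(self.depthwise(x)))))


class CoordAttention(nn.Module):
    """Coordinate attention (Hou et al., CVPR 2021), the mechanism SPCS
    arranges symmetrically: attention factorized along H and W so that
    positional information is retained in the gates."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)  # pool along W -> (b, c, h, 1)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # -> (b, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))  # (b, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * ah * aw  # direction-wise positional gates


class APMBlock(nn.Module):
    """Residual block whose shortcut is smoothed by average pooling, one
    plausible reading of applying an APM to each residual branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = SSEF(channels, channels)
        self.apm = nn.AvgPool2d(3, stride=1, padding=1)  # size-preserving

    def forward(self, x):
        return self.body(x) + self.apm(x)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)  # CIFAR-sized feature map
    y = CoordAttention(64)(APMBlock(64)(x))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Average pooling in the shortcut is a deliberately soft design choice: it adds no parameters, preserves the spatial size, and smooths the identity path, which is consistent with the abstract's claims of improved translation invariance and delayed overfitting.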
SSCNet network structure
Comparison of convolution operations before and after modifying the first-layer convolution kernel size
SSEF module
Spatial position correction symmetric attention
Coordinate attention structure
Max coordinate attention structure
Three types of residual blocks. (a) Basic block; (b) Residual block; (c) APM
Arrangement of the positions and numbers of APM-Block and SSEF modules
Influence of different convolutional kernel sizes on classification accuracy
Influence of different learning rates on classification accuracy
Comparison of feature maps before and after SPCS module
Classification accuracy of each network on different datasets. (a) CIFAR-10; (b) CIFAR-100; (c) SVHN; (d) Imagenette
Visualization images of heat maps for different networks