Citation: Hou G P, Dong W, Lu L K, et al. Smartphone image quality assessment method based on Swin-AK Transformer[J]. Opto-Electron Eng, 2025, 52(1): 240264. doi: 10.12086/oee.2025.240264
With the widespread use of smartphones, users' expectations for smartphone image quality have risen significantly. However, limited camera hardware constrains light capture, especially in complex or low-light scenes, which can degrade image quality. Existing no-reference image quality assessment (IQA) algorithms often fall short on smartphone-captured images, motivating the development of a more accurate quality evaluation method. This study proposes an approach based on manual features and a Swin-AK Transformer with dual cross-attention fusion, designed to assess smartphone image quality with greater precision. First, manual features that affect image quality are extracted under the guidance of the human visual system, capturing subtle visual variations such as color, contrast, and texture and thereby increasing the model's sensitivity to image quality. To further improve discriminative power, ResNet50 is applied after manual feature extraction to establish a nonlinear mapping between the manual features and image quality, transforming the initial low-level features into more representative high-level features that express image content more comprehensively. Next, the Swin-AK Transformer is introduced; its self-attention mechanism captures local image features, strengthening the model's ability to recognize and process local information in smartphone images and adapting well to their characteristic fine details. In addition, a dual cross-attention fusion module is designed to integrate the manual and deep features efficiently. The module combines spatial and channel attention mechanisms: spatial attention helps the model focus on key regions of the image, while channel attention optimizes the feature representation by re-weighting each channel. The fused features therefore reflect both global image information and local detail variations, in line with how the human visual system naturally perceives image quality. Experiments on two public datasets, SPAQ and LIVE-C, demonstrate superior image quality prediction, with Pearson correlation coefficients of 0.932 and 0.885 and Spearman rank correlation coefficients of 0.929 and 0.858 on SPAQ and LIVE-C, respectively. These results validate the proposed method's effectiveness for smartphone image quality assessment, showing improved sensitivity to quality changes together with high accuracy and robustness.
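The abstract describes extracting manual features (color, contrast, texture) guided by the human visual system before they are passed to ResNet50, but does not list the exact descriptors. The following is a minimal sketch of plausible features of this kind, assuming HSV color statistics, global RMS contrast, Sobel gradient sharpness, and a uniform LBP texture histogram; every concrete choice here is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of HVS-inspired manual features (color, contrast,
# texture); the paper's actual feature set is not specified in the abstract.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def manual_features(bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

    # Color: mean saturation and value from HSV as colorfulness/brightness cues.
    sat_mean = hsv[..., 1].mean() / 255.0
    val_mean = hsv[..., 2].mean() / 255.0

    # Contrast: standard deviation of luminance (global RMS contrast).
    contrast = gray.std() / 255.0

    # Sharpness: mean Sobel gradient magnitude as an edge-strength proxy.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    sharpness = np.sqrt(gx ** 2 + gy ** 2).mean() / 255.0

    # Texture: uniform LBP histogram (10 bins for P=8, R=1).
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    return np.concatenate([[sat_mean, val_mean, contrast, sharpness], hist])
```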
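The dual cross-attention fusion module merges manual and deep features with combined spatial and channel attention, but the abstract does not specify its internal wiring. Below is a minimal PyTorch sketch of one plausible arrangement, assuming an SE-style channel gate and a CBAM-style spatial gate applied crosswise so that each branch re-weights the other; the class names and the cross-application pattern are assumptions for illustration only.

```python
# Hypothetical sketch of dual cross-attention fusion of two feature maps;
# the paper's exact module design is not given in the abstract.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):  # SE-style channel gating
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        return self.fc(x).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)

class SpatialAttention(nn.Module):  # CBAM-style spatial gating
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled))  # (B, 1, H, W)

class DualCrossAttentionFusion(nn.Module):
    """Each branch is re-weighted by attention computed from the other."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, manual_feat, deep_feat):
        # Channel weights from the deep branch gate the manual branch ...
        m = manual_feat * self.ca(deep_feat)
        # ... spatial weights from the manual branch gate the deep branch.
        d = deep_feat * self.sa(manual_feat)
        return self.proj(torch.cat([m, d], dim=1))
```

In this sketch both branches are assumed to be projected to the same (B, C, H, W) shape before fusion, so the channel gate carries global context while the spatial gate preserves local detail.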
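The reported figures (0.932/0.885 Pearson, 0.929/0.858 Spearman) are standard IQA metrics: the Pearson linear correlation coefficient (PLCC) measures prediction accuracy and the Spearman rank-order correlation coefficient (SROCC) measures prediction monotonicity between predicted scores and subjective MOS values. They can be computed directly with SciPy, as in this sketch.

```python
# PLCC and SROCC between predicted quality scores and subjective MOS values.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def iqa_metrics(pred: np.ndarray, mos: np.ndarray) -> tuple[float, float]:
    plcc, _ = pearsonr(pred, mos)    # linear correlation (accuracy)
    srocc, _ = spearmanr(pred, mos)  # rank correlation (monotonicity)
    return plcc, srocc
```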
Overall structure diagram of the proposed method
Diagram of manual feature extraction
ResNet50 architecture diagram
Diagram of the shifted window operation in Swin Transformer
Swin-AK Transformer architecture diagram
Swin-AK block architecture diagram
AKConv architecture diagram
Structure diagram of the dual cross-attention fusion module
Channel attention network structure diagram
Structure diagram of the spatial attention module
Scatter plots of image attribute scores versus overall subjective quality scores in the SPAQ dataset. (a) Brightness; (b) Colorfulness; (c) Sharpness
Scatter plot on the LIVE-C dataset
Scatter plot on the SPAQ dataset
Comparison of attention heatmaps between Swin Transformer and Swin-AK Transformer
MOS values versus the proposed method's quality predictions on the SPAQ dataset
MOS values versus the proposed method's quality predictions on the LIVE-C dataset