Citation: Hou G P, Dong W, Lu L K, et al. Smartphone image quality assessment method based on Swin-AK Transformer[J]. Opto-Electron Eng, 2025, 52(1): 240264. doi: 10.12086/oee.2025.240264

Smartphone image quality assessment method based on Swin-AK Transformer

    Fund Project: The Important Project of Digital Education Research of Beijing (BDEC2022619027), the 2023 Project Proposal of the Beijing Higher Education Association (MS2023168), the Research Project of Beijing Institute of Graphic Communication (Ec202303, Ea202301, E6202405), the Disciplinary Construction and Postgraduate Education Project of Beijing Institute of Graphic Communication (21090323009, 21090224002, 21090124013), the Classification Development of Beijing Municipal Universities-Construction Project of the Emerging Interdisciplinary Platform for Publishing at Beijing Institute of Graphic Communication-Key Technology Research and Development Platform for Digital Inkjet Printing Technology and Multifunctional Rotary Offset Press (04190123001/003), the Open Foundation of the State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) (SKLNST-2023-1-12), and the Project of the "Artificial Intelligence Plus" Course Construction of Beijing Institute of Graphic Communication.
  • This paper proposes a smartphone image quality assessment method that combines a Swin-AK Transformer, built on alterable kernel (AK) convolution, with manual features through dual attention cross-fusion. First, manual features that affect image quality are extracted; these features capture subtle visual changes in images. Second, the Swin-AK Transformer is presented, which improves the extraction and processing of local information. In addition, a dual attention cross-fusion module is designed that integrates spatial and channel attention mechanisms to fuse the manual features with deep features, as sketched below. Experimental results show that the Pearson correlation coefficients on the SPAQ and LIVE-C datasets reach 0.932 and 0.885, respectively, while the Spearman rank-order correlation coefficients reach 0.929 and 0.858, respectively. These results demonstrate that the proposed method can effectively predict the quality of smartphone images.
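The paper itself ships no code, so the following is a minimal sketch of how a dual attention cross-fusion module of the kind described above could be realized in PyTorch. Every class name, shape, and design detail here (the SE-style channel branch, the CBAM-style spatial branch, the 1x1 projection) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch: fuse manual and deep feature maps with spatial and
# channel attention. All names, shapes, and design choices are assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel reweighting (assumed design)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # scale each channel by its learned weight

class SpatialAttention(nn.Module):
    """CBAM-style spatial mask from pooled channel maps (assumed design)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # average over channels
        mx, _ = x.max(dim=1, keepdim=True)     # max over channels
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask                        # emphasize key image regions

class DualAttentionCrossFusion(nn.Module):
    """Refine the manual stream spatially and the deep stream channel-wise,
    then concatenate and project back to a single feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.sa = SpatialAttention()
        self.ca = ChannelAttention(channels)
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, manual_feat, deep_feat):
        fused = torch.cat([self.sa(manual_feat), self.ca(deep_feat)], dim=1)
        return self.proj(fused)

if __name__ == "__main__":
    fusion = DualAttentionCrossFusion(channels=256)
    manual = torch.randn(1, 256, 7, 7)   # e.g. ResNet50-mapped manual features
    deep = torch.randn(1, 256, 7, 7)     # e.g. Swin-AK Transformer features
    print(fusion(manual, deep).shape)    # torch.Size([1, 256, 7, 7])
```

A quality regressor would then pool this fused map and map it to a scalar score; that head is omitted here since the abstract does not describe it.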
  • With the extensive use of smartphones, users' expectations for smartphone image quality have risen significantly. However, due to limitations in camera hardware, smartphones often face constraints in light capture, especially in complex or low-light scenarios, which can lead to image quality degradation. Existing no-reference image quality assessment (IQA) algorithms frequently show limitations when handling smartphone-captured images, motivating the development of a more accurate quality evaluation method. This study proposes an approach based on manual features and a Swin-AK Transformer with dual attention cross-fusion, designed to assess smartphone image quality with greater precision. First, manual features affecting image quality are extracted, guided by the human visual system, enabling the capture of subtle visual variations such as color, contrast, and texture; this enhances the model's sensitivity to image quality (a sketch of such features follows this abstract). To further improve discriminative power, ResNet50 is introduced after manual feature extraction to establish a nonlinear mapping between manual features and image quality. This process transforms the initial low-level features into more representative high-level features, allowing a more comprehensive expression of image content. Subsequently, the study introduces the Swin-AK Transformer, which utilizes a self-attention mechanism to capture local image features, thereby enhancing the model's capability to recognize and process local information in smartphone images. This method adapts effectively to the characteristics of smartphone images, offering robust handling of intricate details. Additionally, a dual attention cross-fusion module is designed to integrate manual and deep features efficiently. The module combines spatial and channel attention mechanisms: spatial attention helps the model focus on key areas within the image, while channel attention optimizes feature representation by adjusting the weight of each channel. As a result, the fused features reflect both global image information and local detail variations, aligning well with the human visual system's natural perception of image quality. Experiments were conducted on two public datasets, SPAQ and LIVE-C, to evaluate the proposed model. The results demonstrate the model's superior performance in image quality prediction, achieving Pearson correlation coefficients of 0.932 and 0.885 and Spearman rank-order correlation coefficients of 0.929 and 0.858 on the SPAQ and LIVE-C datasets, respectively. These outcomes validate the proposed method's effectiveness in smartphone image quality assessment, showing improved sensitivity to quality changes together with excellent accuracy and robustness.
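To make the manual-feature stage concrete, the sketch below computes a few simple color, contrast, and texture statistics of the kind the abstract names. The specific statistics and the function name are assumptions chosen for illustration; the paper's actual feature set may differ.

```python
# Illustrative sketch of simple manual (handcrafted) quality features:
# color, contrast, and texture statistics. The exact feature set here is
# an assumption for illustration, not the paper's definition.
import cv2
import numpy as np

def manual_features(img_bgr: np.ndarray) -> np.ndarray:
    """Return a small vector of color/contrast/texture statistics."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)

    # Color: mean saturation and mean brightness in HSV space.
    sat_mean = hsv[..., 1].mean()
    val_mean = hsv[..., 2].mean()

    # Contrast: standard deviation of gray levels (RMS contrast).
    contrast = gray.std()

    # Texture/sharpness: variance of the Laplacian response and the
    # mean Sobel gradient magnitude.
    lap_var = cv2.Laplacian(gray, cv2.CV_64F).var()
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    grad_mean = np.sqrt(gx**2 + gy**2).mean()

    return np.array([sat_mean, val_mean, contrast, lap_var, grad_mean])

if __name__ == "__main__":
    img = cv2.imread("photo.jpg")   # any smartphone photo
    print(manual_features(img))     # 5-dimensional feature vector
```

In the described pipeline, a vector like this would then be passed through ResNet50-based mapping to obtain the high-level manual-feature representation that enters the fusion module.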

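The two figures of merit quoted above, the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC), can be reproduced directly with scipy; the score arrays below are placeholders, not data from the paper.

```python
# Minimal sketch of the two reported metrics: PLCC and SROCC between
# predicted quality scores and subjective mean opinion scores (MOS).
# The arrays are placeholders, not data from the paper.
import numpy as np
from scipy import stats

predicted = np.array([62.1, 48.3, 75.0, 55.6, 81.2])  # model outputs
mos = np.array([60.0, 50.5, 78.1, 52.0, 83.4])        # subjective MOS

plcc, _ = stats.pearsonr(predicted, mos)    # linear correlation
srocc, _ = stats.spearmanr(predicted, mos)  # rank-order correlation
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```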