Peng Bo, Luo Shasha, Yang Feng, et al. Performance analysis of a sum-table-based method for computing cross-correlation in GPU-accelerated ultrasound strain elastography[J]. Opto-Electronic Engineering, 2019, 46(6): 180437. doi: 10.12086/oee.2019.180437

Performance analysis of a sum-table-based method for computing cross-correlation in GPU-accelerated ultrasound strain elastography

    Fund Project: Supported by Scientific Innovation Program of Sichuan Province (Major Engineering Project: 2018RZ0093) and Nanchong Scientific Council (Strategic Cooperation Program Between University and City: NC17SY4020)
  • Correlation calculation is a critical step in ultrasound strain elastography. In a serial computing environment, the sum-table-based method for calculating the normalized correlation coefficient (ST-NCC) can greatly improve computational efficiency. Its implementation and performance on a parallel computing platform, particularly a GPU, have yet to be investigated. In this study, the published ST-NCC method was implemented on a GPU and its performance was evaluated for speckle tracking. Specifically, the ST-NCC method was compared with the classic method of computing NCC using simulated ultrasound data. Our preliminary results indicate that, on the GPU platform, the ST-NCC implementation did not further improve computational efficiency compared with the classic NCC method implemented on the same GPU platform.
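To make the comparison concrete, the following is a minimal serial sketch of the two ways of computing NCC. It is not the authors' CUDA code. It assumes 1-D signals, an NCC definition without mean subtraction, and hypothetical names (`ncc_direct`, `ncc_sum_table`, `prefix_sums`, `demo_signals`) chosen only for illustration. The classic version re-sums every product inside each window, while the sum-table version builds three running-sum tables once per lag and then obtains each window sum as a difference of two table entries.

```python
import math

def ncc_direct(f, g, u, tau, w):
    """Classic NCC: re-sum all w products for one window start u and lag tau."""
    idx = range(u, u + w)
    num = sum(f[n] * g[n + tau] for n in idx)
    energy_f = sum(f[n] ** 2 for n in idx)
    energy_g = sum(g[n + tau] ** 2 for n in idx)
    return num / math.sqrt(energy_f * energy_g)

def prefix_sums(values):
    """Sum table: table[k] is the sum of the first k values, so table[0] == 0."""
    table = [0.0]
    for v in values:
        table.append(table[-1] + v)
    return table

def ncc_sum_table(f, g, tau, w):
    """NCC at lag tau for every valid window start, using three sum tables.

    Assumes no window has zero energy (otherwise the division fails).
    """
    lo = max(0, -tau)                # first n for which g[n + tau] exists
    hi = min(len(f), len(g) - tau)   # one past the last such n
    sff = prefix_sums(f[n] ** 2 for n in range(lo, hi))
    sgg = prefix_sums(g[n + tau] ** 2 for n in range(lo, hi))
    sfg = prefix_sums(f[n] * g[n + tau] for n in range(lo, hi))
    result = {}
    for u in range(lo, hi - w + 1):
        k = u - lo
        # Each window sum costs two table look-ups instead of w multiply-adds.
        num = sfg[k + w] - sfg[k]
        den = math.sqrt((sff[k + w] - sff[k]) * (sgg[k + w] - sgg[k]))
        result[u] = num / den
    return result

def demo_signals(length=64, delay=3):
    """Synthetic pair (hypothetical data): g is f delayed by `delay` samples."""
    def s(n):
        return math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n)
    f = [s(n) for n in range(length)]
    g = [s(n - delay) for n in range(length)]
    return f, g
```

In serial code the saving comes from replacing the per-window loop with table look-ups. On a GPU the per-window loops already run concurrently, and building the tables adds a scan step, which is one plausible reading of why the abstract reports no further gain.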
  • [1] Jiang J, Hall T J. A parallelizable real-time motion tracking algorithm with applications to ultrasonic strain imaging[J]. Physics in Medicine & Biology, 2007, 52(13): 3773-3790. doi: 10.1088/0031-9155/52/13/008
    [2] Chen L J, Treece G M, Lindop J E, et al. A quality-guided displacement tracking algorithm for ultrasonic elasticity imaging[J]. Medical Image Analysis, 2009, 13(2): 286-296. doi: 10.1016/j.media.2008.10.007
    [3] Peng B, Wang Y Q, Hall T J, et al. A GPU-accelerated 3-D coupled subsample estimation algorithm for volumetric breast strain elastography[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2017, 64(4): 694-705. doi: 10.1109/TUFFC.2017.2661821
    [4] Zhou Y J, Zheng Y P. A motion estimation refinement framework for real-time tissue axial strain estimation with freehand ultrasound[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2010, 57(9): 1943-1951. doi: 10.1109/TUFFC.2010.1642
    [5] Luo J W, Konofagou E E. A fast normalized cross-correlation calculation method for motion estimation[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2010, 57(6): 1347-1357. doi: 10.1109/TUFFC.2010.1554
    [6] Zhu Y N, Hall T J. A modified block matching method for real-time freehand strain imaging[J]. Ultrasonic Imaging, 2002, 24(3): 161-176. doi: 10.1177/016173460202400303
    [7] D'Hooge J, Bijnens B, Thoen J, et al. Echocardiographic strain and strain-rate imaging: a new tool to study regional myocardial function[J]. IEEE Transactions on Medical Imaging, 2002, 21(9): 1022-1030. doi: 10.1109/TMI.2002.804440
    [8] Konofagou E E, D'Hooge J, Ophir J. Myocardial elastography--a feasibility study in vivo[J]. Ultrasound in Medicine & Biology, 2002, 28(4): 475-482. doi: 10.1016/S0301-5629(02)00488-X
    [9] Lewis J P. Fast template matching[J]. Proceeding of Vision Interface, 1995, 32(4): 351-361.
    [10] Yang X, Deka S, Righetti R. A hybrid CPU-GPGPU approach for real-time elastography[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2011, 58(12): 2631-2645. doi: 10.1109/TUFFC.2011.2126
    [11] Peng B, Huang L. GPU-accelerated sub-sample displacement estimation method for real-time ultrasound elastography[J]. Opto-Electronic Engineering, 2016, 43(6): 83-88. doi: 10.3969/j.issn.1003-501X.2016.06.014 (in Chinese)
    [12] Peng B, Chen Y, Liu D Q. Investigation of GPU-based ultrasound elastography[J]. Opto-Electronic Engineering, 2013, 40(5): 97-105. doi: 10.3969/j.issn.1003-501X.2013.05.014 (in Chinese)
    [13] Rosenzweig S, Palmeri M, Nightingale K. GPU-based real-time small displacement estimation with ultrasound[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2011, 58(2): 399-405. doi: 10.1109/TUFFC.2011.1817
    [14] Chang L W, Hsu K H, Li P C. GPU-based color Doppler ultrasound processing[C]//2009 IEEE International Ultrasonics Symposium. Rome, Italy, 2009.
    [15] Sun X, Wang S S, Song J J, et al. Toward parallel optimal computation of ultrasound computed tomography using GPU[J]. Proceedings of SPIE, 2018, 10580: 105800R.
    [16] Sengupta S, Harris M, Garland M, et al. Efficient parallel scan algorithms for GPUs[M]//Kurzak J, Bader D A, Dongarra J. Scientific Computing with Multicore and Accelerators. Boca Raton: Taylor & Francis, 2008.
    [17] Blelloch G E. Scans as primitive parallel operations[J]. IEEE Transactions on Computers, 2002, 38(11): 1526-1538. doi: 10.1109/12.42122
    [18] Jensen J A. Field: A program for simulating ultrasound systems[J]. Medical & Biological Engineering & Computing, 1996, 34(1): 351-352.
    [19] Luo J W, Bai J, He P, et al. Axial strain calculation using a low-pass digital differentiator in ultrasound elastography[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2004, 51(9): 1119-1127. doi: 10.1109/TUFFC.2004.1334844
    [20] Du H N, Liu J, Pellot-Barakat C, et al. Optimizing multicompression approaches to elasticity imaging[J]. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2006, 53(1): 90-99. doi: 10.1109/TUFFC.2006.1588394

  • Overview: In our ultrasound strain elastography system, a modified block-matching algorithm is used to estimate tissue motion; local strains are then estimated and used as surrogates of tissue elasticity. Within the block-matching framework, the correlation calculation is a critical and very computationally intensive step. Because the correlation at each location can be computed largely independently of the others, graphics processing units (GPUs) have been used to improve computational efficiency through massively parallel programming. It is known from the literature that, in a serial computing environment, the sum-table-based method for calculating the normalized correlation coefficient (abbreviated as ST-NCC below) can greatly reduce the computational burden. However, the performance of ST-NCC on a parallel computing platform, particularly a GPU, has yet to be investigated. The objective of this study is therefore to investigate the performance of the ST-NCC method in the above-mentioned GPU-accelerated ultrasound strain elastography.
    Specifically, the published ST-NCC method by Luo et al. and the conventional NCC method were both programmed using CUDA (Version 9.0, NVIDIA Inc., CA, USA) and tested on an NVIDIA GeForce GTX TITAN X card. Two basic CUDA programming strategies were applied to all CUDA implementations to achieve the best computational efficiency. First, to increase the effective memory bandwidth of the GPU, the 2-D RF signals were stored in texture memory prior to the calculation of cross-correlation. Second, frequently accessed variables (e.g., the axial and lateral search ranges) were kept in read-only memory for rapid access. As for advanced CUDA programming strategies, a classic parallel scan method was adopted to generate the sum tables for the ST-NCC method, while several different on-chip memory optimization strategies were used to implement the classic NCC method and were compared against each other. Only the most computationally efficient classic NCC implementation was compared with the GPU-accelerated ST-NCC method. Finally, performance was assessed using simulated ultrasound data generated by combining finite element modeling with acoustic simulations. Both displacement tracking accuracy and computational efficiency were evaluated. Based on the data investigated, we found that, on the GPU platform, the ST-NCC implementation did not further improve computational efficiency compared with the classic NCC method implemented on the same GPU platform. Both methods achieved comparable displacement tracking accuracy.
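The overview does not say which scan variant was used to build the sum tables. As an illustration only, the sketch below simulates one well-known work-efficient scheme, the up-sweep/down-sweep exclusive scan described by Blelloch, in plain Python. The function names are hypothetical. Each inner `for` loop touches disjoint elements, so on a GPU its iterations could be assigned to separate threads, with a synchronization between successive strides.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep (serial simulation)."""
    n = 1
    while n < len(values):
        n *= 2                                   # pad to a power of two
    data = list(values) + [0] * (n - len(values))

    # Up-sweep (reduce): build partial sums in place along a binary tree.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):   # independent updates
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):   # independent updates
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data[:len(values)]

def sum_table(values):
    """Sum table with table[k] equal to the sum of values[0:k]."""
    values = list(values)
    if not values:
        return [0]
    scan = blelloch_exclusive_scan(values)
    return scan + [scan[-1] + values[-1]]
```

With such a table, the sum over any window `values[a:b]` is `table[b] - table[a]`, which is the property the sum-table NCC formulation relies on. The scan itself takes on the order of log2(n) synchronized passes, an overhead the classic NCC kernel does not pay.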


