Compute Unified Device Architecture (CUDA) is a software development platform that enables us to write and run general-purpose
applications on the graphics processing unit (GPU). This paper presents a fast method for cone beam reconstruction using the
CUDA-enabled GPU. The proposed method is accelerated by two techniques: (1) off-chip memory access reduction; and (2) memory
latency hiding. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the proposed
method runs at 82% of the peak memory bandwidth, taking 5.6 seconds to reconstruct a 512³-voxel volume from 360 512²-pixel projections. This is 18% faster than the prior method. We also present detailed analyses to show
how effectively the acceleration techniques improve reconstruction performance relative to a naive method.
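The abstract does not spell out the kernels. As a rough illustration of what off-chip memory access reduction can mean in a backprojection loop, here is a minimal Python sketch. It assumes a voxel-driven scheme in which each voxel's running sum is kept in a local accumulator (standing in for a GPU register) across a batch of projections. The function `backproject`, its `sample` callback, and the `batch` parameter are hypothetical names chosen for this sketch; they are not taken from the paper's CUDA code, and the projection geometry and interpolation are abstracted into `sample`.

```python
from typing import Callable, List, Tuple

# Hypothetical illustration only, not the paper's CUDA kernel.
# `sample(proj, v)` stands in for projecting voxel v onto projection `proj`
# and fetching the (filtered, weighted) detector value there.
Sampler = Callable[[List[float], int], float]


def backproject(num_voxels: int,
                projections: List[List[float]],
                sample: Sampler,
                batch: int) -> Tuple[List[float], int]:
    """Accumulate every projection's contribution into the volume.

    Each voxel's running sum is held in a local variable (playing the role
    of a GPU register) while `batch` projections are processed, so the
    off-chip volume array is read and written once per batch instead of
    once per projection. Returns the volume and the count of off-chip
    volume accesses (reads + writes).
    """
    if batch < 1:
        raise ValueError("batch must be >= 1")
    volume = [0.0] * num_voxels
    accesses = 0
    for start in range(0, len(projections), batch):
        chunk = projections[start:start + batch]
        for v in range(num_voxels):
            acc = volume[v]          # one off-chip read per voxel per batch
            accesses += 1
            for proj in chunk:       # register-resident accumulation
                acc += sample(proj, v)
            volume[v] = acc          # one off-chip write per voxel per batch
            accesses += 1
    return volume, accesses


if __name__ == "__main__":
    # 8 toy "projections" over a 4-voxel volume; sample simply indexes by voxel.
    projs = [[float(p + v) for v in range(4)] for p in range(8)]
    pick = lambda proj, v: proj[v]
    naive, naive_acc = backproject(4, projs, pick, batch=1)
    batched, batched_acc = backproject(4, projs, pick, batch=4)
    print(naive == batched, naive_acc, batched_acc)  # True 64 16
```

With `batch=1`, every projection costs one read and one write per voxel; with a batch of B projections, volume traffic drops by roughly a factor of B while the reconstructed values stay the same. How the actual method trades this against register pressure on the GPU is not described here.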
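For a sense of scale, the reported figures imply the voxel-update rate computed below. This is plain arithmetic on the numbers quoted above, assuming every voxel receives one update per projection and treating the full 5.6 s as backprojection time; the actual breakdown of that time is not given here, so the result is an average over the whole run.

```python
# Voxel-update throughput implied by the figures quoted in the abstract.
# Assumption: one update per voxel per projection, all 5.6 s spent updating.
VOLUME_SIDE = 512        # 512^3-voxel volume
NUM_PROJECTIONS = 360    # 360 projections of 512^2 pixels
SECONDS = 5.6            # reported reconstruction time

updates = VOLUME_SIDE ** 3 * NUM_PROJECTIONS
rate = updates / SECONDS

print(f"{updates:,} voxel updates")            # 48,318,382,080 voxel updates
print(f"{rate / 1e9:.2f} billion updates/s")   # 8.63 billion updates/s
```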
This work was partly supported by JSPS Grant-in-Aid for Scientific Research (A)(2) (20240002), Young Researchers (B)(19700061),
and the Global COE Program “in silico medicine” at Osaka University.