Graphics processing units (GPUs) have recently emerged as a low-cost alternative for parallel programming. Since modern GPUs
offer high computational power as well as high memory bandwidth, running ray tracing on them has been an active field of research
in computer graphics in recent years. Furthermore, the introduction of CUDA, a novel GPGPU architecture, has removed several
limitations that traditional GPU-based ray tracing suffered from. In this paper, we present a high-performance CUDA
ray tracing implementation. We focus on performance and show how our design choices across various optimizations lead to
an implementation that outperforms previous work. For reasonably complex scenes with simple shading, our implementation
achieves a performance of 30 to 43 million traced rays per second. Our implementation also includes the effects of recursive
specular reflection and refraction, which have received less attention in previous work on GPU-based ray tracing.
Keywords: Ray Tracing - Programmable Graphics Hardware - GPU Computing - CUDA - Multithreaded Architectures