Accelerating Radiative Transfer Simulation with GPU-FPGA Cooperative Computation
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe and Masayuki Umemura
University of Tsukuba, Japan
Field-programmable gate arrays (FPGAs) have garnered significant interest in high-performance computing research. This interest stems from the drastic improvement in their computational and communication capabilities in recent years,
owing to advances in semiconductor integration technologies that follow Moore’s Law. In addition to these performance
improvements, toolchains for the development of FPGAs in OpenCL have been offered by FPGA vendors to reduce the
programming effort required. These improvements make it feasible to offload, on the fly, the computations
at which CPUs/GPUs perform poorly relative to FPGAs, while performing low-latency data transfers.
We consider this concept to be of key importance to improve the
performance of heterogeneous supercomputers that employ accelerators such as GPUs. In this study, we propose a GPU–FPGA-accelerated simulation based on this concept and demonstrate the implementation of the proposed method with CUDA and
OpenCL mixed programming. The experimental results showed that our proposed method can increase the performance by
up to 17.4× compared with a GPU-based implementation. The performance remains 1.32× higher even for the largest problem size,
at which the GPU-based implementation performs best. We consider the realization of GPU–FPGA-accelerated simulation to be the most significant difference
between our work and previous studies.