Octgrav is a new, very fast tree-code that runs on massively parallel Graphics Processing Units (GPUs) with the NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. Tree construction and the calculation of multipole moments are carried out on the host CPU, while the force calculation, which consists of tree walks and the evaluation of interaction lists, is carried out on the GPU. In this way, a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s are achieved. It takes about a second to compute forces on a million particles with an opening angle of $\theta \approx 0.5$.
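The force-calculation stage (a tree walk that builds an interaction list, followed by evaluation of that list) can be illustrated with a minimal serial sketch. It assumes a Barnes-Hut-style octree with monopole moments only, $G = 1$, Plummer softening `eps = 1e-3`, and the common opening test $s/d < \theta$, where $s$ is the cell side and $d$ the distance from the target to the cell's centre of mass. The source does not say which multipole order or exact opening criterion Octgrav uses. The names `Cell`, `build`, `interaction_list` and `acceleration` are hypothetical, and on a GPU many such walks would run in parallel rather than one after another.

```python
import math
import random

THETA = 0.5  # opening angle quoted in the text


class Cell:
    """Cubic octree cell holding its monopole moment (total mass, centre of mass)."""

    def __init__(self, center, half):
        self.center = center  # geometric centre of the cube
        self.half = half      # half of the side length
        self.children = []    # non-empty sub-cubes
        self.body = None      # (position, mass) when the cell is a single-particle leaf
        self.mass = 0.0
        self.com = (0.0, 0.0, 0.0)


def build(cell, bodies):
    """Split the cube recursively until every leaf holds one particle.

    Assumes no two particles share exactly the same position.
    """
    if len(bodies) == 1:
        cell.body = bodies[0]
    else:
        octants = {}
        for pos, m in bodies:
            key = tuple(pos[k] >= cell.center[k] for k in range(3))
            octants.setdefault(key, []).append((pos, m))
        h = cell.half / 2
        for key, sub in octants.items():
            c = tuple(cell.center[k] + (h if key[k] else -h) for k in range(3))
            cell.children.append(build(Cell(c, h), sub))
    cell.mass = sum(m for _, m in bodies)
    cell.com = tuple(sum(p[k] * m for p, m in bodies) / cell.mass for k in range(3))
    return cell


def interaction_list(cell, x, theta, out):
    """Tree walk: collect (mass, position) sources acting on the target at x."""
    if cell.body is not None:
        if cell.body[0] != x:  # skip the target itself
            out.append((cell.body[1], cell.body[0]))
        return out
    d = math.dist(cell.com, x)
    if d > 0 and 2 * cell.half / d < theta:
        out.append((cell.mass, cell.com))  # far enough away: use the monopole
    else:
        for child in cell.children:  # otherwise open the cell
            interaction_list(child, x, theta, out)
    return out


def acceleration(x, sources, eps=1e-3):
    """Evaluate an interaction list with G = 1 and Plummer softening eps."""
    a = [0.0, 0.0, 0.0]
    for m, p in sources:
        dx = [p[k] - x[k] for k in range(3)]
        r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
        w = m / (r2 * math.sqrt(r2))
        for k in range(3):
            a[k] += w * dx[k]
    return a


random.seed(1)
n = 1000
bodies = [(tuple(random.uniform(-1.0, 1.0) for _ in range(3)), 1.0 / n) for _ in range(n)]
root = build(Cell((0.0, 0.0, 0.0), 1.0), bodies)
x = bodies[0][0]
exact_list = interaction_list(root, x, 0.0, [])  # theta = 0 opens every cell
approx_list = interaction_list(root, x, THETA, [])
print(len(approx_list), "interactions instead of", len(exact_list))
a_approx = acceleration(x, approx_list)
```

With $\theta = 0$ no cell passes the test, so the walk reduces to direct summation; larger $\theta$ replaces more distant groups by single monopole terms, trading accuracy for shorter interaction lists.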
To test performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code that runs entirely on the GPU. The tree-construction and tree-traversal algorithms are portable to many-core devices that support the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during tree construction and shows an overall performance improvement of more than a factor of 20, resulting in a processing rate of more than 2.8 million particles per second.
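The text says only that tree construction is built on sort and scan methods. The sketch below shows one common way such a construction can work, under stated assumptions: each particle gets a Morton (Z-order) key from a 10-bit-per-axis grid, sorting by key turns every octree cell into a contiguous run of particles, and a flag array followed by an exclusive prefix sum assigns compact cell indices. The actual codes may use a different space-filling curve (for example Peano-Hilbert) or memory layout; `BITS`, `spread_bits`, `morton_key`, `exclusive_scan` and `cells_at_level` are hypothetical names, and the serial loops stand in for GPU-parallel sort and scan kernels.

```python
BITS = 10  # bits per coordinate, giving 30-bit keys (hypothetical resolution)


def spread_bits(v):
    """Move bit i of v to bit 3*i, leaving room to interleave three coordinates."""
    out = 0
    for i in range(BITS):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(pos, lo, hi):
    """Quantise a point in [lo, hi)^3 and interleave its bits (x lowest, then y, z)."""
    scale = (1 << BITS) / (hi - lo)
    ix, iy, iz = (min(int((c - lo) * scale), (1 << BITS) - 1) for c in pos)
    return spread_bits(ix) | (spread_bits(iy) << 1) | (spread_bits(iz) << 2)


def exclusive_scan(values):
    """Serial stand-in for a parallel prefix sum: out[i] = sum(values[:i])."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out


def cells_at_level(sorted_keys, level):
    """Group key-sorted particles into the octree cells of one level (0 = root)."""
    shift = 3 * (BITS - level)
    prefixes = [k >> shift for k in sorted_keys]
    # flag the first particle of every cell: its key prefix differs from the previous one
    flags = [1 if i == 0 or prefixes[i] != prefixes[i - 1] else 0
             for i in range(len(prefixes))]
    offsets = exclusive_scan(flags)
    n_cells = offsets[-1] + flags[-1] if flags else 0
    starts = [0] * n_cells
    for i, f in enumerate(flags):
        if f:
            starts[offsets[i]] = i  # scatter: the scan gives each new cell a compact slot
    cell_of = [offsets[i] + flags[i] - 1 for i in range(len(flags))]  # inclusive scan - 1
    return starts, cell_of


points = [(0.1, 0.2, 0.3), (0.9, 0.9, 0.9), (0.15, 0.25, 0.35), (0.6, 0.1, 0.1)]
order = sorted(range(len(points)), key=lambda i: morton_key(points[i], 0.0, 1.0))
keys = [morton_key(points[i], 0.0, 1.0) for i in order]
starts, cell_of = cells_at_level(keys, 1)
print(order, starts, cell_of)  # [0, 2, 3, 1] [0, 2, 3] [0, 0, 1, 2]
```

Because the top three key bits are the top bit of each coordinate, level 1 groups the particles by octant: the two points in the low octant share cell 0, and the other two fall in separate cells. Repeating `cells_at_level` for deeper levels yields the remaining tree nodes without any pointer chasing.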
The code has a convenient user interface and is freely available for use.
Sapporo mimics the behavior of GRAPE hardware and uses the GPU to perform high-precision gravitational N-body simulations. It is written in CUDA and therefore works only on NVIDIA GPUs. N-body codes that currently run on GRAPE-6 can switch to Sapporo simply by relinking against the library. Sapporo's precision is comparable to that of GRAPE-6, even though the GPU hardware is internally limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision when calculating the distances between particles.
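Sapporo's own kernels are not reproduced here. As a sketch of one standard way to emulate extra precision on single-precision hardware, the "double-single" representation stores each coordinate as two floats, `hi + lo`; subtracting the `hi` and `lo` parts separately keeps the small separation between two nearby particles accurate even when their absolute coordinates are large. The names `f32`, `split` and `ds_sub` are hypothetical, Python doubles rounded through `struct` stand in for GPU float registers, and the scheme actually used in Sapporo may differ in detail.

```python
import struct


def f32(x):
    """Round a Python float (an IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split(x):
    """Store a double as an unevaluated sum hi + lo of two single-precision numbers."""
    hi = f32(x)
    lo = f32(x - hi)  # the part of x that hi could not represent
    return hi, lo


def ds_sub(a, b):
    """a - b for two (hi, lo) pairs, using only single-precision operations."""
    d_hi = f32(a[0] - b[0])  # exact when the hi parts are within a factor of 2 (Sterbenz lemma)
    d_lo = f32(a[1] - b[1])
    return f32(d_hi + d_lo)


xi, xj = 1000.0001, 1000.0002  # two nearby coordinates far from the origin
true_dx = xj - xi                     # double-precision reference
naive_dx = f32(f32(xj) - f32(xi))     # plain single precision
ds_dx = ds_sub(split(xj), split(xi))  # emulated double precision
print(abs(naive_dx - true_dx) / true_dx, abs(ds_dx - true_dx) / true_dx)
```

Near 1000 the spacing of single-precision numbers is about $6 \times 10^{-5}$, so the naive difference is off by tens of percent, while the emulated difference is accurate to roughly single-precision relative error of the separation itself.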
Bonsai is a gravitational N-body tree-code that runs entirely on the GPU, which reduces the time spent on communication with the CPU. The code runs on NVIDIA GPUs; on a GTX 480 it integrates about 2.8 million particles per second. The tree-construction and tree-traversal algorithms are portable to many-core devices that support the CUDA or OpenCL programming languages.
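As a rough worked example of what the quoted rate $R \approx 2.8 \times 10^{6}$ particles per second means, assume (hypothetically) that the rate stays about the same at $N = 10^{6}$ particles; this need not hold exactly, since the cost of a tree-code grows faster than linearly in $N$. One step would then take roughly

$$
t_{\rm step} \approx \frac{N}{R} = \frac{10^{6}}{2.8 \times 10^{6}\ {\rm s^{-1}}} \approx 0.36\ {\rm s}.
$$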