Building machine learning models is a repetitive process. Much of the work is rote and routine, and it is a game of "fastest through the cycle wins": the faster you can iterate, the easier it is to explore new theories and get good answers. This is one of the reasons practical enterprise use of AI today is dominated by the largest enterprises, which can throw enormous resources at the problem.
Rapids is an umbrella for several open source projects, incubated by Nvidia, that put the entire data processing pipeline on the GPU, eliminating I/O-bound data transfers between CPU and GPU memory while also substantially increasing the speed of each individual step. It also provides a common in-memory data format, Apache Arrow, easing the burden of exchanging data between disparate systems. At the user level, Rapids mimics familiar Python data science APIs, such as those of pandas and scikit-learn, to ease the transition for that user base.
Rapids ecosystem architecture
The Rapids project aims to replicate, for the most part, Python's machine learning and data analytics APIs, but for GPUs rather than CPUs. This means Python developers already have everything they need to run on the GPU, without having to learn the low-level details of CUDA programming and parallel operations. Pythonistas can develop code on a machine without a GPU, then, with a few tweaks, run it on all the GPUs available to them.
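To make the "few tweaks" concrete, here is a minimal sketch of the transition. It assumes a workload written against the pandas API; cuDF, the Rapids dataframe library, mirrors that API closely enough that the main change is typically the import line. The sketch below runs on the CPU with pandas so it works anywhere; the GPU variant is shown in a comment.

```python
# On a GPU machine with Rapids installed, the typical change is the import:
#
#   import cudf as pd   # GPU-accelerated, pandas-like dataframe library
#
# The rest of the code stays the same. Here we use pandas so the sketch
# runs on any machine, GPU or not.
import pandas as pd

# A small dataframe and a routine groupby-aggregate, the kind of step
# that Rapids keeps on the GPU end to end.
df = pd.DataFrame({"item": ["a", "b", "a"], "qty": [1, 2, 3]})
totals = df.groupby("item")["qty"].sum()

print(totals["a"])  # 1 + 3 = 4
```

The same source file can therefore be developed and tested on a laptop, then pointed at cuDF in production, which is exactly the workflow described above.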