Apache Spark, the in-memory big data processing framework, will become fully GPU-accelerated in its upcoming 3.0 release. Best of all, today’s Spark applications can take advantage of GPU acceleration without modification; existing Spark APIs all work as-is.
The GPU acceleration components, provided by Nvidia, are designed to complement all phases of a Spark application, including ETL operations, machine learning training, and inference serving.
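To make the "without modification" claim concrete, here is a minimal sketch of how GPU acceleration might be switched on purely through configuration, leaving the application itself untouched. It assumes the plugin-based setup used by Nvidia's RAPIDS Accelerator for Apache Spark (the `spark.plugins` hook and the GPU resource settings introduced in Spark 3.0); the helper name `gpu_submit_command`, the jar file name, and the application name `etl_job.py` are hypothetical placeholders, and the exact settings for a given cluster may differ.

```python
import shlex


def gpu_submit_command(app_path, plugin_jar, gpus_per_executor=1, tasks_per_gpu=4):
    """Build a spark-submit argument list that enables GPU acceleration via config only.

    Hypothetical helper for illustration; the application at app_path is not changed.
    """
    conf = {
        # Load Nvidia's SQL plugin through the Spark 3.0 plugin hook.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Spark 3.0 resource scheduling: GPUs per executor, and the GPU share per task.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# Placeholder jar and application names (assumptions, not real artifacts).
cmd = gpu_submit_command("etl_job.py", "rapids-4-spark.jar")
print(shlex.join(cmd))
```

The point of the sketch is that everything GPU-specific lives in the submit-time configuration, so the same job could be run on CPUs simply by dropping those `--conf` flags.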
Nvidia’s Spark contributions draw on the RAPIDS suite of GPU-accelerated data science libraries. Many of RAPIDS’ internal data structures, like dataframes, complement Spark’s own, but getting Spark to use RAPIDS natively has taken nearly four