Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
April 27, 2021
AGENDA
1. Introduction to the RAPIDS Accelerator for Spark
2. RAPIDS Accelerator for Spark Performance
3. GPU Acceleration Combined with Alluxio
GROWTH IN DATA PROCESSING REQUIREMENTS
Timeline, 2000 to 2030: Hadoop Era → Spark Era (Spark 2.0 on CPUs) → Spark GPU Era (GPU-accelerated Spark 3.0)
“These contributions lead to faster data pipelines, model training and scoring for more breakthroughs and insights with Apache Spark 3.0 and Databricks.”
Matei Zaharia, creator of Apache Spark and chief technologist at Databricks
SPARK 3.0 ON NVIDIA GPUs
Accelerate data science pipelines without code changes
Faster Execution Time
• Accelerate data preparation
• Quickly move to the next stages of the pipeline
• Focus on the most critical activities
Streamline Analytics to AI
• Orchestrate end-to-end pipelines
• From ETL to model training to visualization
• Same infrastructure for Spark and ML/DL frameworks
Reduced Infrastructure Costs
• Complete jobs faster with less hardware
• Save on-prem and in the cloud
• Do more with less
NVIDIA BRINGS GPU ACCELERATION TO APACHE SPARK
Features
• Use existing (unmodified) customer code
• Spark features that are not GPU-enabled run transparently on the CPU
Initial release: GPU acceleration of
• Spark DataFrames
• Spark SQL
• ML/DL training frameworks
Seamless integration with Spark 3.0
RAPIDS ACCELERATOR FOR APACHE SPARK
Architecture, from top to bottom:
• Distributed scale-out Spark applications
• Spark SQL API and DataFrame API; Spark Shuffle
• Apache Spark Core
• RAPIDS Accelerator for Spark
  - SQL/DataFrame operations:
      if gpu_enabled(operation, data_type)
          call out to RAPIDS
      else
          execute standard Spark operation
  - Spark Shuffle: custom implementation of Spark shuffle, optimized to use RDMA and GPU-to-GPU direct communication
• JNI bindings (mapping from Java/Scala to C++)
• RAPIDS libcudf (C++ libraries); UCX libraries
• CUDA
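The per-operator decision in the diagram can be sketched as a toy Python model. It is not the plugin's code (the real accelerator rewrites Spark's physical plan on the JVM); the `SUPPORTED` table, the `Op` type, and the two executor stand-ins are hypothetical and exist only to show the fallback logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical support table: operation name -> data types the GPU path handles.
SUPPORTED: Dict[str, Set[str]] = {
    "filter": {"int", "long", "double", "string"},
    "hash_aggregate": {"int", "long", "double"},
    "sort": {"int", "long", "double", "string"},
}


@dataclass
class Op:
    name: str
    data_type: str


def gpu_enabled(op: Op) -> bool:
    """True only when both the operation and its data type are GPU-supported."""
    return op.data_type in SUPPORTED.get(op.name, set())


def run_on_gpu(op: Op) -> str:
    # Stand-in for a call-out to RAPIDS (libcudf through JNI).
    return f"GPU:{op.name}"


def run_on_cpu(op: Op) -> str:
    # Stand-in for the standard Spark operator.
    return f"CPU:{op.name}"


def execute(plan: List[Op]) -> List[str]:
    """Route each operator independently; unsupported ones fall back to the CPU."""
    return [run_on_gpu(op) if gpu_enabled(op) else run_on_cpu(op) for op in plan]


if __name__ == "__main__":
    plan = [Op("filter", "string"), Op("hash_aggregate", "decimal"), Op("sort", "long")]
    print(execute(plan))  # ['GPU:filter', 'CPU:hash_aggregate', 'GPU:sort']
```

Each operator is routed on its own, which is what lets unsupported features run transparently on the CPU. A real planner also has to insert row/columnar conversions wherever adjacent operators land on different devices; this sketch ignores that cost.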
GAME-CHANGING PERFORMANCE GAINS
7x Performance Boost and 90% Cost Savings on Databricks
Opens new possibilities for AI-driven services in Adobe Experience Cloud
“We’re seeing significantly faster performance with NVIDIA-accelerated Spark 3.0 compared to running Spark on CPUs. With these game-changing GPU performance gains, new possibilities open up for enhancing AI-driven features in our full suite of Adobe Experience Cloud apps.”
William Yan, Senior Director of Machine Learning at Adobe
RAPIDS ACCELERATOR ECOSYSTEM MOMENTUM
• Databricks Machine Learning Runtime: available now
• Google Cloud Dataproc: available now
• Apache Spark 3.0 Community Release: available now
• Amazon EMR: available now
• Cloudera CDP: available in June 2021
Benchmark Environment – EGX (EGX / NVIDIA-Certified OEM servers)
Nodes: 8
CPU: 2 x AMD EPYC 7452 (64 cores/128 threads)
GPU: 2 x NVIDIA Ampere A100, PCIe, 250 W, 40 GB
RAM: 0.5 TB
Storage: 4 x 7.68 TB Gen4 U.2 NVMe
Networking: 1 x Mellanox CX-6 single-port HDR100 QSFP56
Cost without GPU: ~$42,000 per node, with bulk discount
Cost with GPU: ~$71,000 per node, with bulk discount
Software: HDFS (Hadoop 3.2.1), Spark 3.0.2 (standalone)
SPARK SQL QUERIES – EGX CLUSTER
Based on 97 queries from the NVIDIA Decision Support (NDS) benchmark* (3 TB dataset, without decimals)
The GPU run is 3.21x faster, at a cost ratio of 0.52 (the GPU run cost 52% as much as the CPU run).
Queries 14b and 72 were removed because of failures.
*The NVIDIA Decision Support (NDS) benchmark is derived from the TPC-DS benchmark and is used for internal performance testing. Results from NDS are not comparable to TPC-DS results.
SPARK SQL QUERIES – EGX CLUSTER: UCX ON VS OFF
GPU + UCX shuffle is 1.23x faster than the GPU alone. Queries 67 and 72 were removed because of failures.
GPU + UCX shuffle is 4.15x faster than the CPU, at a cost ratio of 0.41. Queries 14a, 67, and 72 were removed because of failures.
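The cost ratios can be roughly reproduced from the node prices on the EGX environment slide. The formula below, cost ratio = (GPU node price / CPU node price) / speedup, is our assumption about how the ratio was derived (hardware price multiplied by relative run time); the slides do not state it.

```python
def cost_ratio(gpu_node_cost: float, cpu_node_cost: float, speedup: float) -> float:
    """Relative cost of a GPU run versus a CPU run on the same number of nodes.

    Assumes cost scales with node price multiplied by run time, so a GPU run that
    is `speedup` times faster costs (gpu price / cpu price) / speedup as much.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (gpu_node_cost / cpu_node_cost) / speedup


if __name__ == "__main__":
    for label, speedup in [("GPU", 3.21), ("GPU + UCX", 4.15)]:
        print(f"{label}: {cost_ratio(71_000, 42_000, speedup):.2f}")
```

With the rounded (~) node prices this prints about 0.53 and 0.41, in line with the reported 0.52 and 0.41; the small gap on the first figure is plausibly due to the approximate prices.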
Benchmark Environment – GCP DATAPROC
Nodes: 1 driver (CPU only), 4 workers
CPU: n1-standard-4 (driver); 4 x n1-standard-32 (workers)
GPU: 4 x 16 GB T4 per executor
RAM: 120 GB
Storage: Google Cloud Storage/Alluxio with SSD
Networking: 32 Gbps
Cost without GPU: $7.82/hour, including GCE + Dataproc
Cost with GPU: $13.41/hour, including GCE + Dataproc
Software: Dataproc Spark 3.0.1 + YARN
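The hourly prices of the Dataproc environment fix a break-even point: a GPU run is cheaper than the CPU run only when its speedup exceeds the price ratio. A minimal check, assuming billing is proportional to run time at the listed hourly rates:

```python
def break_even_speedup(gpu_hourly: float, cpu_hourly: float) -> float:
    """Speedup at which a GPU run and a CPU run cost the same."""
    if cpu_hourly <= 0:
        raise ValueError("cpu_hourly must be positive")
    return gpu_hourly / cpu_hourly


def run_cost(hourly: float, runtime_hours: float) -> float:
    """Cost of one run, assuming per-hour billing for the whole run time."""
    return hourly * runtime_hours


if __name__ == "__main__":
    print(f"break-even speedup: {break_even_speedup(13.41, 7.82):.2f}x")
    # Hypothetical example: a 2-hour CPU run versus a GPU run that is 2x faster.
    cpu_hours = 2.0
    print(f"CPU run: ${run_cost(7.82, cpu_hours):.2f}")
    print(f"GPU run: ${run_cost(13.41, cpu_hours / 2):.2f}")
```

The break-even speedup comes out at about 1.71x. In the hypothetical 2-hour example the runs cost about $15.64 (CPU) and $13.41 (GPU).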
WHY SOME GPU QUERIES FAILED
GPU memory limitations/spilling
• Sort
  Problem: In cases of data skew, the amount of data being sorted can exceed the limits of the hardware/software.
  Solution: A modified external batch sort is implemented in the working branch for the 0.5 release.
• Window*
  Problem: In cases of data skew, the amount of data in the window operation can exceed the limits of the hardware/software.
  Solution: Implement a chunked rank optimization (GitHub issue #1859).
• Join
  Problem: The worst-case join output row count is left.rows * right.rows.
  Solutions: Materialize the output of a join in chunks (GitHub issue #1629). Have conditional filters run as part of the join (GitHub issue #288).
* Strictly, this applies to rank, which is not supported yet but is planned for the 0.5 release.
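Because join output can grow to left.rows * right.rows, producing it in a single allocation can exhaust GPU memory. The general idea behind materializing join output in chunks can be illustrated with a CPU-side Python generator that caps each emitted batch. This is not the plugin's implementation; the function name and the `max_rows` limit are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str]
JoinedRow = Tuple[int, str, str]


def chunked_hash_join(left: List[Row], right: List[Row],
                      max_rows: int) -> Iterator[List[JoinedRow]]:
    """Inner equi-join on the first column, yielding batches of at most max_rows rows.

    Beyond the inputs and the build-side hash table, only one output batch is held
    at a time, so peak memory no longer grows with the total output row count.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    table: Dict[int, List[str]] = defaultdict(list)
    for key, value in right:  # build side
        table[key].append(value)
    batch: List[JoinedRow] = []
    for key, lval in left:  # probe side
        for rval in table.get(key, []):
            batch.append((key, lval, rval))
            if len(batch) == max_rows:
                yield batch
                batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Worst case: every row shares one key, so output = 3 * 4 = 12 rows.
    left = [(1, f"l{i}") for i in range(3)]
    right = [(1, f"r{j}") for j in range(4)]
    print([len(b) for b in chunked_hash_join(left, right, max_rows=5)])  # [5, 5, 2]
```

Downstream operators can consume each batch before the next one is produced, which is what keeps a skewed join from needing the whole output in memory at once.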
WHY IS THE GPU SLOWER FOR SOME QUERIES?
• Failed queries
• Small data sizes (spark.sql.adaptive.advisoryPartitionSizeInBytes=1G)
  • Q28, Q44, and Q67
• Less computation overlap (spark.rapids.sql.concurrentGpuTasks=1)
• Host/device memory transfers
  • All of them
• Cache consistency on reductions/very small aggregate results
  • Q88
• Lack of GPU support, with much less CPU parallelism available
  • Q44, Q49, and Q67
ALLUXIO CONFIGURATION
- Co-locate the Alluxio worker nodes with the Spark worker nodes to enable short-circuit reads and writes.
- Size the cache according to the working set.
- Choose the right cache medium (SSD or system memory).
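The sizing and medium advice can be turned into per-worker settings. This is a sketch under stated assumptions: the working set is spread evenly over the co-located workers, the 20% headroom is an arbitrary choice, and the property keys follow the single-tier layout described in the Alluxio 2.x tiered-storage documentation (verify them against your Alluxio version). The 1000 GB working set and `/mnt/ssd` path are hypothetical.

```python
import math
from typing import Dict

# Alluxio storage-tier aliases for the two cache media the slide mentions.
MEDIUM_ALIAS = {"memory": "MEM", "ssd": "SSD"}


def per_worker_quota_gb(working_set_gb: float, workers: int,
                        headroom_pct: int = 20) -> int:
    """Split the working set evenly over the workers and add headroom."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return math.ceil(working_set_gb * (100 + headroom_pct) / (100 * workers))


def tiered_store_properties(medium: str, path: str, quota_gb: int) -> Dict[str, str]:
    """Single-tier worker storage settings (key names per Alluxio 2.x docs)."""
    return {
        "alluxio.worker.tieredstore.levels": "1",
        "alluxio.worker.tieredstore.level0.alias": MEDIUM_ALIAS[medium],
        "alluxio.worker.tieredstore.level0.dirs.path": path,
        "alluxio.worker.tieredstore.level0.dirs.quota": f"{quota_gb}GB",
    }


if __name__ == "__main__":
    quota = per_worker_quota_gb(working_set_gb=1000, workers=4)  # 300 GB each
    for key, value in tiered_store_properties("ssd", "/mnt/ssd", quota).items():
        print(f"{key}={value}")
```

Choosing `"memory"` instead would target a RAM disk; which medium is right depends on whether the per-worker quota fits in spare system memory.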
RAPIDS ACCELERATOR CONFIGURATION
spark.rapids.sql.enabled is the master switch for GPU acceleration
spark.rapids.sql.explain enables logging of operations that are not accelerated
- Set to NOT_ON_GPU to print only the operations that cannot run on the GPU
spark.rapids.sql.concurrentGpuTasks controls the number of concurrent tasks per GPU
- Set to a value between 2 and 4; 2 typically provides the most benefit
spark.rapids.memory.pinnedPool.size sets the size of the pinned host memory pool, which significantly improves the performance of data transfers between the GPU and host memory
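The settings above can be collected once and rendered as `spark-submit` arguments. In this sketch, `spark.plugins=com.nvidia.spark.SQLPlugin` is the setting that loads the accelerator, the 2G pinned pool is an example value rather than a recommendation, and treating a missing `concurrentGpuTasks` as 1 mirrors the benchmark setting discussed earlier.

```python
from typing import Dict, List

# The slide's settings plus spark.plugins, which loads the accelerator.
RAPIDS_CONF: Dict[str, str] = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",            # master switch
    "spark.rapids.sql.explain": "NOT_ON_GPU",      # log only ops left on the CPU
    "spark.rapids.sql.concurrentGpuTasks": "2",    # slide: 2 to 4, 2 usually best
    "spark.rapids.memory.pinnedPool.size": "2G",   # example size; tune per host
}


def check_concurrency(conf: Dict[str, str]) -> None:
    """Reject values outside the 2-4 range suggested on the slide."""
    tasks = int(conf.get("spark.rapids.sql.concurrentGpuTasks", "1"))
    if not 2 <= tasks <= 4:
        raise ValueError(f"concurrentGpuTasks={tasks} is outside the suggested 2-4 range")


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the settings as spark-submit --conf arguments."""
    check_concurrency(conf)
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(RAPIDS_CONF)))
```

The same key/value pairs can be passed in PySpark with `SparkSession.builder.config(key, value)`. `spark.plugins` must be set before the session starts, and the accelerator jars still need to be on the classpath, which this sketch does not cover.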
WILL MY SPARK WORKLOAD ACCELERATE WITHOUT CHANGES?
If I know my Spark workload characteristics...
Data pipeline use cases
  Accelerates well on GPUs:
  ● Data mining, analytics, and BI
  ● Batch processing and writing large datasets to a data warehouse
  ● Data extraction, aggregation, and feature preparation for ML training and inference
  Not for GPUs:
  ● Real-time streaming analytics/AI pipelines
  ● Online Transaction Processing (OLTP)
  ● Data pipelines with custom code
Technical characteristics
  Accelerates well on GPUs:
  ● Batch processing of GB+ datasets
  ● Parquet, ORC, and CSV data formats
  ● HDFS, S3-compatible, or V2 data sources
  ● DataFrame/SQL (join, agg, sort, window), selected Hive and Scala UDFs
  Not for GPUs:
  ● Stream processing
  ● Spark RDD, MLlib, Dataset, GraphX, and Streaming libraries
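The characteristics in the table can be encoded as a quick self-check. This hypothetical checklist only restates the table; the 1 GB threshold for "GB+ datasets" and the format/API category names are assumptions of the sketch, and it is no substitute for analyzing real job logs.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Categories taken from the table; names and thresholds are this sketch's assumptions.
GPU_FRIENDLY_FORMATS = {"parquet", "orc", "csv"}
CPU_BOUND_APIS = {"rdd", "mllib", "dataset", "graphx", "streaming"}


@dataclass
class Workload:
    input_gb: float
    formats: Set[str] = field(default_factory=set)
    apis: Set[str] = field(default_factory=set)
    streaming: bool = False


def gpu_fit_concerns(w: Workload, min_gb: float = 1.0) -> List[str]:
    """List the table criteria a workload misses; an empty list means a good candidate."""
    concerns: List[str] = []
    if w.streaming:
        concerns.append("stream processing")
    if w.input_gb < min_gb:
        concerns.append(f"input below {min_gb:g} GB")
    other_formats = w.formats - GPU_FRIENDLY_FORMATS
    if other_formats:
        concerns.append("formats: " + ", ".join(sorted(other_formats)))
    cpu_apis = w.apis & CPU_BOUND_APIS
    if cpu_apis:
        concerns.append("APIs: " + ", ".join(sorted(cpu_apis)))
    return concerns


if __name__ == "__main__":
    etl = Workload(3000, {"parquet"}, {"dataframe", "sql"})
    stream = Workload(0.5, {"json"}, {"rdd"}, streaming=True)
    print(gpu_fit_concerns(etl))     # []
    print(gpu_fit_concerns(stream))  # four concerns
```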
If I am unsure...
Use the Log-Analysis Tool
● Review Spark history logs from existing CPU jobs
● Understand how much of the workloads could execute on GPUs
● Get tips on optional code optimizations for GPUs
Figure: Apache Spark components (Spark Core, Catalyst Query Optimizer, Spark SQL, Spark Streaming, Spark DataFrames, Spark Datasets, RDD, Spark Shuffle, Spark MLlib, GraphX), with legend: CPU only, GPU aware, GPU accelerated, partially GPU accelerated.
Summary
• RAPIDS Accelerator for Spark unlocks GPU acceleration for Spark DataFrames, Spark SQL, and ML/DL frameworks such as XGBoost (with more coming).
• Alluxio is a high-performance data orchestration system for GPU compute.
• Spark and GPUs on Alluxio optimize for performance and cost on cloud-scale datasets.
• Spark 3 with GPUs is available today on Databricks, EMR, and Dataproc.
Try it yourself: https://nvidia.github.io/spark-rapids/Getting-Started/
Developer Blog:
• Accelerating Analytics and AI with Alluxio and NVIDIA GPUs
GTC Talks:
• Enabling Data Orchestration with RAPIDS Accelerator [S32746]
• Accelerating Apache Spark Shuffle with UCX [S31822]
• Tuning GPU Network and Memory Usage in Apache Spark [S31566]
• Running Large-Scale ETL Benchmarks with GPU-Accelerated Apache Spark [S31846]
• ... and more
More Related Content

What's hot

What's hot (20)

Osquery
OsqueryOsquery
Osquery
 
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Building a DevOps organization
Building a DevOps organizationBuilding a DevOps organization
Building a DevOps organization
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
DevOps: Infrastructure as Code
DevOps: Infrastructure as CodeDevOps: Infrastructure as Code
DevOps: Infrastructure as Code
 
DevOps Best Practices
DevOps Best PracticesDevOps Best Practices
DevOps Best Practices
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
Devops
DevopsDevops
Devops
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
Continuous Integration With Jenkins
Continuous Integration With JenkinsContinuous Integration With Jenkins
Continuous Integration With Jenkins
 
Devops Devops Devops
Devops Devops DevopsDevops Devops Devops
Devops Devops Devops
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
DevOps 101 - an Introduction to DevOps
DevOps 101  - an Introduction to DevOpsDevOps 101  - an Introduction to DevOps
DevOps 101 - an Introduction to DevOps
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 

Similar to Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio

Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
inside-BigData.com
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Databricks
 
QCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AIQCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AI
Lex Yu
 

Similar to Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio (20)

Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...
Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...
Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
RAPIDS, GPUs & Python - AWS Community Day Melbourne
RAPIDS, GPUs & Python - AWS Community Day MelbourneRAPIDS, GPUs & Python - AWS Community Day Melbourne
RAPIDS, GPUs & Python - AWS Community Day Melbourne
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
QCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AIQCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AI
 

More from Alluxio, Inc.

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Recently uploaded

CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 

Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio

  • 1. April 27 2021 Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
  • 2. 2 AGENDA 1. Introduction for RAPIDS Accelerator for Spark 2. RAPIDS Accelerator for Spark Performance 3. GPU Acceleration combined with Alluxio
  • 3. 3 GROWTH IN REQUIREMENT FOR DATA PROCESSING 2030 2020 2010 2000 Hadoop Era Spark Era Spark GPU Era Spark 2.0 on CPUs GPU Accelerated Spark 3.0 “These contributions lead to faster data pipelines, model training and scoring for more breakthroughs and insights with Apache Spark 3.0 and Databricks.” Matei Zaharia, creator of Apache Spark and chief technologist at Databricks
  • 4. 4 Accelerate data preparation Quickly move to next stages of the pipeline Focus on most-critical activities Orchestrate end-to-end pipelines From ETL to model training to visualization Same infrastructure for Spark and ML/DL frameworks Complete jobs faster with less hardware Save on-prem and in the cloud Do more with less SPARK 3.0 ON NVIDIA GPUs Accelerate data science pipelines without code changes Faster Execution Time Streamline Analytics to AI Reduced Infrastructure Costs
  • 5. NVIDIA BRINGS GPU ACCELERATION TO APACHE SPARK Features • Use existing (unmodified) customer code • Spark features that are not GPU enabled run transparently on the CPU Initial Release - GPU Acceleration of: • Spark Data Frames • Spark SQL • ML/DL training frameworks Seamless integration with Spark 3.0
  • 6. RAPIDS ACCELERATOR FOR APACHE SPARK  UCX Libraries RAPIDS libcudf (C++ Libraries) CUDA JNI bindings Mapping From Java/Scala to C++ RAPIDS Accelerator for Spark DISTRIBUTED SCALE-OUT SPARK APPLICATIONS Spark SQL API Spark Shuffle DataFrame API if gpu_enabled(operation, data_type) call-out to RAPIDS else execute standard Spark operation JNI bindings Mapping From Java/Scala to C++ ● Custom Implementation of Spark Shuffle ● Optimized to use RDMA and GPU-to-GPU direct communication APACHE SPARK CORE
  • 7. GAME-CHANGING PERFORMANCE GAINS 7x Performance Boost 90% Cost Savings on Databricks Opens new possibilities for AI-driven services in Adobe Experience Cloud “We’re seeing significantly faster performance with NVIDIA-accelerated Spark 3.0 compared to running Spark on CPUs. With these game-changing GPU performance gains, new possibilities open up for enhancing AI-driven features in our full suite of Adobe Experience Cloud apps.” — William Yan, Senior Director of Machine Learning at Adobe
  • 8. RAPIDS ACCELERATOR ECOSYSTEM MOMENTUM   Databricks Machine Learning Runtime Google Cloud Dataproc Apache Spark 3.0 Community Release Amazon EMR Available Now Available Now Available Now Available Now Cloudera CDP Available in Jun’21
  • 9. 9 NVIDIA CONFIDENTIAL - DO NOT COPY OR DISTRIBUTE Nodes 8 CPU 2 x AMD EPYC 7452  (64 cores/128 threads) GPU 2 x NVIDIA Ampere A100, PCIe, 250W, 40GB RAM 0.5 TB Storage 4 x 7.68 TB Gen4 U.2 NVMe Networking 1 x Mellanox CX-6 Single Port HDR100 QSFP56 Cost w/o GPU ~$42,000 per w/ bulk discount Cost w/ GPU ~$71,000 per w/ bulk discount Software HDFS (Hadoop 3.2.1)  Spark 3.0.2 (stand alone) EGX / NVIDIA Certified OEM servers Benchmark Environment – EGX
  • 10. SPARK SQL QUERIES – EGX CLUSTER Based on 97 NVIDIA Decision Support (NDS) benchmark (3TB Dataset without decimals)* GPU is 3.21x faster with a cost ratio of 0.52 (GPU cost was 52% that of the CPU) Queries 14b and 72 were removed because of failures *NVIDIA Decision Support (NDS) benchmark is derived from the TPC-DS benchmark and is used for internal performance testing. Results from NDS are not comparable to TPC-DS
  • 11. UCX ON VS OFF GPU + UCX shuffle is 1.23x faster than the GPU alone. Queries 67 and 72 removed because of failures. GPU + UCX Shuffle is 4.15x faster than the CPU and a cost ratio of 0.41 Queries 14a, 67, and 72 were removed because of failures. SPARK SQL QUERIES – EGX CLUSTER
  • 12. Benchmark Environment – GCP DATAPROC
    - Nodes: 1 driver (CPU only), 4 workers
    - CPU: n1-standard-4 (driver), 4 x n1-standard-32 (workers)
    - GPU: 4 x 16 GB T4 per executor
    - RAM: 120 GB
    - Storage: Google Cloud Storage/Alluxio with SSD
    - Networking: 32 Gbps
    - Cost without GPU: $7.82/hour, including GCE + Dataproc
    - Cost with GPU: $13.41/hour, including GCE + Dataproc
    - Software: Dataproc Spark 3.0.1 + YARN
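The hourly Dataproc prices imply a break-even point: a GPU cluster that costs 13.41/7.82 ≈ 1.71 times as much per hour is cheaper per job whenever its speedup exceeds about 1.71x. The sketch below assumes cost is simply hourly rate times runtime; the 2-hour job and 3x speedup used in the example are hypothetical, not measured results.

```python
CPU_CLUSTER_PER_HOUR = 7.82   # USD/hour incl. GCE + Dataproc (from the slide)
GPU_CLUSTER_PER_HOUR = 13.41  # USD/hour incl. GCE + Dataproc (from the slide)


def break_even_speedup(gpu_rate: float, cpu_rate: float) -> float:
    """Speedup at which a job costs the same on both clusters."""
    return gpu_rate / cpu_rate


def job_cost(rate_per_hour: float, cpu_runtime_hours: float, speedup: float = 1.0) -> float:
    """Cost of a job that takes cpu_runtime_hours on CPUs, run `speedup` times faster."""
    return rate_per_hour * cpu_runtime_hours / speedup


if __name__ == "__main__":
    print(f"break-even speedup: {break_even_speedup(GPU_CLUSTER_PER_HOUR, CPU_CLUSTER_PER_HOUR):.2f}x")
    # Hypothetical job: 2 hours on the CPU cluster, 3x faster on GPUs (illustrative only).
    print(f"CPU cost: ${job_cost(CPU_CLUSTER_PER_HOUR, 2.0):.2f}")
    print(f"GPU cost: ${job_cost(GPU_CLUSTER_PER_HOUR, 2.0, speedup=3.0):.2f}")
```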
  • 13. (Slide graphic not preserved in this transcript.)
    *The NVIDIA Decision Support (NDS) benchmark is derived from the TPC-DS benchmark and is used for internal performance testing. Results from NDS are not comparable to TPC-DS.
  • 14. WHY SOME GPU QUERIES FAILED: GPU memory limitations/spilling
    - Sort. Problem: in cases of data skew, the amount of data being sorted can exceed the limits of the hardware/software. Solution: a modified external batch sort is implemented in the working branch for the 0.5 release.
    - Window*. Problem: in cases of data skew, the amount of data in the window operation can exceed the limits of the hardware/software. Solution: implement a chunked rank optimization (GitHub issue #1859).
    - Join. Problem: the worst-case join output row count is left.rows * right.rows. Solution: materialize the output of a join in chunks (GitHub issue #1629), and have conditional filters run as part of the join (GitHub issue #288).
    * Actually, this is for rank, which we don't support yet but plan to support in the 0.5 release.
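The join fix (materializing output in chunks and applying conditional filters during the join) can be illustrated on the CPU with a plain Python hash join. This sketches the idea only, with hypothetical names; the accelerator does this on the GPU through libcudf, not with code like this. Fully skewed input, where every row shares one key, hits the worst case of left.rows * right.rows output rows, yet no single chunk exceeds `max_rows`.

```python
from collections import defaultdict


def chunked_hash_join(left, right, left_key, right_key, max_rows, condition=None):
    """Inner hash join that yields its output in chunks of at most max_rows pairs.

    condition, if given, is an extra predicate on (left_row, right_row) applied while
    probing, so rejected pairs are never materialized.
    """
    # Build phase: index the right side by join key.
    index = defaultdict(list)
    for rrow in right:
        index[right_key(rrow)].append(rrow)

    chunk = []
    # Probe phase: stream the left side and flush whenever a chunk fills up.
    for lrow in left:
        for rrow in index.get(left_key(lrow), ()):
            if condition is not None and not condition(lrow, rrow):
                continue
            chunk.append((lrow, rrow))
            if len(chunk) == max_rows:
                yield chunk
                chunk = []
    if chunk:
        yield chunk


if __name__ == "__main__":
    # Fully skewed input: every row has join key 0, so the output is 300 * 300 rows.
    left = [(0, i) for i in range(300)]
    right = [(0, j) for j in range(300)]
    key = lambda row: row[0]
    sizes = [len(c) for c in chunked_hash_join(left, right, key, key, max_rows=10_000)]
    print(len(sizes), "chunks,", sum(sizes), "rows, largest chunk", max(sizes))
```

Passing a `condition` such as `lambda l, r: l[1] < r[1]` drops non-matching pairs during the probe, so the filtered-out rows never occupy memory.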
  • 15. WHY IS THE GPU SLOWER FOR SOME QUERIES?
    - Failed queries
    - Small data sizes (spark.sql.adaptive.advisoryPartitionSizeInBytes=1G): Q28, Q44, and Q67
    - Less computation overlap (spark.rapids.sql.concurrentGpuTasks=1)
    - Host/device memory transfers: all of them
    - Cache consistency on reductions/very small aggregate results: Q88
    - Lack of GPU support, combined with much lower CPU parallelism: Q44, Q49, and Q67
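The small-data effect comes from adaptive query execution coalescing shuffle partitions toward `spark.sql.adaptive.advisoryPartitionSizeInBytes`. The function below is a simplified model, assumed for illustration, that greedily packs contiguous partitions up to the target size, roughly as Spark 3.0 does when coalescing; it ignores settings such as the minimum partition count. With a hypothetical 1,600 MB shuffle split into 200 partitions, a 1G target leaves only 2 post-shuffle tasks, whereas Spark's 64MB default yields 25.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def coalesce_partitions(partition_sizes, target_bytes):
    """Greedily pack contiguous shuffle partitions so each group stays at or under target_bytes.

    A single partition larger than the target still forms its own group. Simplified
    model: real AQE also honors a minimum partition count and other settings.
    """
    groups, current, current_size = [], [], 0
    for index, size in enumerate(partition_sizes):
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sizes = [8 * MB] * 200  # hypothetical shuffle: 200 partitions of 8 MB each
    print("tasks with 1G target:", len(coalesce_partitions(sizes, 1 * GB)))
    print("tasks with 64MB target:", len(coalesce_partitions(sizes, 64 * MB)))
```

A large target suits the GPU, which prefers big batches, but a small query then runs as only a handful of tasks.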
  • 16. ALLUXIO CONFIGURATION
    - Co-locate the Alluxio worker nodes with the Spark worker nodes to ensure short-circuit reads and writes.
    - Size the cache according to the working set.
    - Choose the right cache medium (SSD or system memory).
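As a rough sketch of the sizing advice, the hypothetical helper below turns a working-set estimate into Alluxio worker tiered-storage properties. The `alluxio.worker.tieredstore.*` keys are standard Alluxio 2.x worker properties, but the 20% headroom, the rule of choosing memory only when the per-worker share fits a memory budget, and the mount paths are assumptions for illustration, not Alluxio guidance.

```python
import math


def alluxio_cache_properties(working_set_gb, num_workers, mem_budget_gb,
                             headroom_pct=20, ssd_path="/mnt/ssd", mem_path="/mnt/ramdisk"):
    """Suggest single-tier Alluxio worker storage settings for a working set.

    Assumptions (illustrative only): each co-located worker caches an equal share of
    the working set plus headroom_pct percent, and memory is chosen only when that
    share fits within mem_budget_gb; otherwise the tier is backed by SSD.
    """
    per_worker_gb = math.ceil(working_set_gb * (100 + headroom_pct) / (100 * num_workers))
    alias, path = ("MEM", mem_path) if per_worker_gb <= mem_budget_gb else ("SSD", ssd_path)
    return {
        "alluxio.worker.tieredstore.levels": "1",
        "alluxio.worker.tieredstore.level0.alias": alias,
        "alluxio.worker.tieredstore.level0.dirs.path": path,
        "alluxio.worker.tieredstore.level0.dirs.quota": f"{per_worker_gb}GB",
    }


if __name__ == "__main__":
    # Hypothetical: 1 TB working set over 4 workers with 40 GB of spare RAM each.
    for key, value in alluxio_cache_properties(1000, 4, 40).items():
        print(f"{key}={value}")
```

The output lines are in the `key=value` form used by `alluxio-site.properties`.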
  • 17. RAPIDS ACCELERATOR CONFIGURATION
    - spark.rapids.sql.enabled is the master enable switch.
    - spark.rapids.sql.explain enables logging of operations that are not accelerated.
      - Set it to NOT_ON_GPU to print only the operations that will not run on the GPU.
    - spark.rapids.sql.concurrentGpuTasks controls the number of concurrent tasks per GPU.
      - Set it to a value between 2 and 4; 2 typically provides the most benefit.
    - spark.rapids.memory.pinnedPool.size sets the pinned host memory pool, which significantly improves the performance of data transfers between the GPU and host memory.
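These settings are typically passed as `--conf` options. The sketch below assembles a `spark-submit` command line from them; `spark.plugins=com.nvidia.spark.SQLPlugin` is how the accelerator is enabled, while the 2G pinned pool, the value 2 for concurrent tasks, and the script name `my_etl_job.py` are illustrative assumptions. A real deployment also needs the RAPIDS Accelerator and cuDF jars and GPU resource scheduling configured, as described in the Getting Started guide.

```python
import shlex

# Illustrative values; tune per cluster (the slide suggests 2-4 concurrent GPU tasks).
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    "spark.rapids.sql.explain": "NOT_ON_GPU",
    "spark.rapids.sql.concurrentGpuTasks": "2",
    "spark.rapids.memory.pinnedPool.size": "2G",
}


def spark_submit_args(conf, app):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    print(shlex.join(spark_submit_args(RAPIDS_CONF, "my_etl_job.py")))  # hypothetical job
```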
  • 18. WILL MY SPARK WORKLOAD ACCELERATE WITHOUT CHANGES?
    If I know my Spark workload characteristics...
    - Data pipeline use cases that accelerate well on GPUs: data mining, analytics and BI; batch processing and writing large datasets to a data warehouse; data extraction, aggregation and feature preparation for ML training & inference.
    - Data pipeline use cases not for GPUs: real-time streaming analytics/AI pipelines; Online Transaction Processing (OLTP); data pipelines with custom code.
    - Technical characteristics that accelerate well: batch processing of GB+ data sets; Parquet, ORC, and CSV data formats; HDFS, S3-compatible, or V2 data sources; DataFrame/SQL (join, agg, sort, window) and selected Hive & Scala UDFs.
    - Technical characteristics not for GPUs: stream processing; Spark RDD, MLlib, Dataset, GraphX, and Streaming libraries.
    If I am unsure... use the Log-Analysis Tool to:
    - Review Spark history logs from existing CPU jobs.
    - Understand how much of the workload could execute on GPUs.
    - Get tips on optional code optimizations for GPUs.
    Diagram of Apache Spark components, each marked CPU only, GPU aware, GPU accelerated, or partially GPU accelerated: Apache Spark Core, Catalyst Query Optimizer, Spark Streaming, Spark SQL, Spark DataFrames, Spark Datasets, RDD, Spark Shuffle, Spark MLlib, GraphX.
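The slide's checklist can be turned into a rough screening function. This is a hypothetical helper that only encodes the criteria listed above (the 1 GB threshold reads "GB+ data sets" literally); it is not the Log-Analysis Tool, which inspects real Spark history logs and is the better guide when in doubt.

```python
GPU_FRIENDLY_FORMATS = {"parquet", "orc", "csv"}
GPU_FRIENDLY_APIS = {"dataframe", "sql"}
CPU_ONLY_APIS = {"rdd", "mllib", "dataset", "graphx", "streaming"}


def screen_workload(apis, data_format, input_gb, streaming=False):
    """Return reasons a workload may not accelerate; an empty list means it looks like a candidate."""
    used = {api.lower() for api in apis}
    reasons = []
    if streaming:
        reasons.append("stream processing")
    if input_gb < 1:
        reasons.append("input smaller than 1 GB")
    if data_format.lower() not in GPU_FRIENDLY_FORMATS:
        reasons.append(f"data format {data_format!r}")
    if used & CPU_ONLY_APIS:
        reasons.append("uses " + ", ".join(sorted(used & CPU_ONLY_APIS)))
    if not used & GPU_FRIENDLY_APIS:
        reasons.append("no DataFrame/SQL work")
    return reasons


if __name__ == "__main__":
    print(screen_workload(["sql", "dataframe"], "parquet", input_gb=500))
    print(screen_workload(["rdd"], "json", input_gb=0.5))
```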
  • 19. Summary
    - RAPIDS Accelerator for Spark unlocks GPU acceleration for Spark DataFrames, Spark SQL, and ML/DL frameworks such as XGBoost (with more coming).
    - Alluxio is a high-performance data orchestration system for GPU compute.
    - Spark and GPUs on Alluxio optimize for performance and cost on cloud-scale datasets.
    - Spark 3 and GPUs are available today on Databricks, EMR, and Dataproc.
    Try it yourself: https://nvidia.github.io/spark-rapids/Getting-Started/
    Developer Blog: Accelerating Analytics and AI with Alluxio and NVIDIA GPUs
    GTC Talks:
    - Enabling Data Orchestration with RAPIDS Accelerator [S32746]
    - Accelerating Apache Spark Shuffle with UCX [S31822]
    - Tuning GPU Network and Memory Usage in Apache Spark [S31566]
    - Running Large-Scale ETL Benchmarks with GPU-Accelerated Apache Spark [S31846]
    - ...and more