Qi Xie (qi.xie@intel.com)
Hao Cheng (hao.cheng@intel.com)
Quanfu Wang (quanfu.wang@intel.com)
FPGA-BASED ACCELERATION
ARCHITECTURE FOR SPARK SQL
LEGAL NOTICES
• You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning
Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter
drafted which includes subject matter disclosed herein.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness
for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing,
or usage in trade.
• This document contains information on products, services and/or processes in development. All information provided here is
subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and
roadmaps.
• The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
• Copies of documents which have an order number and are referenced in this document may be obtained by calling
1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
• Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.
• *Other names and brands may be claimed as the property of others.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
• Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of
that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
• Copyright © 2017 Intel Corporation.
About me
• Software engineer from Intel Big Data Engineering Spark team
• Focused on Spark optimization for Intel Architecture
Outline
• What’s an FPGA
• Intel FPGA Platform
• Workload & Benchmark Introduction
• Baseline Profile - Hotspot Analysis
• FPGA Acceleration Arch Overview
• Performance Comparison
• Future Works
What is an FPGA?
• Field Programmable Gate Array
‒ Configurable Logic Blocks (CLB)
‒ Embedded Memory
‒ Digital signal processing (DSP) blocks
‒ I/O pads
‒ Hard IP (PCIe, DDR, GigE, etc.)
Why FPGA?
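Required Function
Inputs a, b, c; output y = f(a, b, c). Per the truth table, y is 0 only when b = 0 and c = 1, i.e. y = b OR (NOT c).
Truth Table
a b c y
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 1
Programmed LUT
The LUT stores the y column of the truth table (1, 0, 1, 1, 1, 0, 1, 1); a MUX addressed by a, b, c selects the stored bit and outputs y.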
‒ Reconfigurable architecture
A CLB consists of LUTs. A LUT is a RAM with a data width of 1 bit.
Its contents are programmed at power-up.
‒ Low power and high energy efficiency compared with CPU/GPU
An extreme degree of customization, well positioned for high performance while providing flexibility.
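The LUT-plus-MUX idea can be modelled in a few lines of software. This is only an illustrative sketch, not FPGA tooling: the class name LUT3 and its methods are hypothetical, and the stored bits are the y column of the truth table.

```python
from itertools import product


class LUT3:
    """A 3-input look-up table: 8 one-bit cells selected by (a, b, c)."""

    def __init__(self, bits):
        if len(bits) != 8 or any(bit not in (0, 1) for bit in bits):
            raise ValueError("a 3-input LUT holds exactly 8 one-bit cells")
        self.bits = list(bits)

    @classmethod
    def program(cls, func):
        """'Program' the LUT by tabulating func over all 8 input combinations."""
        return cls([func(a, b, c) for a, b, c in product((0, 1), repeat=3)])

    def read(self, a, b, c):
        """The MUX: the three inputs form the address of the cell to output."""
        return self.bits[(a << 2) | (b << 1) | c]


# Contents taken from the y column of the truth table (rows 000 .. 111).
lut = LUT3([1, 0, 1, 1, 1, 0, 1, 1])

# Reprogramming the same 8 cells yields a different function (here a 3-input AND).
and_lut = LUT3.program(lambda a, b, c: a & b & c)

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", lut.read(a, b, c))
```

The same read path serves both tables; only the stored contents differ, which is the sense in which the logic is reconfigurable.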
Discrete and Integrated FPGA platforms
Intel Accelerator Abstraction Layer (AAL)
[Figure: AAL diagram; surviving labels: FPGA, Hardware.]
End User Programming Interfaces
[Figure: CPU and FPGA connected over UPI/PCIe.
CPU side: User Application (user software interface) and FPGA Runtime Software (Accelerator Abstraction Layer).
FPGA side: FPGA IP (Acceleration Function Unit; user-developed, application-specific functions) and Intel-provided Infrastructure IP (UPI, PCIe*, HSSI, FPGA Management), with a core cache interface and HSSI.
Legend: highlighted blocks are new blocks that simplify code development.]
Intel® Confidential
Traditional FPGA Development Approach
[Figure: two development flows side by side, each ending in an exe on the CPU and bitstreams on the FPGA.
HDL Programming: C application, SW compiler, exe, AAL software (CPU side); HDL, Syn., PAR, AFU bitstream (FPGA side); AFU Simulation Environment (ASE) from Intel; AAL from Intel.
OpenCL Programming: host, SW compiler, exe, AAL software (CPU side); kernels, OpenCL compiler, AFU bitstream (FPGA side); OpenCL emulator; Altera® Quartus Prime Pro; OpenCL BSP.
Both flows show a blue bitstream and a green bitstream on the FPGA.]
Workload Introduction
The test case comes from a customer. It uses a SQL query to compute accounting summaries by USER_ID over a big table,
and the query contains heavy expression evaluation.
Accounting Big Table (1.6x10^8 rows, 38 columns):
TIME_ID   MBUSER_ID      OPER_TID              SUM_TIMES  CHARGE1  …
20140407  2700007679977  5B013363363w          3          0        …
20140407  2704012998344  31011G13iG0           48         57180    …
20140407  2704040114238  31Q11512ZT0           1          180      …
20140407  2700007012466  31011G13iG0           8          52320    …
20140407  2700001523491  1T0311G80610ydH10G00  2          0        …
20140407  2700000765632  310103015G0           1          30       …
20140407  2700007800325  4562210021            1          0        …
…
• The SQL queries summarize customers' consumption characteristics from billing data.
• 5 GB in Parquet format stored on HDFS, 160 million rows.
Workload code snippet
Function                                    Count
Max                                         13
Sum                                         155
Substr                                      329
Case                                        133
Implicit data type cast (String to Double)  n/a
Total                                       630
// Prepare
val parquet = spark.read.parquet("/mnt/nvme/inputParquet/")
parquet.createOrReplaceTempView("inputTable")
// Query
// A very long SQL statement that makes intensive use of built-in functions.
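The actual query is not shown in the slides. As a rough idea of what "heavy expression evaluation" looks like, the sketch below runs a small query of the same style on the sample rows. Everything about the query itself is a hypothetical illustration: the aggregate aliases, the specific Case conditions, and the use of Python's built-in sqlite3 instead of Spark are assumptions. Only the column names, the sample values, and the Substr(oper_tid, 1, 1) IN ('1', '7') pattern come from the deck.

```python
import sqlite3

# Sample rows copied from the accounting table (first five columns only).
rows = [
    ("20140407", "2700007679977", "5B013363363w", 3, 0),
    ("20140407", "2704012998344", "31011G13iG0", 48, 57180),
    ("20140407", "2704040114238", "31Q11512ZT0", 1, 180),
    ("20140407", "2700007012466", "31011G13iG0", 8, 52320),
    ("20140407", "2700001523491", "1T0311G80610ydH10G00", 2, 0),
    ("20140407", "2700000765632", "310103015G0", 1, 30),
    ("20140407", "2700007800325", "4562210021", 1, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inputTable "
    "(TIME_ID TEXT, MBUSER_ID TEXT, OPER_TID TEXT, SUM_TIMES INTEGER, CHARGE1 INTEGER)"
)
conn.executemany("INSERT INTO inputTable VALUES (?, ?, ?, ?, ?)", rows)

# Hypothetical query: Sum/Max over Case expressions built from Substr tests.
# The real workload contains hundreds of such expressions.
query = """
SELECT MBUSER_ID,
       SUM(CASE WHEN substr(OPER_TID, 1, 1) IN ('1', '7') THEN SUM_TIMES ELSE 0 END) AS times_1_7,
       SUM(CASE WHEN substr(OPER_TID, 1, 1) = '3' THEN CHARGE1 ELSE 0 END) AS charge_3,
       MAX(CASE WHEN substr(OPER_TID, 2, 1) = '1' THEN CHARGE1 ELSE 0 END) AS max_charge_x1
FROM inputTable
GROUP BY MBUSER_ID
ORDER BY MBUSER_ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output column requires a Substr, a comparison, and a Case per input row, which is why projection dominates CPU time when hundreds of such columns are computed.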
SQL Query Physical Execution Plan
The plan has two stages separated by a shuffle (data exchanged across the network). The map stage contains the file scan,
projection, and partial aggregation; the reduce stage completes the aggregation by merging the partial aggregation results.
Stage 1 (Map)
• File Scan: reads data from the source.
• Projection: expression evaluation, which consumes most of the CPU cycles.
• Partial Aggregation: aggregates per partition.
Shuffle
Stage 2 (Reduce)
• Full Aggregation: a tiny task that consumes few CPU cycles.
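The two-stage plan can be mimicked in plain Python to show why the reduce stage is cheap. This is a toy model, not Spark's implementation: the function names, the sample rows, and the projected expression are all made up for illustration.

```python
from collections import defaultdict


def project(row):
    """Projection: evaluate an expression per row (the CPU-heavy step in the deck)."""
    user_id, oper_tid, charge = row
    # Hypothetical expression in the style of the workload.
    return user_id, charge if oper_tid[:1] in ("1", "7") else 0


def partial_aggregate(partition):
    """Map side: scan, project, and pre-aggregate a single partition."""
    acc = defaultdict(int)
    for row in partition:
        key, value = project(row)
        acc[key] += value
    return dict(acc)


def shuffle(partials, num_reducers):
    """Route every (key, partial sum) to the reducer that owns the key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def final_aggregate(bucket):
    """Reduce side: merge the partial sums; only one value per key per partition."""
    return {key: sum(values) for key, values in bucket.items()}


# Toy input: two partitions of (user_id, oper_tid, charge) rows.
partitions = [
    [("u1", "1A", 10), ("u2", "3B", 5), ("u1", "7C", 1)],
    [("u2", "1D", 7), ("u1", "9E", 100)],
]
partials = [partial_aggregate(p) for p in partitions]
totals = {}
for bucket in shuffle(partials, num_reducers=2):
    totals.update(final_aggregate(bucket))
print(totals)
```

The map side touches every row, while the reduce side only sees one partial sum per key and partition, which matches the small reduce-stage cost reported in the baseline profile.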
Benchmark H/W Setup
A single server is used for profiling and performance evaluation.
• MCP (Skylake-FPGA Multi-Chip Package)
o CPU
Intel Xeon Skylake-P, 2 sockets x 14 cores @ 2.8 GHz, 56 hyper-threads
o FPGA
1 x Arria 10 GX, 427,200 ALMs, 8 MB RAM (10AX115U3F45E2SG)
o DMA Channels
1 x UPI (80 Gbps)
• Memory
384 GB, DDR4 @ 2133 MHz
• Disk
1 x Intel SSD P3700, 1.6 TB, sequential read 2800 MB/s, sequential write 1900 MB/s, random read 450K IOPS, random write 150K IOPS
Baseline Profile - CPU, The Bottleneck
• PAT (Performance Analysis Tool) shows that the CPU is heavily utilized (54 of 56 virtual cores are assigned to
Spark). The total query execution time is 85 seconds.
Note: Measurement starts from the 2nd run (the 1st run warms up the Linux file system cache), so there is generally no
disk access bandwidth.
The Reduce stage does very simple aggregation and takes minor CPU (~1s).
*For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Baseline Profile - CPU, The Bottleneck, Contd.
• The VisualVM CPU breakdown of a map task shows that projection consumes 66.7% of CPU.
[Figure: VisualVM CPU breakdown; callout: projection takes 66.7% of CPU.]
Arch Overview – Typical SQL Query Operators
[Figure: typical SQL query operators; a callout labelled "This POC Target" marks the part addressed by this PoC.]
Arch Overview - S/W Stack
[Figure: software stack across three layers.
JVM: Spark; Spark FPGA Adaptor (InternalRow to FPGA Batch, FPGA Batch to InternalRow); FPGA Java Wrapper; FPGA Project.
Native: Java Native Interface (JNI); FPGA Driver (Pattern Configure, DMA Configure, Huge Page Memory Pool, Computation Starter/Monitor); Accelerator Abstraction Layer (AAL).
HW: FPGA with DMA RX/TX (RTL) and SQL Engine (RTL).]
• Spark FPGA Adaptor
‒ Identifies the expressions in the projection and exports them as FPGA SQL Engine instructions.
‒ Converts data between Spark Internal Rows and FPGA Batches.
• FPGA Driver
‒ Configures the SQL Engine patterns according to the instructions from the Spark FPGA Adaptor.
‒ Triggers the FPGA computation and collects the results.
‒ Manages huge-page memory.
‒ Configures the DMA channel between main memory and the FPGA.
• AAL
‒ FPGA runtime library.
‒ Low-level API to the FPGA Driver.
• FPGA SQL Engine (RTL)
‒ Configurable SQL expression pattern units.
‒ DMA RX: the FPGA reads input data from main memory.
‒ DMA TX: the FPGA writes results to main memory.
Arch Overview - S/W Stack, Contd.
[Figure: record layouts.
Internal Row: null bit set (1 bit/field) | values (8 bytes/field) | variable length portion
FPGA Input Batch: 4 bytes (TIME_ID) | 8 bytes (MBUSER_ID) | 12 bytes (ACC_NBR) | …… | 64 bytes (for 4xCL alignment)
FPGA Output Batch: 8 bytes (MBUSER_ID) | …… | 4 bytes (for 4xCL alignment)]
[Figure: conversion flow between the Spark FPGA Adaptor (InternalRow to FPGA Batch, FPGA Batch to InternalRow) and the FPGA Java Wrapper / FPGA Project.
1. Get HugePage wrapped in DirectByteBuffer
2. Data Conversion (Internal Rows to FPGAInputBatch)
3. Engine Configuration, Start
4. Input for Computation
5. Collect computation result
6. Data Conversion (FPGAOutputBatch to Internal Rows)
7. Free HugePage wrapped in DirectByteBuffer]
• Internal Row
Spark's representation of one record; flexible enough to represent both fixed-length and variable-length fields.
• FPGA Input Batch
For memory and computation efficiency, fields are placed sequentially in physical memory.
• FPGA Output Batch
Similar to the FPGA Input Batch.
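The row-to-batch conversion amounts to packing fixed-width fields back to back and padding the result. The sketch below illustrates the idea with Python's struct module rather than the JVM DirectByteBuffer used in the deck. Field widths follow the slide (TIME_ID 4 bytes, MBUSER_ID 8 bytes, ACC_NBR 12 bytes); the encodings, the 64-byte cache-line size, the reading of "4xCL" as four cache lines, the function names, and the ACC_NBR values are assumptions.

```python
import struct

# Widths from the slide; little-endian integers and ASCII text are assumed encodings.
RECORD = struct.Struct("<iq12s")   # 4 + 8 + 12 = 24 bytes, no implicit padding
CACHE_LINE = 64                    # assumed cache-line size in bytes
ALIGNMENT = 4 * CACHE_LINE         # "4xCL" read as four cache lines (assumption)


def rows_to_input_batch(rows):
    """Pack rows back to back, then zero-pad the batch to the alignment."""
    body = b"".join(
        RECORD.pack(time_id, user_id, acc_nbr.encode("ascii"))
        for time_id, user_id, acc_nbr in rows
    )
    padding = -len(body) % ALIGNMENT
    return body + bytes(padding)


def batch_to_rows(batch, count):
    """Inverse conversion: read `count` fixed-width records from the batch."""
    records = (RECORD.unpack_from(batch, i * RECORD.size) for i in range(count))
    return [(t, u, a.rstrip(b"\x00").decode("ascii")) for t, u, a in records]


rows = [
    (20140407, 2700007679977, "ACC000000001"),  # ACC_NBR values are made up
    (20140407, 2704012998344, "ACC000000002"),
]
batch = rows_to_input_batch(rows)
print(len(batch), batch_to_rows(batch, len(rows)) == rows)
```

Because every field sits at a fixed offset, hardware can locate a field without parsing, unlike the Internal Row layout with its variable length portion.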
Arch Overview - Engine Pipeline, Data Flow
[Figure: data flow and control flow between CPU and FPGA.
CPU: Data Source, Spark, Spark FPGA Adaptor, FPGA Adapter & Driver (Pattern Configure, Computation Control), Input Buffers, Output Buffers.
FPGA: DMA RX, several rows of Engine Units forming a 169-level pipeline, DMA TX.
Legend: Data Flow, Control Flow; Input, Output.]
• Engine Pipeline
The Spark FPGA SQL Engine is designed as pipelines of Engine Units. Each Engine Unit performs a single computation; different
Engine Units are assembled together (as configured by Spark) into a pipeline that performs a complex computation. Many pipelines
(say N) can be constructed to perform N computations in parallel, so that N records can be consumed in a single FPGA cycle.
• Data Flow
Spark reads data from the Data Source, converts it into the format the FPGA requires, and puts it into the Input Buffer array.
The FPGA then fetches the input data via DMA RX and feeds it into the Engine Pipelines. The pipeline results are written to the
Output Buffer array via DMA TX. Finally, Spark converts the data back into the format Spark SQL needs.
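A software analogy may help picture the engine pipeline: single-purpose units chained into a pipeline, and N pipelines consuming N records per step. The sketch is a sequential model only and does not reproduce hardware pipelining or timing; all function names, the choice of units, and the last input value are hypothetical.

```python
from functools import reduce


def make_pipeline(*units):
    """Chain engine units so that each unit's output feeds the next one."""
    def pipeline(record):
        return reduce(lambda value, unit: unit(value), units, record)
    return pipeline


# Hypothetical engine units; each one performs a single computation.
def first_char(text):
    return text[:1]


def in_1_or_7(char):
    return char in ("1", "7")


def to_flag(hit):
    return 1 if hit else 0


def run(input_buffer, pipelines):
    """Consume len(pipelines) records per step, one record per pipeline."""
    n = len(pipelines)
    output_buffer = []
    for start in range(0, len(input_buffer), n):
        chunk = input_buffer[start:start + n]
        output_buffer.extend(p(r) for p, r in zip(pipelines, chunk))
    return output_buffer


# N = 4 identical pipelines assembled from the same three units.
pipelines = [make_pipeline(first_char, in_1_or_7, to_flag) for _ in range(4)]
input_buffer = ["5B013363363w", "1T0311G80610ydH10G00", "31011G13iG0",
                "4562210021", "7000000000"]  # the last value is made up
output_buffer = run(input_buffer, pipelines)
print(output_buffer)
```

Swapping the units passed to make_pipeline corresponds, loosely, to reconfiguring the engine for a different expression.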
Arch Overview – SQL Engine Micro Architecture
• Every SQL expression evaluation engine is configurable.
• Every engine contains at most four pattern engines. The input data is fed to the pattern engines in parallel, and the
final result is the combination of the pattern engine results.
‒ Pattern Engine 1 is configured to evaluate the SQL expression
Substr(oper_tid, 1, 1) IN ('1', '7')
‒ Pattern Engine 2 is configured to evaluate the SQL expression
Substr(oper_tid, 2, 1) IN ('o')
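One way to picture an engine with configurable pattern engines is the sketch below. The class names and fields are hypothetical, and the slides do not say how the pattern results are combined, so the default of a logical AND (Python's all) is purely an assumption; the two configured patterns are the ones named on the slide.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

MAX_PATTERN_ENGINES = 4  # "at most four pattern engines" per the slide


@dataclass(frozen=True)
class PatternEngine:
    """Evaluates Substr(field, pos, length) IN (values); pos is 1-based as in SQL."""
    pos: int
    length: int
    values: FrozenSet[str]

    def evaluate(self, field: str) -> bool:
        start = self.pos - 1
        return field[start:start + self.length] in self.values


@dataclass
class ExpressionEngine:
    patterns: List[PatternEngine]
    combine: Callable = all  # assumption: the combining rule is not given in the deck

    def __post_init__(self):
        if not 1 <= len(self.patterns) <= MAX_PATTERN_ENGINES:
            raise ValueError("an engine holds between 1 and 4 pattern engines")

    def evaluate(self, field: str) -> bool:
        # In hardware the pattern engines see the input in parallel;
        # here they are simply evaluated one after another.
        return self.combine(p.evaluate(field) for p in self.patterns)


engine = ExpressionEngine([
    PatternEngine(pos=1, length=1, values=frozenset({"1", "7"})),  # Substr(oper_tid, 1, 1) IN ('1', '7')
    PatternEngine(pos=2, length=1, values=frozenset({"o"})),       # Substr(oper_tid, 2, 1) IN ('o')
])
print(engine.evaluate("1o23"), engine.evaluate("1T03"))
```

Passing combine=any would model an OR of the same patterns, which shows how one set of configured pattern engines could serve different expressions.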
Performance Comparison - FPGA vs Baseline
• The FPGA-accelerated version significantly reduced the total execution time in the end-to-end benchmark, from 86
seconds (baseline) to 44 seconds.
Speedup ratio: 86s / 44s ≈ 2x
Baseline: 86s
FPGA: 44s
Performance Comparison - FPGA vs Baseline, Contd.
• The FPGA-accelerated version reduced the CPU share of expression evaluation in the Map stage from 66.7% (baseline) to
less than 6.6%.
Projection in Baseline: 66.7%
Projection with FPGA: less than 6.6%
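The reported numbers can be checked with a few lines of arithmetic. The speedup and time saved follow directly from the 86 s and 44 s figures. The Amdahl's-law bound is only a rough sanity check: it assumes that the 66.7% CPU share of projection translates one-to-one into wall-clock time and that offloaded projection costs nothing, neither of which the deck claims.

```python
baseline_s = 86.0         # end-to-end baseline time from the slide
fpga_s = 44.0             # end-to-end FPGA-accelerated time from the slide
projection_share = 0.667  # projection's share of map-task CPU in the baseline

speedup = baseline_s / fpga_s
time_saved = 1 - fpga_s / baseline_s

# Amdahl's law upper bound under the assumptions stated above.
amdahl_bound = 1 / (1 - projection_share)

print(f"speedup      {speedup:.2f}x")
print(f"time saved   {time_saved:.1%}")
print(f"Amdahl bound {amdahl_bound:.2f}x")
```

Under those assumptions the measured speedup of about 2x sits below the idealized bound of about 3x, which is plausible given that file scan, data conversion, and aggregation still run on the CPU.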
Future Works
• Fully Configurable FPGA SQL Acceleration Engine
‒ In this PoC, we identified the SQL expression patterns manually in the
frontend and configured them into the FPGA SQL Engine units at
runtime. However, only a limited set of FPGA SQL engines is available,
supporting some of the typical expression patterns; arbitrary
combinations of SQL expressions are not supported yet.
• More Operators Support
‒ SQL expression evaluation in Projection is the first step; other
typical operators such as Aggregation/Sort/Join can probably also
be offloaded to the FPGA.
‒ The CPU can also compute the expression evaluation when FPGA
resources are fully occupied.
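The last point, falling back to the CPU when the FPGA is busy, could look roughly like the following dispatch logic. This is a speculative sketch of future work, not something the PoC implements: the FpgaSlots class, the slot-based model of FPGA occupancy, and both evaluate functions are hypothetical, and the "FPGA" path simply reuses the CPU logic so that the example stays runnable.

```python
import threading


class FpgaSlots:
    """Hypothetical stand-in for a pool of FPGA engine slots."""

    def __init__(self, slots):
        self._sem = threading.BoundedSemaphore(slots)

    def try_acquire(self):
        # Non-blocking: returns False immediately when every slot is busy.
        return self._sem.acquire(blocking=False)

    def release(self):
        self._sem.release()


def evaluate_on_cpu(batch):
    return [s[:1] in ("1", "7") for s in batch]


def evaluate_on_fpga(batch):
    # Placeholder: a real implementation would go through the driver stack.
    return evaluate_on_cpu(batch)


def evaluate(batch, fpga):
    """Use the FPGA when a slot is free, otherwise fall back to the CPU."""
    if fpga.try_acquire():
        try:
            return "fpga", evaluate_on_fpga(batch)
        finally:
            fpga.release()
    return "cpu", evaluate_on_cpu(batch)


fpga = FpgaSlots(slots=1)
print(evaluate(["1A", "3B"], fpga))   # a slot is free

fpga.try_acquire()                    # simulate a fully occupied FPGA
print(evaluate(["1A", "3B"], fpga))   # falls back to the CPU
fpga.release()
```

Both paths return the same values, so the caller only sees a difference in where the work ran.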
qi.xie@intel.com
hao.cheng@intel.com
quanfu.wang@intel.com
Thank You

More Related Content

What's hot

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!Databricks
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovMaksud Ibrahimov
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkPatrick Wendell
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slidesDat Tran
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInDatabricks
 
Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing ShuffleBucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing ShuffleDatabricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 

What's hot (20)

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedIn
 
Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing ShuffleBucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 

Similar to FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Databricks
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Community
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryDatabricks
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...Andrey Kudryavtsev
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialGanesan Narayanasamy
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production EnvironmentsIntel® Software
 
Pedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon InnovationPedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon InnovationJen Aman
 
FPGA Overview
FPGA OverviewFPGA Overview
FPGA OverviewMetalMath
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chipinside-BigData.com
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...Databricks
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryDeepak Shankar
 
Implementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGAImplementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGADeepak Kumar
 
FPGA Design Challenges
FPGA Design ChallengesFPGA Design Challenges
FPGA Design ChallengesKrishna Gaihre
 
Strata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesStrata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesJun Liu
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Acceleratorinside-BigData.com
 

Similar to FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang (20)

Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
Pedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon InnovationPedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon Innovation
 
FPGA Overview
FPGA OverviewFPGA Overview
FPGA Overview
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
Implementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGAImplementation of Soft-core Processor on FPGA
Implementation of Soft-core Processor on FPGA
 
FPGA Design Challenges
FPGA Design ChallengesFPGA Design Challenges
FPGA Design Challenges
 
Strata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesStrata + Hadoop 2015 Slides
Strata + Hadoop 2015 Slides
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
 
nios.ppt
nios.pptnios.ppt
nios.ppt
 

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

  • 1. Qi Xie (qi.xie@intel.com) Hao Cheng (hao.cheng@intel.com) Quanfu Wang (quanfu.wang@intel.com) FPGA-BASED ACCELERATION ARCHITECTURE FOR SPARK SQL
  • 2. LEGAL NOTICES
      • You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
      • No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
      • Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
      • This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
      • The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
      • Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
      • Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.
      • *Other names and brands may be claimed as the property of others.
      • Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
      • Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
      • Copyright © 2017 Intel Corporation.
  • 3. About me
      • Software engineer on the Spark team in Intel Big Data Engineering
      • Focused on Spark optimization for Intel Architecture
  • 4. Outline
      • What’s an FPGA
      • Intel FPGA Platform
      • Workload & Benchmark Introduction
      • Baseline Profile - Hotspot Analysis
      • FPGA Acceleration Arch Overview
      • Performance Comparison
      • Future Work
  • 6. What is an FPGA?
      • Field Programmable Gate Array
          ‒ Configurable Logic Blocks (CLB)
          ‒ Embedded memory
          ‒ Digital signal processing (DSP) blocks
          ‒ I/O pads
          ‒ Hard IP (PCIe, DDR, GigE, etc.)
  • 7. Why FPGA?
      [Figure: a required function y of inputs a, b, c, its truth table, and the LUT programmed to implement it. A MUX driven by a, b, c selects one LUT cell as the output y.]
      Truth table:
        a  b  c  |  y
        0  0  0  |  1
        0  0  1  |  0
        0  1  0  |  1
        0  1  1  |  1
        1  0  0  |  1
        1  0  1  |  0
        1  1  0  |  1
        1  1  1  |  1
      Programmed LUT contents (addresses 000 through 111): 1 0 1 1 1 0 1 1
      ‒ Reconfigurable architecture: a CLB consists of LUTs. A LUT is a RAM with a data width of 1 bit. Its contents are programmed at power-up.
      ‒ Low power and energy efficient compared with CPU/GPU: an extreme degree of customization, well positioned to deliver high performance while providing flexibility.
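The LUT idea on this slide can be mimicked in software. The following is only a sketch: the object and function names are illustrative, and taking `a` as the most significant address bit is an assumption chosen to match the row order of the truth table. The table contents happen to coincide with `b OR NOT c`, which the sketch uses to show that "reprogramming" is just loading different contents.

```scala
// Sketch: a 3-input LUT modelled as 8 one-bit RAM cells (names are illustrative).
object LutSketch {
  // LUT contents from the slide, stored at addresses 000 through 111.
  val programmed: Vector[Int] = Vector(1, 0, 1, 1, 1, 0, 1, 1)

  // The inputs form the RAM address (a is assumed to be the most significant bit);
  // the MUX simply selects the addressed cell.
  def lookup(lut: Vector[Int], a: Int, b: Int, c: Int): Int =
    lut((a << 2) | (b << 1) | c)

  // "Programming" a LUT means tabulating a function over all eight input combinations.
  def program(f: (Int, Int, Int) => Int): Vector[Int] =
    (for (a <- 0 to 1; b <- 0 to 1; c <- 0 to 1) yield f(a, b, c)).toVector

  def main(args: Array[String]): Unit = {
    for (a <- 0 to 1; b <- 0 to 1; c <- 0 to 1)
      println(s"$a $b $c -> ${lookup(programmed, a, b, c)}")
  }
}
```

Calling `LutSketch.program` with a different function yields a different table while `lookup` stays unchanged, which is the software analogue of reconfiguring the same hardware.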
  • 9. Discrete and Integrated FPGA platforms
  • 10. Intel Accelerator Abstraction Layer (AAL)
      [Figure labels: FPGA, Hardware]
  • 11. End User Programming Interfaces
      [Figure: CPU and FPGA blocks connected over UPI/PCIe, with HSSI on the FPGA. Block labels: User Application; FPGA Runtime Software (Accelerator Abstraction Layer); Infrastructure IP (UPI, PCIe*, HSSI, FPGA Management); FPGA IP (Acceleration Function Unit). Interface labels: USER SOFTWARE INTERFACE; CORE CACHE INTERFACE. Legend: Intel-Provided Infrastructure; User Developed Application Specific Functions; new blocks that simplify code development.]
  • 12. Traditional FPGA Development Approach
      [Figure: two development flows, HDL Programming and OpenCL Programming. Each flow produces an exe through a SW Compiler and an AFU Bitstream; at run time the CPU hosts the Application and AAL Software, and the FPGA holds the Blue Bitstream and the Green Bitstream. Other labels: HDL; Syn. PAR; AFU Simulation Environment (ASE); Kernels; Host; C; OpenCL Compiler; OpenCL Emulator. Tool labels: ASE from Intel; AAL from Intel; Altera® Quartus Prime Pro; OpenCL BSP.]
  • 14. Workload Introduction
      The test case comes from a customer. It uses a SQL query to compute accounting summaries by USER_ID over a big table. The SQL query contains heavy expression evaluation.
      Accounting big table (1.6x10^8 rows, 38 columns):
        TIME_ID   MBUSER_ID      OPER_TID              SUM_TIMES  CHARGE1  …
        20140407  2700007679977  5B013363363w          3          0        …
        20140407  2704012998344  31011G13iG0           48         57180    …
        20140407  2704040114238  31Q11512ZT0           1          180      …
        20140407  2700007012466  31011G13iG0           8          52320    …
        20140407  2700001523491  1T0311G80610ydH10G00  2          0        …
        20140407  2700000765632  310103015G0           1          30       …
        20140407  2700007800325  4562210021            1          0        …
        …
       SQL queries summarize customers' consumption characteristics using billing data.
       5 GB of data in Parquet format stored on HDFS, 160 million rows.
  • 15. Workload code snippet
        Function                                     Count
        Max                                          13
        Sum                                          155
        Substr                                       329
        Case                                         133
        Implicit data type cast (String to Double)   n/a
        Total                                        630
      // Prepare
      val parquet = spark.read.parquet("/mnt/nvme/inputParquet/")
      parquet.createOrReplaceTempView("inputTable")
      // Query
      A very long SQL statement that makes intensive use of built-in functions:
  • 16. SQL Query Physical Execution Plan
      The plan has two stages separated by a shuffle (data is exchanged across the network). The map stage contains the file scan, projection, and partial aggregation; the reduce stage completes the aggregation by merging the partial aggregation results.
      Stage 1 (Map)
      • File Scan: reads data from the source.
      • Projection: expression evaluation consumes most of the CPU cycles.
      • Partial Aggregation: aggregates per partition.
      Shuffle
      Stage 2 (Reduce)
      • Full Aggregation: a tiny task that consumes few CPU cycles.
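The two-stage shape of this plan can be illustrated without Spark. The sketch below is plain Scala under stated assumptions: `Record`, its fields, and the sample partitions are hypothetical, and in-memory collections stand in for partitions and the shuffle. It is not Spark's implementation, only the same idea at small scale.

```scala
// Sketch of map-side partial aggregation followed by a reduce-side merge.
// Record, its fields, and the sample data are hypothetical.
object TwoStageAggSketch {
  final case class Record(userId: Long, charge: Double)

  // Stage 1 (map): aggregate within one partition.
  def partialAgg(partition: Seq[Record]): Map[Long, Double] =
    partition.groupBy(_.userId).map { case (id, rs) => id -> rs.map(_.charge).sum }

  // Stage 2 (reduce): merge the per-partition partial sums after the shuffle.
  def finalAgg(partials: Seq[Map[Long, Double]]): Map[Long, Double] =
    partials.flatten.groupBy(_._1).map { case (id, kvs) => id -> kvs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(Record(1L, 10.0), Record(2L, 5.0), Record(1L, 2.5)),
      Seq(Record(2L, 1.0), Record(3L, 7.0))
    )
    println(finalAgg(partitions.map(partialAgg)))
  }
}
```

Because each partition is reduced to one entry per key before the merge, the reduce stage handles far less data than the map stage, which is consistent with the slide's remark that it consumes few CPU cycles.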
  • 18. Benchmark H/W Setup
      A single server is used for profiling and performance evaluation.
      • MCP (Skylake-FPGA Multi-Chip Package)
          o CPU: Intel Xeon Skylake-P, 2 sockets x 14 cores @ 2.8 GHz, 56 hyper-threads
          o FPGA: 1 x Arria 10 GX, 427,200 ALMs, 8 MB RAM (10AX115U3F45E2SG)
          o DMA channels: 1 x UPI (80 Gbps)
      • Memory: 384 GB, DDR4 @ 2133 MHz
      • Disk: 1 x Intel SSD P3700, 1.6 TB, SR: 2800 MB/s, SW: 1900 MB/s, RR: 450K IOPS, RW: 150K IOPS
  • 19. Baseline Profile - CPU, The Bottleneck
      • PAT (Performance Analysis Tool) shows that the CPU is heavily utilized (54 of the 56 virtual cores are assigned to Spark). The total query execution time is 85 seconds.
      Note: Measurement starts from the 2nd run (the 1st run warms up the Linux file system cache), so there is generally no disk access. The reduce stage does a very simple aggregation and takes little CPU time (~1s).
      *For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 20. Baseline Profile - CPU, The Bottleneck, Contd.
      • The VisualVM CPU breakdown of a map task shows that projection consumes 66.7% of the CPU.
  • 22. Arch Overview – Typical SQL Query Operators
      [Figure: typical SQL query operators, with a callout marking "This PoC Target".]
  • 23. Arch Overview - S/W Stack
      [Figure: layered software stack. JVM: Spark; Spark FPGA Adaptor (InternalRow to FPGA Batch, FPGA Batch to InternalRow, FPGA Project); FPGA Java Wrapper. Java Native Interface (JNI). Native: FPGA Driver (Pattern Configure, Configure DMA, Configure Huge Page Memory Pool, Computation Starter, Monitor); Accelerator Abstraction Layer (AAL). HW: FPGA with DMA RX/TX (RTL) and SQL Engine (RTL).]
      • Spark FPGA Adaptor
          • Identifies the expressions in a projection and exports them as FPGA SQL engine instructions.
          • Converts data between Spark Internal Rows and FPGA Batches, in both directions.
      • FPGA Driver
          • Configures the SQL Engine patterns according to the instructions from the Spark FPGA Adaptor.
          • Triggers the FPGA computation and collects the results.
          • Manages huge-page memory.
          • Configures the DMA channel between main memory and the FPGA.
      • AAL
          • FPGA runtime library.
          • Low-level API used by the FPGA Driver.
      • FPGA SQL Engine (RTL)
          • SQL expression pattern units, which are configurable.
          • DMA RX: the FPGA reads input data from main memory.
          • DMA TX: the FPGA writes results to main memory.
  • 24. Arch Overview - S/W Stack, Contd.
      [Figure: record layouts and the conversion flow inside the Spark FPGA Adaptor. Internal Row layout: null bit set (1 bit/field), values (8 bytes/field), variable-length portion. FPGA Input Batch and FPGA Output Batch field labels: 4 bytes (TIME_ID), 8 bytes (MBUSER_ID), 12 bytes (ACC_NBR), ……, 64 bytes (for 4xCL alignment), 4 bytes (for 4xCL alignment). Legend: Input, Output, Data Flow, Control Flow.]
      Steps:
      1. Get a HugePage wrapped in a DirectByteBuffer.
      2. Data conversion: Internal Rows to FPGAInputBatch.
      3. Engine configuration and start.
      4. Input for computation.
      5. Collect the computation result.
      6. Data conversion: FPGAOutputBatch to Internal Rows.
      7. Free the HugePage wrapped in the DirectByteBuffer.
      • Internal Row: Spark's representation of one record, flexible enough to represent both fixed-length and variable-length fields.
      • FPGA Input Batch: for memory and computation efficiency, fields are placed in sequential physical memory.
      • FPGA Output Batch: similar to the FPGA Input Batch.
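As a rough illustration of the data conversion step (Internal Rows to an FPGA input batch), the sketch below packs fixed-width fields back to back into a direct buffer. The field widths (4, 8, and 12 bytes) are taken from the slide labels. Everything else is an assumption: the `Row` type, little-endian byte order, ASCII encoding, reading "4xCL" as four 64-byte cache lines, and padding the batch (rather than each record) to that boundary. A plain direct `ByteBuffer` stands in for the huge page.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Sketch: pack fixed-width fields into one contiguous off-heap buffer.
// Field widths follow the slide labels; everything else is an assumption.
object BatchLayoutSketch {
  final case class Row(timeId: Int, mbUserId: Long, accNbr: String)

  val RecordBytes: Int = 4 + 8 + 12   // TIME_ID + MBUSER_ID + ACC_NBR
  val AlignBytes: Int  = 4 * 64       // assumed meaning of "4xCL": four 64-byte cache lines

  def roundUp(n: Int, align: Int): Int = (n + align - 1) / align * align

  def toInputBatch(rows: Seq[Row]): ByteBuffer = {
    val size = roundUp(rows.length * RecordBytes, AlignBytes)
    // A direct buffer stands in for the huge page wrapped in a DirectByteBuffer.
    val buf = ByteBuffer.allocateDirect(size).order(ByteOrder.LITTLE_ENDIAN)
    rows.foreach { r =>
      buf.putInt(r.timeId)
      buf.putLong(r.mbUserId)
      // Fixed 12-byte slot: truncate or zero-pad the ASCII string.
      val bytes = r.accNbr.getBytes(StandardCharsets.US_ASCII)
      buf.put(java.util.Arrays.copyOf(bytes, 12))
    }
    buf.flip()
    buf.limit(size)   // expose the padded, aligned length
    buf
  }
}
```

Fixed offsets mean a consumer can locate any field with simple address arithmetic, which is the efficiency argument the slide makes for the batch format over the more flexible Internal Row.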
  • 25. Arch Overview - Engine Pipeline, Data Flow
      [Figure: engine pipeline and data flow. CPU: Data Source, Spark, FPGA Adapter & Driver, Input Buffers, Output Buffers. FPGA: DMA RX, rows of Engine Units forming a 169-level pipeline, DMA TX. Arrows distinguish data flow from control flow (pattern configuration, computation control).]
      • Engine Pipeline: The Spark FPGA SQL Engine is designed as pipelines of Engine Units. Every Engine Unit performs a single computation; different Engine Units are assembled (configured by Spark) to perform a complex computation and work as a pipeline. Many pipelines (say N) can be constructed to perform N parallel computations, so that N records can be consumed in a single FPGA cycle.
      • Data Flow: Spark pulls data from the Data Source, converts it into the format the FPGA requires, and puts it into the InputBuffer array. The FPGA then reads the input data via DMA RX and feeds it into the Engine Pipelines. The results of the Engine Pipelines are written to the OutputBuffer array via DMA TX. Finally, Spark converts the data back into the format Spark SQL needs.
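The composition idea (single-purpose units assembled into a pipeline, with N pipelines side by side) can be sketched in software. This is only an analogy under stated assumptions: the `EngineUnit` type, the two example units, and `runBatch` are hypothetical, and the code runs sequentially, whereas the hardware processes N records in the same cycle.

```scala
// Sketch: engine units as single-step functions, chained into a pipeline.
// The concrete units and the batching helper are hypothetical.
object EnginePipelineSketch {
  type EngineUnit = String => String

  // Each unit performs exactly one computation.
  def substr(start: Int, len: Int): EngineUnit =
    s => s.slice(start - 1, start - 1 + len)   // 1-based start, as in SQL
  def inSet(values: Set[String]): EngineUnit =
    s => if (values.contains(s)) "1" else "0"

  // "Configuration" assembles units into one pipeline, applied left to right.
  def pipeline(units: Seq[EngineUnit]): EngineUnit = Function.chain(units)

  // Model N identical pipelines: records are consumed N at a time.
  def runBatch(p: EngineUnit, records: Seq[String], n: Int): Seq[String] =
    records.grouped(n).flatMap(group => group.map(p)).toVector
}
```

For example, `pipeline(Seq(substr(1, 1), inSet(Set("1", "7"))))` builds a two-level pipeline; changing the sequence of units corresponds to reconfiguring the engine without changing the units themselves.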
  • 26. Arch Overview – SQL Engine Micro Architecture
      • Every SQL expression evaluation engine is configurable.
      • Every engine contains at most four pattern engines. The input data is fed to the pattern engines in parallel, and the final result is the combination of the pattern engine results.
           Pattern Engine 1 is configured to evaluate the SQL expression Substr(oper_tid, 1, 1) IN (‘1’, ‘7’)
           Pattern Engine 2 is configured to evaluate the SQL expression Substr(oper_tid, 2, 1) IN (‘o’)
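A software model of one such engine might look like the sketch below. The two patterns mirror the expressions on the slide. The slide does not say how the pattern results are combined, so the `combine` parameter, with `all` (AND) and `any` (OR) as examples, is an assumption, as are all type and value names.

```scala
// Sketch: one expression engine holding at most four pattern engines.
// The combine step is not specified on the slide; AND and OR here are assumptions.
object PatternEngineSketch {
  // One pattern: Substr(field, start, len) IN (values), with a 1-based start.
  final case class Pattern(start: Int, len: Int, values: Set[String]) {
    def matches(field: String): Boolean =
      values.contains(field.slice(start - 1, start - 1 + len))
  }

  final case class Engine(patterns: Seq[Pattern], combine: Seq[Boolean] => Boolean) {
    require(patterns.nonEmpty && patterns.length <= 4, "an engine holds one to four pattern engines")
    // Every pattern engine sees the same input; the results are then combined.
    def eval(field: String): Boolean = combine(patterns.map(_.matches(field)))
  }

  val p1: Pattern = Pattern(1, 1, Set("1", "7"))   // Substr(oper_tid, 1, 1) IN ('1', '7')
  val p2: Pattern = Pattern(2, 1, Set("o"))        // Substr(oper_tid, 2, 1) IN ('o')

  val all: Seq[Boolean] => Boolean = _.forall(identity)
  val any: Seq[Boolean] => Boolean = _.exists(identity)
}
```

Keeping each pattern as plain data (start, length, value set) reflects the configurability claim: the evaluation logic is fixed, and only the parameters change per query.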
  • 28. Performance Comparison - FPGA vs Baseline
      • The FPGA-accelerated version significantly reduced the total execution time in the end-to-end benchmark, from 86 seconds (baseline) to 44 seconds.
      Baseline: 86s
      FPGA: 44s
      Speedup ratio: 86s / 44s ≈ 2X
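The speedup figure is simple arithmetic, reproduced in the sketch below: 86 s / 44 s is roughly 1.95. The Amdahl-style bound is an added estimate, not a claim from the slides. It treats the 66.7% projection share of map-task CPU time as if it were a share of wall-clock time, which is an assumption; under that assumption the bound is about 3X.

```scala
// Sketch: speedup arithmetic from the reported numbers.
// The Amdahl-style bound is an added estimate under the assumption stated above.
object SpeedupSketch {
  def speedup(baselineSec: Double, acceleratedSec: Double): Double =
    baselineSec / acceleratedSec

  // Upper bound on speedup if a fraction p of the work took no time at all.
  def amdahlBound(p: Double): Double = 1.0 / (1.0 - p)

  def main(args: Array[String]): Unit = {
    println(f"measured speedup: ${speedup(86, 44)}%.2f")
    println(f"bound if the 66.7%% projection share vanished: ${amdahlBound(0.667)}%.2f")
  }
}
```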
  • 29. Performance Comparison - FPGA vs Baseline, Contd.
      • The FPGA-accelerated version reduced the share of CPU time spent on expression evaluation in the map stage from 66.7% (baseline) to less than 6.6%.
      Projection in baseline: 66.7%
      Projection with FPGA: less than 6.6%
  • 31. Future Work
      • Fully Configurable FPGA SQL Acceleration Engine
          • In this PoC, we identified the SQL expression patterns manually in the frontend and configured them into the FPGA SQL Engine units at runtime. However, the FPGA SQL engines are limited to some of the typical expression patterns, and arbitrary combinations of SQL expressions are not supported yet.
      • More Operators Support
          • SQL expression evaluation in projection is the first step; other typical operators such as Aggregation/Sort/Join can probably also be offloaded to the FPGA.
          • The CPU can also perform expression evaluation when the FPGA resources are fully occupied.
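The last point (letting the CPU evaluate expressions while the FPGA is busy) could be approached with a simple dispatcher. The sketch below is one possible shape, not the authors' design: the class name, the slot count, and both evaluator functions are hypothetical stand-ins, and a semaphore models FPGA occupancy.

```scala
import java.util.concurrent.Semaphore

// Sketch: offload when an FPGA slot is free, otherwise evaluate on the CPU.
// The slot count and both evaluators are hypothetical stand-ins.
final class OffloadDispatcher[A, B](fpgaSlots: Int, onFpga: A => B, onCpu: A => B) {
  private val slots = new Semaphore(fpgaSlots)

  // Returns which path was taken together with the result.
  def eval(batch: A): (String, B) =
    if (slots.tryAcquire()) {
      try {
        ("fpga", onFpga(batch))
      } finally {
        slots.release()
      }
    } else {
      ("cpu", onCpu(batch))   // FPGA fully occupied: fall back to the CPU path
    }
}
```

Using a non-blocking `tryAcquire` means a task never waits for the accelerator; whether waiting briefly would be cheaper than CPU evaluation is a tuning question this sketch leaves open.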