SlideShare a Scribd company logo
1 of 26
Download to read offline
© 2015 IBM Corporation
Five cool ways the JVM can run Apache Spark faster
Tim Ellison
IBM Runtime Technology Center, Hursley, UK
tellison
@tpellison
© 2015 IBM Corporation
What's so cool about IBM's Java implementation?
Reference
Java
Technology
OpenJDK,
etc
IBM
Java
IBM Runtime
Technology
Centre
Quality
Engineering
Performance
Security
Reliability
Scalability
Serviceability
Production
Requirements
Strategic Software
Hardware exploits
User Feedback
 A certified Java, optimized to run on wide
variety of hardware platforms
 Performance tuned for key software workloads
 World class service and support
http://www.ibm.com/developerworks/java/jdk/index.html
Java Platform APIs
AWT Swing Java2d
XML Crypto Serializer
Core Libraries
Virtual Machine
Intel / POWER / z Series / ARM
Spark Backend
Spark APIs
Scala
© 2015 IBM Corporation3
Five cool things you and I can do with the JVM to enhance the
performance of Apache Spark!
1) JIT compiler enhancements and writing JIT-friendly code
2) Improving the object serializer
3) Faster IO – networking and storage
4) Data access acceleration & auto SIMD
5) Offloading tasks to graphics co-processors (GPUs)
© 2015 IBM Corporation4
Five cool things you and I can do with the JVM to enhance the
performance of Apache Spark!
1) JIT compiler enhancements and writing JIT-friendly code
2) Improving the object serializer
3) Faster IO – networking and storage
4) Data access acceleration & auto SIMD
5) Offloading tasks to graphics co-processors (GPUs)
input/output
bound jobs
© 2015 IBM Corporation5
Five cool things you and I can do with the JVM to enhance the
performance of Apache Spark!
1) JIT compiler enhancements and writing JIT-friendly code
2) Improving the object serializer
3) Faster IO – networking and storage
4) Data access acceleration & auto SIMD
5) Offloading tasks to graphics co-processors (GPUs)
processing
bound jobs
© 2015 IBM Corporation6
JIT compiler enhancements, and writing JIT-friendly code
© 2015 IBM Corporation7
Help yourself. JNI calls are not free!
https://github.com/xerial/snappy­java/blob/develop/src/main/java/org/xerial/snappy/SnappyNative.cpp
© 2015 IBM Corporation8
Style: Using JNI has an impact...
 The cost of calling from Java code to natives and from natives to Java code is significantly
higher (maybe 5x longer) than a normal Java method call.
– The JIT can't in-line native methods.
– The JIT can't do data flow analysis into JNI calls
• e.g. it has to assume that all parameters always used.
– The JIT has to set up the call stack and parameters for C calling convention,
• i.e. maybe rearranging items on the stack.
 JNI can introduce additional data copying costs
– There's no guarantee that you will get a direct pointer to the array / string with
Get<type>ArrayElements(), even when using the GetPrimitiveArrayCritical
versions.
– The IBM JVM will always return a copy (to allow GC to continue).
 Tip:
– JNI natives are more expensive than plain Java calls.
– e.g. create an unsafe based Snappy-like package written in Java code so that JNI cost is
eliminated.
© 2015 IBM Corporation9
Style: Use JIT optimizations to reduce overhead of logging checks
 Tip: Check for the non-null value of a static field ref to instance of a logging class singleton
– e.g.
– Uses the JIT's speculative optimization to avoid the explicit test for logging being enabled;
instead it ...
1)Generates an internal JIT runtime assumption (e.g. InfoLogger.class is undefined),
2)NOPs the test for trace enablement
3)Uses a class initialization hook for the InfoLogger.class (already necessary for instantiating the class)
4)The JIT will regenerate the test code if the class event is fired
 Spark's logging calls are gated on the checks of a static boolean value
trait Logging
Spark
© 2015 IBM Corporation10
Style: Judicious use of polymorphism
 Spark has a number of highly polymorphic interface call sites and high fan-in (several calling contexts
invoking the same callee method) in map, reduce, filter, flatMap, ...
– e.g. ExternalSorter.insertAll is very hot (drains an iterator using hasNext/next calls)
 Pattern #1:
– InterruptibleIterator → Scala's mapIterator → Scala's filterIterator → …
 Pattern #2:
– InterruptibleIterator → Scala's filterIterator → Scala's mapIterator → …
 The JIT can only choose one pattern to in-line!
– Makes JIT devirtualization and speculation more risky; using profiling information from a different
context could lead to incorrect devirtualization.
– More conservative speculation, or good phase change detection and recovery are needed in the JIT
compiler to avoid getting it wrong.
 Lambdas and functions as arguments, by definition, introduce different code flow targets
– Passing in widely implemented interfaces produce many different bytecode sequences
– When we in-line we have to put runtime checks ahead of in-lined method bodies to make sure we are
going to run the right method!
– Often specialized classes are used only in a very limited number of places, but the majority of the code
does not use these classes and pays a heavy penalty
– e.g. Scala's attempt to specialize Tuple2 Int argument does more harm than good!
 Tip: Use polymorphism sparingly, use the same order / patterns for nested & wrappered code, and
keep call sites homogeneous.
© 2015 IBM Corporation11
Replacing the object serializer
© 2015 IBM Corporation12
Writing a Spark-friendly object serializer
 Expand the list of specialist serialized form
– Having custom write/read object methods allows for reduced time in reflection and
smaller on-wire payloads.
– Types such as Tuple and Some given special treatment in the serializer
 Reduce the payload size further using variable length encoding of primitive types.
– All objects are eventually decomposed into primitives
 Adaptive stack-based recursive serialization vs. state machine serialization
– Use the stack to track state wherever possible, but fall back to state machine for deeply
nested objects (e.g. big RDDs)
 Sharing object representation within the serialized stream to reduce payload
– But may be defeated if supportsRelocationOfSerializedObjects required
 Special replacement of deserialization calls to avoid stack-walking to find class loader
context
– Optimization in JIT to circumvent some regular calls to more efficient versions
 Tip: These are opaque to the application, no special patterns required.
© 2015 IBM Corporation13
Faster IO – networking and storage
© 2015 IBM Corporation
Remote Direct Memory Access (RDMA) Networking
Spark VM
Buffer
Off
Heap
Buffer
Spark VM
Buffer
Off
Heap
Buffer
Ether/IB
SwitchRDMA NIC/HCA RDMA NIC/HCA
OS OS
DMA DMA
(Z-Copy) (Z-Copy)
(B-Copy)(B-Copy)
Acronyms:
Z-Copy – Zero Copy
B-Copy – Buffer Copy
IB – InfiniBand
Ether - Ethernet
NIC – Network Interface Card
HCA – Host Control Adapter
●
Low-latency, high-throughput networking
●
Direct 'application to application' memory pointer exchange between remote hosts
●
Off-load network processing to RDMA NIC/HCA – OS/Kernel Bypass (zero-copy)
●
Introduces new IO characteristics that can influence the Spark transfer plan
Spark node #1 Spark node #2
© 2015 IBM Corporation
Faster network IO with RDMA-enabled Spark
15
New dynamic transfer plan that adapts to the load and
responsiveness of the remote hosts.
New “RDMA” shuffle IO mode with lower latency and
higher throughput.
JVM-agnostic
IBM JVM only
JVM-agnostic
IBM JVM only
IBM JVM only
Block manipulation (i.e., RDD partitions)
High-level API
JVM-agnostic working prototype
with RDMA
© 2015 IBM Corporation16
Faster storage with POWER CAPI/Flash
 POWER8 architecture offers a 40Tb Flash drive attached
via Coherent Accelerator Processor Interface (CAPI)
– Provides simple coherent block IO APIs
– No file system overhead
 Power Service Layer (PSL)
– Performs Address Translations
– Maintains Cache
– Simple, but powerful interface to the Accelerator unit
 Coherent Accelerator Processor Proxy (CAPP)
– Maintains directory of cache lines held by Accelerator
– Snoops PowerBus on behalf of Accelerator

© 2015 IBM Corporation17
Faster disk IO with CAPI/Flash-enabled Spark
 When under memory pressure, Spark spills RDDs to disk.
– Happens in ExternalAppendOnlyMap and ExternalSorter
 We have modified Spark to spill to the high-bandwidth, coherently-attached
Flash device instead.
– Replacement for DiskBlockManager
– New FlashBlockManager handles spill to/from flash
 Making this pluggable requires some further abstraction in Spark:
– Spill code assumes using disks, and depends on DiskBlockManger
– We are spilling without using a file system layer
 Dramatically improves performance of executors under memory pressure.
 Allows to reach similar performance with much less memory (denser
deployments).
IBM Flash System 840Power8 + CAPI
© 2015 IBM Corporation18
Data Access Acceleration & auto SIMD
© 2015 IBM Corporation19
Data Access Accelerators (DAA)
 Provides JIT-recognized operations directly to/from byte arrays.
– Think of it as org.apache.spark.unsafe.Platform APIs on steroids!
 Optimized data conversion, arithmetic, etc on native format data records and types.
– Avoids object creation, initialization, data copying, abstraction, boxing, etc
– Range of operations on Packed Decimals; BigDecimal and BigInteger conversions from and
to external decimal and Unicode decimal types; marshalling and unmarshalling primitive
types (short, int, long, float, and double) to and from byte arrays
– e.g.
Benefits:
Expose hardware acceleration in a platform and JVM-neutral manner (2 – 100x speed-up)
Can provide significant speed-up to record parsing, data marshalling and inter-language
communication
 Tip: Don't be limited to current set of sun.misc.Unsafe APIs. Use existing DAA APIs, or
write equivalent places in Spark we can recognize.
© 2015 IBM Corporation20
Data Access Accelerator – example
Problem: there is no intrinsic representation of a “packed decimal” in Java
Solution: trivially add two packed decimals by converting to BigDecimal:
Step 1: convert from byte[] to BigDecimal
Step 2 : add BigDecimals together
Step 3: convert BigDecimal result back to byte[]
IBM Java provides a helper method to do this operation for you:
Old Approach:
byte[] addPacked(byte[] a, byte[] b) {
BigDecimal a_bd = convertPackedToBd(a[]);
BigDecimal b_bd = convertPackedToBd(b[]);
a_bd.add(b_bd);
return (convertBDtoPacked(a_bd));
}
BigDecimal convertPackedToBd(byte[] myBytes) {
… ~30 lines …
}
New Approach:
byte[] addPacked(byte[] a, byte[] b) {
DAA.addPacked(a[], b[]);
return (a[]);
}
e.g. you wish to add two packed decimal numbers
p.s. zSeries hardware has packed decimal machine instructions, e.g. AP (AddPacked)
On IBM Java 7 Release 1, “DAA.addPacked(a,b)” is compiled to this one instruction!
http://www­01.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.lnx.80.doc/user/daa.html
© 2015 IBM Corporation
Automatic Vector Processing: SIMD – Single Instruction Multiple Data
The IBM JVM can operate on multiple data-elements (vectors) simultaneously
 Can offer dramatic speed-up to data-parallel operations (matrix ops, string processing, etc)
21
Vector registers are 128-bits wide and can be used
to operate concurrently on:
• Two 64-bit floating point values
• Four 32-bit floating point values
• Sixteen 8-bit characters
• Two 64-bit integer values
• etc
© 2015 IBM Corporation22
Offloading tasks to graphics co-processors
© 2015 IBM Corporation23
GPU-enabled array sort method
IBM Power 8 with Nvidia K40m GPU
 Some Arrays.sort() methods will offload work to GPUs today
– e.g. sorting large arrays of ints
© 2015 IBM Corporation24
GPU optimization of Lambda expressions
Speed-up factor when run on a GPU enabled host
IBM Power 8 with Nvidia K40m GPU
0.00
0.01
0.10
1.00
10.00
100.00
1000.00
auto-SIMD parallel forEach on CPU
parallel forEach on GPU
matrix size
The JIT can recognize parallel stream
code, and automatically compile down to
the GPU.
© 2015 IBM Corporation
GPU off-loading of high-level ML functions
© 2015 IBM Corporation26
Summary
 All the changes are being made in the Java runtime or being pushed out to the Spark
community.
 We are focused on Core performance to get a multiplier up the Spark stack.
– More efficient code, more efficient memory usage/spilling, more efficient serialization &
networking.
– Need to take a look at bytecode codegen changes coming in via Tungsten
 There are hardware and software technologies we can bring to the party.
– We can tune the stack from hardware to high level structures for running Spark.
 Spark and Scala developers can help themselves by their style of coding.
 There is lots more stuff I don't have time to talk about, like GC optimizations, object layout,
monitoring VM/Spark events, hardware compression, security, etc. etc.

More Related Content

What's hot

CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...AMD Developer Central
 
AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group Ganesan Narayanasamy
 
Java performance tuning
Java performance tuningJava performance tuning
Java performance tuningJerry Kurian
 
IBM Monitoring and Diagnostic Tools - GCMV 2.8
IBM Monitoring and Diagnostic Tools - GCMV 2.8IBM Monitoring and Diagnostic Tools - GCMV 2.8
IBM Monitoring and Diagnostic Tools - GCMV 2.8Chris Bailey
 
O'Reilly Software Architecture Conf: Cloud Economics
O'Reilly Software Architecture Conf: Cloud EconomicsO'Reilly Software Architecture Conf: Cloud Economics
O'Reilly Software Architecture Conf: Cloud EconomicsChris Bailey
 
Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningAdrian Cockcroft
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory ModelHSA Foundation
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Indrajit Poddar
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAlluxio, Inc.
 
Accelerating apache spark with rdma
Accelerating apache spark with rdmaAccelerating apache spark with rdma
Accelerating apache spark with rdmainside-BigData.com
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?Azul Systems Inc.
 
Enhancing Live Migration Process for CPU and/or memory intensive VMs running...
Enhancing Live Migration Process for CPU and/or  memory intensive VMs running...Enhancing Live Migration Process for CPU and/or  memory intensive VMs running...
Enhancing Live Migration Process for CPU and/or memory intensive VMs running...Benoit Hudzia
 
Memory Management: What You Need to Know When Moving to Java 8
Memory Management: What You Need to Know When Moving to Java 8Memory Management: What You Need to Know When Moving to Java 8
Memory Management: What You Need to Know When Moving to Java 8AppDynamics
 
Oracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódOracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódMarketingArrowECS_CZ
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation AMD
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareIndicThreads
 

What's hot (20)

Java performance tuning
Java performance tuningJava performance tuning
Java performance tuning
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 
2018 bsc power9 and power ai
2018   bsc power9 and power ai 2018   bsc power9 and power ai
2018 bsc power9 and power ai
 
AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group
 
Java performance tuning
Java performance tuningJava performance tuning
Java performance tuning
 
IBM Monitoring and Diagnostic Tools - GCMV 2.8
IBM Monitoring and Diagnostic Tools - GCMV 2.8IBM Monitoring and Diagnostic Tools - GCMV 2.8
IBM Monitoring and Diagnostic Tools - GCMV 2.8
 
O'Reilly Software Architecture Conf: Cloud Economics
O'Reilly Software Architecture Conf: Cloud EconomicsO'Reilly Software Architecture Conf: Cloud Economics
O'Reilly Software Architecture Conf: Cloud Economics
 
Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and Tuning
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory Model
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 
Accelerating apache spark with rdma
Accelerating apache spark with rdmaAccelerating apache spark with rdma
Accelerating apache spark with rdma
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?
 
Enhancing Live Migration Process for CPU and/or memory intensive VMs running...
Enhancing Live Migration Process for CPU and/or  memory intensive VMs running...Enhancing Live Migration Process for CPU and/or  memory intensive VMs running...
Enhancing Live Migration Process for CPU and/or memory intensive VMs running...
 
Memory Management: What You Need to Know When Moving to Java 8
Memory Management: What You Need to Know When Moving to Java 8Memory Management: What You Need to Know When Moving to Java 8
Memory Management: What You Need to Know When Moving to Java 8
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Oracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódOracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kód
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardware
 

Viewers also liked

Apache Sparkについて
Apache SparkについてApache Sparkについて
Apache SparkについてBrainPad Inc.
 
JVM上でのストリーム処理エンジンの変遷
JVM上でのストリーム処理エンジンの変遷JVM上でのストリーム処理エンジンの変遷
JVM上でのストリーム処理エンジンの変遷Sotaro Kimura
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.J On The Beach
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovZeroTurnaround
 
JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?Doug Hawkins
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...
From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...
From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...Databricks
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkDatabricks
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache SparkWes McKinney
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaAndy Petrella
 

Viewers also liked (10)

Apache Sparkについて
Apache SparkについてApache Sparkについて
Apache Sparkについて
 
JVM上でのストリーム処理エンジンの変遷
JVM上でのストリーム処理エンジンの変遷JVM上でのストリーム処理エンジンの変遷
JVM上でのストリーム処理エンジンの変遷
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir Ivanov
 
JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...
From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...
From DataFrames to Tungsten: A Peek into Spark's Future @ Spark Summit San Fr...
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 

Similar to Five cool ways the JVM can run Apache Spark faster

A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceTim Ellison
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016Tim Ellison
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance ObservationsAdam Roberts
 
IBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkIBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkAdamRobertsIBM
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache SparkSachin Aggarwal
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Spark Summit
 
FOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VMFOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VMCharlie Gracie
 
IBM Spark Meetup - RDD & Spark Basics
IBM Spark Meetup - RDD & Spark BasicsIBM Spark Meetup - RDD & Spark Basics
IBM Spark Meetup - RDD & Spark BasicsSatya Narayan
 
How Java 19 Influences the Future of Your High-Scale Applications .pdf
How Java 19 Influences the Future of Your High-Scale Applications .pdfHow Java 19 Influences the Future of Your High-Scale Applications .pdf
How Java 19 Influences the Future of Your High-Scale Applications .pdfAna-Maria Mihalceanu
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkMichelle Holley
 
Graal Tutorial at CGO 2015 by Christian Wimmer
Graal Tutorial at CGO 2015 by Christian WimmerGraal Tutorial at CGO 2015 by Christian Wimmer
Graal Tutorial at CGO 2015 by Christian WimmerThomas Wuerthinger
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementGanesan Narayanasamy
 
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataHow To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataTaro L. Saito
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooksAndrey Vykhodtsev
 
EmbeddedJavaSmall.ppt
EmbeddedJavaSmall.pptEmbeddedJavaSmall.ppt
EmbeddedJavaSmall.pptMonishaAb1
 
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, UkraineVladimir Ivanov
 
Under the Hood of the Testarossa JIT Compiler
Under the Hood of the Testarossa JIT CompilerUnder the Hood of the Testarossa JIT Compiler
Under the Hood of the Testarossa JIT CompilerMark Stoodley
 

Similar to Five cool ways the JVM can run Apache Spark faster (20)

A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance Observations
 
IBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkIBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache Spark
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache Spark
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
 
FOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VMFOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VM
 
IBM Spark Meetup - RDD & Spark Basics
IBM Spark Meetup - RDD & Spark BasicsIBM Spark Meetup - RDD & Spark Basics
IBM Spark Meetup - RDD & Spark Basics
 
How Java 19 Influences the Future of Your High-Scale Applications .pdf
How Java 19 Influences the Future of Your High-Scale Applications .pdfHow Java 19 Influences the Future of Your High-Scale Applications .pdf
How Java 19 Influences the Future of Your High-Scale Applications .pdf
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
Graal Tutorial at CGO 2015 by Christian Wimmer
Graal Tutorial at CGO 2015 by Christian WimmerGraal Tutorial at CGO 2015 by Christian Wimmer
Graal Tutorial at CGO 2015 by Christian Wimmer
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataHow To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks
 
FPGA MeetUp
FPGA MeetUpFPGA MeetUp
FPGA MeetUp
 
EmbeddedJavaSmall.ppt
EmbeddedJavaSmall.pptEmbeddedJavaSmall.ppt
EmbeddedJavaSmall.ppt
 
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
 
20160908 hivemall meetup
20160908 hivemall meetup20160908 hivemall meetup
20160908 hivemall meetup
 
Under the Hood of the Testarossa JIT Compiler
Under the Hood of the Testarossa JIT CompilerUnder the Hood of the Testarossa JIT Compiler
Under the Hood of the Testarossa JIT Compiler
 

More from Tim Ellison

The Extraordinary World of Quantum Computing
The Extraordinary World of Quantum ComputingThe Extraordinary World of Quantum Computing
The Extraordinary World of Quantum ComputingTim Ellison
 
Apache Harmony: An Open Innovation
Apache Harmony: An Open InnovationApache Harmony: An Open Innovation
Apache Harmony: An Open InnovationTim Ellison
 
Java on zSystems zOS
Java on zSystems zOSJava on zSystems zOS
Java on zSystems zOSTim Ellison
 
Inside IBM Java 7
Inside IBM Java 7Inside IBM Java 7
Inside IBM Java 7Tim Ellison
 
Real World Java Compatibility
Real World Java CompatibilityReal World Java Compatibility
Real World Java CompatibilityTim Ellison
 
Secure Engineering Practices for Java
Secure Engineering Practices for JavaSecure Engineering Practices for Java
Secure Engineering Practices for JavaTim Ellison
 
Securing Java in the Server Room
Securing Java in the Server RoomSecuring Java in the Server Room
Securing Java in the Server RoomTim Ellison
 
Modules all the way down: OSGi and the Java Platform Module System
Modules all the way down: OSGi and the Java Platform Module SystemModules all the way down: OSGi and the Java Platform Module System
Modules all the way down: OSGi and the Java Platform Module SystemTim Ellison
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaTim Ellison
 

More from Tim Ellison (9)

The Extraordinary World of Quantum Computing
The Extraordinary World of Quantum ComputingThe Extraordinary World of Quantum Computing
The Extraordinary World of Quantum Computing
 
Apache Harmony: An Open Innovation
Apache Harmony: An Open InnovationApache Harmony: An Open Innovation
Apache Harmony: An Open Innovation
 
Java on zSystems zOS
Java on zSystems zOSJava on zSystems zOS
Java on zSystems zOS
 
Inside IBM Java 7
Inside IBM Java 7Inside IBM Java 7
Inside IBM Java 7
 
Real World Java Compatibility
Real World Java CompatibilityReal World Java Compatibility
Real World Java Compatibility
 
Secure Engineering Practices for Java
Secure Engineering Practices for JavaSecure Engineering Practices for Java
Secure Engineering Practices for Java
 
Securing Java in the Server Room
Securing Java in the Server RoomSecuring Java in the Server Room
Securing Java in the Server Room
 
Modules all the way down: OSGi and the Java Platform Module System
Modules all the way down: OSGi and the Java Platform Module SystemModules all the way down: OSGi and the Java Platform Module System
Modules all the way down: OSGi and the Java Platform Module System
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 

Recently uploaded

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 

Recently uploaded (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 

Five cool ways the JVM can run Apache Spark faster

  • 1. © 2015 IBM Corporation Five cool ways the JVM can run Apache Spark faster Tim Ellison IBM Runtime Technology Center, Hursley, UK tellison @tpellison
  • 2. © 2015 IBM Corporation What's so cool about IBM's Java implementation? Reference Java Technology OpenJDK, etc IBM Java IBM Runtime Technology Centre Quality Engineering Performance Security Reliability Scalability Serviceability Production Requirements Strategic Software Hardware exploits User Feedback  A certified Java, optimized to run on wide variety of hardware platforms  Performance tuned for key software workloads  World class service and support http://www.ibm.com/developerworks/java/jdk/index.html Java Platform APIs AWT Swing Java2d XML Crypto Serializer Core Libraries Virtual Machine Intel / POWER / z Series / ARM Spark Backend Spark APIs Scala
  • 3. © 2015 IBM Corporation3 Five cool things you and I can do with the JVM to enhance the performance of Apache Spark! 1) JIT compiler enhancements and writing JIT-friendly code 2) Improving the object serializer 3) Faster IO – networking and storage 4) Data access acceleration & auto SIMD 5) Offloading tasks to graphics co-processors (GPUs)
  • 4. © 2015 IBM Corporation4 Five cool things you and I can do with the JVM to enhance the performance of Apache Spark! 1) JIT compiler enhancements and writing JIT-friendly code 2) Improving the object serializer 3) Faster IO – networking and storage 4) Data access acceleration & auto SIMD 5) Offloading tasks to graphics co-processors (GPUs) input/output bound jobs
  • 5. © 2015 IBM Corporation5 Five cool things you and I can do with the JVM to enhance the performance of Apache Spark! 1) JIT compiler enhancements and writing JIT-friendly code 2) Improving the object serializer 3) Faster IO – networking and storage 4) Data access acceleration & auto SIMD 5) Offloading tasks to graphics co-processors (GPUs) processing bound jobs
  • 6. © 2015 IBM Corporation6 JIT compiler enhancements, and writing JIT-friendly code
  • 7. © 2015 IBM Corporation7 Help yourself. JNI calls are not free! https://github.com/xerial/snappy­java/blob/develop/src/main/java/org/xerial/snappy/SnappyNative.cpp
  • 8. © 2015 IBM Corporation8 Style: Using JNI has an impact...  The cost of calling from Java code to natives and from natives to Java code is significantly higher (maybe 5x longer) than a normal Java method call. – The JIT can't in-line native methods. – The JIT can't do data flow analysis into JNI calls • e.g. it has to assume that all parameters always used. – The JIT has to set up the call stack and parameters for C calling convention, • i.e. maybe rearranging items on the stack.  JNI can introduce additional data copying costs – There's no guarantee that you will get a direct pointer to the array / string with Get<type>ArrayElements(), even when using the GetPrimitiveArrayCritical versions. – The IBM JVM will always return a copy (to allow GC to continue).  Tip: – JNI natives are more expensive than plain Java calls. – e.g. create an unsafe based Snappy-like package written in Java code so that JNI cost is eliminated.
  • 9. © 2015 IBM Corporation9 Style: Use JIT optimizations to reduce overhead of logging checks  Tip: Check for the non-null value of a static field ref to instance of a logging class singleton – e.g. – Uses the JIT's speculative optimization to avoid the explicit test for logging being enabled; instead it ... 1)Generates an internal JIT runtime assumption (e.g. InfoLogger.class is undefined), 2)NOPs the test for trace enablement 3)Uses a class initialization hook for the InfoLogger.class (already necessary for instantiating the class) 4)The JIT will regenerate the test code if the class event is fired  Spark's logging calls are gated on the checks of a static boolean value trait Logging Spark
  • 10. © 2015 IBM Corporation10 Style: Judicious use of polymorphism  Spark has a number of highly polymorphic interface call sites and high fan-in (several calling contexts invoking the same callee method) in map, reduce, filter, flatMap, ... – e.g. ExternalSorter.insertAll is very hot (drains an iterator using hasNext/next calls)  Pattern #1: – InterruptibleIterator → Scala's mapIterator → Scala's filterIterator → …  Pattern #2: – InterruptibleIterator → Scala's filterIterator → Scala's mapIterator → …  The JIT can only choose one pattern to in-line! – Makes JIT devirtualization and speculation more risky; using profiling information from a different context could lead to incorrect devirtualization. – More conservative speculation, or good phase change detection and recovery are needed in the JIT compiler to avoid getting it wrong.  Lambdas and functions as arguments, by definition, introduce different code flow targets – Passing in widely implemented interfaces produce many different bytecode sequences – When we in-line we have to put runtime checks ahead of in-lined method bodies to make sure we are going to run the right method! – Often specialized classes are used only in a very limited number of places, but the majority of the code does not use these classes and pays a heavy penalty – e.g. Scala's attempt to specialize Tuple2 Int argument does more harm than good!  Tip: Use polymorphism sparingly, use the same order / patterns for nested & wrappered code, and keep call sites homogeneous.
  • 11. © 2015 IBM Corporation11 Replacing the object serializer
  • 12. © 2015 IBM Corporation12 Writing a Spark-friendly object serializer  Expand the list of specialist serialized form – Having custom write/read object methods allows for reduced time in reflection and smaller on-wire payloads. – Types such as Tuple and Some given special treatment in the serializer  Reduce the payload size further using variable length encoding of primitive types. – All objects are eventually decomposed into primitives  Adaptive stack-based recursive serialization vs. state machine serialization – Use the stack to track state wherever possible, but fall back to state machine for deeply nested objects (e.g. big RDDs)  Sharing object representation within the serialized stream to reduce payload – But may be defeated if supportsRelocationOfSerializedObjects required  Special replacement of deserialization calls to avoid stack-walking to find class loader context – Optimization in JIT to circumvent some regular calls to more efficient versions  Tip: These are opaque to the application, no special patterns required.
  • 13. © 2015 IBM Corporation13 Faster IO – networking and storage
  • 14. © 2015 IBM Corporation Remote Direct Memory Access (RDMA) Networking Spark VM Buffer Off Heap Buffer Spark VM Buffer Off Heap Buffer Ether/IB SwitchRDMA NIC/HCA RDMA NIC/HCA OS OS DMA DMA (Z-Copy) (Z-Copy) (B-Copy)(B-Copy) Acronyms: Z-Copy – Zero Copy B-Copy – Buffer Copy IB – InfiniBand Ether - Ethernet NIC – Network Interface Card HCA – Host Control Adapter ● Low-latency, high-throughput networking ● Direct 'application to application' memory pointer exchange between remote hosts ● Off-load network processing to RDMA NIC/HCA – OS/Kernel Bypass (zero-copy) ● Introduces new IO characteristics that can influence the Spark transfer plan Spark node #1 Spark node #2
  • 15. © 2015 IBM Corporation Faster network IO with RDMA-enabled Spark 15 New dynamic transfer plan that adapts to the load and responsiveness of the remote hosts. New “RDMA” shuffle IO mode with lower latency and higher throughput. JVM-agnostic IBM JVM only JVM-agnostic IBM JVM only IBM JVM only Block manipulation (i.e., RDD partitions) High-level API JVM-agnostic working prototype with RDMA
  • 16. © 2015 IBM Corporation16 Faster storage with POWER CAPI/Flash  POWER8 architecture offers a 40Tb Flash drive attached via Coherent Accelerator Processor Interface (CAPI) – Provides simple coherent block IO APIs – No file system overhead  Power Service Layer (PSL) – Performs Address Translations – Maintains Cache – Simple, but powerful interface to the Accelerator unit  Coherent Accelerator Processor Proxy (CAPP) – Maintains directory of cache lines held by Accelerator – Snoops PowerBus on behalf of Accelerator 
  • 17. © 2015 IBM Corporation17 Faster disk IO with CAPI/Flash-enabled Spark  When under memory pressure, Spark spills RDDs to disk. – Happens in ExternalAppendOnlyMap and ExternalSorter  We have modified Spark to spill to the high-bandwidth, coherently-attached Flash device instead. – Replacement for DiskBlockManager – New FlashBlockManager handles spill to/from flash  Making this pluggable requires some further abstraction in Spark: – Spill code assumes using disks, and depends on DiskBlockManger – We are spilling without using a file system layer  Dramatically improves performance of executors under memory pressure.  Allows to reach similar performance with much less memory (denser deployments). IBM Flash System 840Power8 + CAPI
  • 18. © 2015 IBM Corporation18 Data Access Acceleration & auto SIMD
  • 19. © 2015 IBM Corporation19 Data Access Accelerators (DAA)  Provides JIT-recognized operations directly to/from byte arrays. – Think of it as org.apache.spark.unsafe.Platform APIs on steroids!  Optimized data conversion, arithmetic, etc on native format data records and types. – Avoids object creation, initialization, data copying, abstraction, boxing, etc – Range of operations on Packed Decimals; BigDecimal and BigInteger conversions from and to external decimal and Unicode decimal types; marshalling and unmarshalling primitive types (short, int, long, float, and double) to and from byte arrays – e.g. Benefits: Expose hardware acceleration in a platform and JVM-neutral manner (2 – 100x speed-up) Can provide significant speed-up to record parsing, data marshalling and inter-language communication  Tip: Don't be limited to current set of sun.misc.Unsafe APIs. Use existing DAA APIs, or write equivalent places in Spark we can recognize.
  • 20. © 2015 IBM Corporation20 Data Access Accelerator – example Problem: there is no intrinsic representation of a “packed decimal” in Java Solution: trivially add two packed decimals by converting to BigDecimal: Step 1: convert from byte[] to BigDecimal Step 2 : add BigDecimals together Step 3: convert BigDecimal result back to byte[] IBM Java provides a helper method to do this operation for you: Old Approach: byte[] addPacked(byte[] a, byte[] b) { BigDecimal a_bd = convertPackedToBd(a[]); BigDecimal b_bd = convertPackedToBd(b[]); a_bd.add(b_bd); return (convertBDtoPacked(a_bd)); } BigDecimal convertPackedToBd(byte[] myBytes) { … ~30 lines … } New Approach: byte[] addPacked(byte[] a, byte[] b) { DAA.addPacked(a[], b[]); return (a[]); } e.g. you wish to add two packed decimal numbers p.s. zSeries hardware has packed decimal machine instructions, e.g. AP (AddPacked) On IBM Java 7 Release 1, “DAA.addPacked(a,b)” is compiled to this one instruction! http://www­01.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.lnx.80.doc/user/daa.html
  • 21. © 2015 IBM Corporation Automatic Vector Processing: SIMD – Single Instruction Multiple Data The IBM JVM can operate on multiple data-elements (vectors) simultaneously  Can offer dramatic speed-up to data-parallel operations (matrix ops, string processing, etc) 21 Vector registers are 128-bits wide and can be used to operate concurrently on: • Two 64-bit floating point values • Four 32-bit floating point values • Sixteen 8-bit characters • Two 64-bit integer values • etc
  • 22. © 2015 IBM Corporation22 Offloading tasks to graphics co-processors
  • 23. © 2015 IBM Corporation23 GPU-enabled array sort method IBM Power 8 with Nvidia K40m GPU  Some Arrays.sort() methods will offload work to GPUs today – e.g. sorting large arrays of ints
  • 24. © 2015 IBM Corporation24 GPU optimization of Lambda expressions Speed-up factor when run on a GPU enabled host IBM Power 8 with Nvidia K40m GPU 0.00 0.01 0.10 1.00 10.00 100.00 1000.00 auto-SIMD parallel forEach on CPU parallel forEach on GPU matrix size The JIT can recognize parallel stream code, and automatically compile down to the GPU.
  • 25. © 2015 IBM Corporation GPU off-loading of high-level ML functions
  • 26. © 2015 IBM Corporation26 Summary  All the changes are being made in the Java runtime or being pushed out to the Spark community.  We are focused on Core performance to get a multiplier up the Spark stack. – More efficient code, more efficient memory usage/spilling, more efficient serialization & networking. – Need to take a look at bytecode codegen changes coming in via Tungsten  There are hardware and software technologies we can bring to the party. – We can tune the stack from hardware to high level structures for running Spark.  Spark and Scala developers can help themselves by their style of coding.  There is lots more stuff I don't have time to talk about, like GC optimizations, object layout, monitoring VM/Spark events, hardware compression, security, etc. etc.