1© Cloudera, Inc. All rights reserved.
Effective Spark on Multi-Tenant Clusters
Kostas Sakellis
2© Cloudera, Inc. All rights reserved.
Me
• Spark Tech Lead Manager at Cloudera
• Contributed to Apache Spark
• Previously, stint on Cloudera Manager
3© Cloudera, Inc. All rights reserved.
Challenges
• Predictable execution time of Spark jobs
• Prevent Starvation
• Optimal cluster utilization
• Secure Data access
• Configuration Management
4© Cloudera, Inc. All rights reserved.
Spark on YARN
5© Cloudera, Inc. All rights reserved.
Why YARN?
• Spark supports pluggable Cluster Managers
• local, Standalone, YARN and Mesos
• YARN contains proper resource manager
• Enables multi-platform jobs
• Spark on YARN is mature with active community
6© Cloudera, Inc. All rights reserved.
Running an application
spark-submit --master yarn-cluster \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  <your-class>
7© Cloudera, Inc. All rights reserved.
System Architecture
[Diagram: host-a.mydomain.com, host-b.mydomain.com, and host-c.mydomain.com; a Resource Manager and three Node Managers; containers holding the App Master, Driver, and executors Exec1 to Exec3]
8© Cloudera, Inc. All rights reserved.
Gotchas
• Ensure compatible YARN configuration
• yarn.nodemanager.resource.[memory-mb|cpu-vcores]
• yarn.scheduler.maximum-allocation-[vcores|mb]
• ...
• Remember overhead memory
• spark.yarn.executor.memoryOverhead
• Default of 10% since Spark 1.4
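As a rough worked example of the overhead rule above, the sketch below estimates how much memory YARN is asked for per executor. It assumes the default overhead is the larger of 384 MB and 10% of the executor memory (the 384 MB floor is an assumption not stated on the slide) and ignores any rounding the YARN scheduler applies. The class and method names are illustrative only.

```java
public class ExecutorContainerSize {
    // Assumed defaults for Spark 1.4+ on YARN: 10% of executor memory, 384 MB floor.
    static final double OVERHEAD_FACTOR = 0.10;
    static final int OVERHEAD_MIN_MB = 384;

    /** Memory requested from YARN per executor: heap plus overhead. */
    static int containerRequestMb(int executorMemoryMb) {
        int overheadMb = Math.max(OVERHEAD_MIN_MB, (int) (OVERHEAD_FACTOR * executorMemoryMb));
        return executorMemoryMb + overheadMb;
    }

    /** True when the request fits under yarn.scheduler.maximum-allocation-mb. */
    static boolean fits(int executorMemoryMb, int maxAllocationMb) {
        return containerRequestMb(executorMemoryMb) <= maxAllocationMb;
    }

    public static void main(String[] args) {
        // --executor-memory 2g: 2048 MB heap + max(384, 204) MB overhead
        System.out.println(containerRequestMb(2048)); // 2432
        // --executor-memory 8g: 8192 MB heap + max(384, 819) MB overhead
        System.out.println(containerRequestMb(8192)); // 9011
        // An 8g executor does not fit when the maximum allocation is 8192 MB
        System.out.println(fits(8192, 8192));         // false
    }
}
```

The point of the sketch is that --executor-memory alone understates the container size, so the YARN limits need headroom for the overhead.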
9© Cloudera, Inc. All rights reserved.
Otherwise…
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
[...]
11© Cloudera, Inc. All rights reserved.
System Architecture
[Diagram: the same three hosts, with the Node Managers now running drivers and executors (Driver, Exec1 to Exec3) from several applications side by side]
12© Cloudera, Inc. All rights reserved.
How do we share a common resource?
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
13© Cloudera, Inc. All rights reserved.
Resource Management
• YARN can create resource queues
• Priorities can be set per queue
• Preemption is also available
• Fixed in Spark 1.6 (SPARK-8167)
• yarn.scheduler.fair.preemption
14© Cloudera, Inc. All rights reserved.
Running an application
spark-submit --master yarn-cluster \
  --queue my-special-queue \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  <your-class>
15© Cloudera, Inc. All rights reserved.
How about locality?
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
Courtesy of: https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg
16© Cloudera, Inc. All rights reserved.
Task Scheduling
[Diagram: the Driver holds the Spark Context, DAG Scheduler, and Task Scheduler; jobs are broken into stages and tasks; tasks run on executor cores, with shuffles between executors]
17© Cloudera, Inc. All rights reserved.
Locality
[Diagram: Resource Manager and three Node Managers on host-a.mydomain.com, host-b.mydomain.com, and host-c.mydomain.com; each host's HDFS holds some blocks of hdfs://x and hdfs://y (x:B1 x:B2 y:B1 y:B3 / x:B3 x:B2 y:B2 y:B3 / x:B3 x:B1 y:B1 y:B2); the Driver, Exec1, and Exec2 are placed on the hosts]
18© Cloudera, Inc. All rights reserved.
Spark creates executors before executing code!
19© Cloudera, Inc. All rights reserved.
Underutilized Clusters
Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
20© Cloudera, Inc. All rights reserved.
Dynamic Allocation
• Spark applications scale the number of executors based on load
• Removes need for: --num-executors
• Idle executors get killed
• First supported in CDH 5.4
• Ideal for:
• Long ETL jobs with large shuffles
• Shell applications: Hive and Spark shell
21© Cloudera, Inc. All rights reserved.
Task Scheduling
[Diagram: the Driver (Spark Context, DAG Scheduler, Task Scheduler; jobs and stages) and the RM; Node Managers on host-a.mydomain.com, host-b.mydomain.com, and host-c.mydomain.com run Exec1, Exec2, and Exec3, with tasks assigned to the executors]
22© Cloudera, Inc. All rights reserved.
Dynamic Allocation Configuration
• Many Knobs
• spark.dynamicAllocation.enabled
• spark.dynamicAllocation.[min|max|initial]Executors
• spark.dynamicAllocation.executorIdleTimeout
• spark.dynamicAllocation.cachedExecutorIdleTimeout
• ...
• --num-executors will disable dynamic allocation
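The knobs above could be passed to spark-submit as --conf arguments. The sketch below only assembles those arguments; the values (1, 20, "60s") are illustrative, the helper class is hypothetical, and spark.shuffle.service.enabled is an addition not listed on the slide (dynamic allocation on YARN relies on the external shuffle service).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynamicAllocationArgs {
    /** Builds spark-submit "--conf key=value" arguments; all values are illustrative. */
    static List<String> confArgs(int minExecutors, int maxExecutors, String idleTimeout) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.dynamicAllocation.enabled", "true");
        // Not on the slide: the external shuffle service keeps shuffle files
        // available after an idle executor is removed.
        conf.put("spark.shuffle.service.enabled", "true");
        conf.put("spark.dynamicAllocation.minExecutors", String.valueOf(minExecutors));
        conf.put("spark.dynamicAllocation.maxExecutors", String.valueOf(maxExecutors));
        conf.put("spark.dynamicAllocation.executorIdleTimeout", idleTimeout);

        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Note that --num-executors is deliberately absent.
        System.out.println(String.join(" ", confArgs(1, 20, "60s")));
    }
}
```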
23© Cloudera, Inc. All rights reserved.
Dynamic Allocation Limitations
• Still required to specify cores
• --executor-cores
• Memory
• --executor-memory
• Includes JVM overhead
• Caching
• spark.dynamicAllocation.cachedExecutorIdleTimeout
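Because cores and memory per executor are still fixed by the user, a poorly chosen executor shape can strand capacity on each node. The arithmetic below is a simplified sketch: the node sizes are hypothetical, the 2432 MB container assumes a 2g executor plus a 384 MB overhead, and scheduler rounding is ignored.

```java
public class ExecutorsPerNode {
    /** How many executor containers of a fixed shape fit on one NodeManager. */
    static int perNode(int nodeMemoryMb, int nodeVcores, int containerMb, int executorCores) {
        int byMemory = nodeMemoryMb / containerMb;   // limited by yarn.nodemanager.resource.memory-mb
        int byCores = nodeVcores / executorCores;    // limited by yarn.nodemanager.resource.cpu-vcores
        return Math.min(byMemory, byCores);
    }

    /** Memory left unused on the node once it is full of such executors. */
    static int strandedMemoryMb(int nodeMemoryMb, int nodeVcores, int containerMb, int executorCores) {
        return nodeMemoryMb - perNode(nodeMemoryMb, nodeVcores, containerMb, executorCores) * containerMb;
    }

    public static void main(String[] args) {
        // Hypothetical node: 16384 MB and 8 vcores; executors of 2432 MB and 2 cores.
        System.out.println(perNode(16384, 8, 2432, 2));          // 4 (cores run out first)
        System.out.println(strandedMemoryMb(16384, 8, 2432, 2)); // 6656
    }
}
```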
24© Cloudera, Inc. All rights reserved.
The Future of Dynamic Allocation
• Only “task size” needed: --task-size
• Eliminates
• --executor-cores
• --num-executors
• --executor-memory
• Leads to better cluster utilization
25© Cloudera, Inc. All rights reserved.
Dynamic Allocation respects Locality!
26© Cloudera, Inc. All rights reserved.
Security, oh no!
Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
27© Cloudera, Inc. All rights reserved.
Security
• Shared resources -> Shared data
• Security has many facets
• Encryption
• Authentication
• Authorization
• Encryption is interesting for multi-tenant clusters
28© Cloudera, Inc. All rights reserved.
Encryption
Who’s looking at the data?
29© Cloudera, Inc. All rights reserved.
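Data Flow in Spark
[Diagram: Spark Submit, the Driver, and two Executors, linked by the Control Plane, File Distribution, and Shuffle Blocks channels; the Driver serves the UI; each Executor writes Spilled/Shuffle Blocks to its local Disk]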
30© Cloudera, Inc. All rights reserved.
Prior to Spark 1.6
• Different channel, different method
• Control plane: SSL
• File distribution: SSL
• Shuffle Blocks: SASL Encryption
• User UI / REST API: No Encryption
• Spilled/Shuffle Blocks: Use encrypfs (or equivalent)
31© Cloudera, Inc. All rights reserved.
What is wrong with SSL?
32© Cloudera, Inc. All rights reserved.
Why not SSL?
• SSL can be hard to set up
• Need certificates readable on every node
• Sharing certificates not as secure
• Hard to have per-user certificate
33© Cloudera, Inc. All rights reserved.
Spark 1.6
• Standardize around a common transport library
• Replaces Akka RPC (SPARK-6028)
• Replaces HTTP File service (SPARK-11140)
• Uses Netty transport library with SASL Encryption
• But..
• WebUI still has no encryption
• Shuffle / Spilled blocks still require FS-level encryption
• SASL in JVM restricted to 3DES – not very strong and slow
34© Cloudera, Inc. All rights reserved.
Spark 2.0
• REPL class distribution using transport lib (SPARK-11563)
• HTTPS Support for WebUI (SPARK-2750)
• Encrypting spilled blocks is almost available (SPARK-5682)
• Depends on third party Chimera library for encryption
• Work is being done to add Chimera to Apache Commons
• Future:
• Use Chimera to encrypt over-the-wire data
35© Cloudera, Inc. All rights reserved.
Gateways: launching Spark Application
36© Cloudera, Inc. All rights reserved.
Spark Gateway
[Diagram: Bob connects over SSH to gateway-a.mydomain.com, which holds the Client, Client Configs, and Spark Install; the Client talks over Random Ports to the Resource Manager and to the Drivers and executors on the Node Managers of host-b.mydomain.com and host-c.mydomain.com]
37© Cloudera, Inc. All rights reserved.
Gateway Considerations
• Gateway hosts actively managed by administrators
• Updates to client configurations and Spark installs
• Users need to tunnel into network
• Difficult to put users behind firewall
• YARN allows different Spark versions
• spark.yarn.jar or spark.yarn.archive
• Shared Spark services make this difficult
38© Cloudera, Inc. All rights reserved.
Shared Services
[Diagram: the same gateway setup (Bob, SSH, gateway-a.mydomain.com with Client, Client Configs, and Spark Install, Random Ports to the cluster), with shared services (marked S) on the hosts and a History Service]
39© Cloudera, Inc. All rights reserved.
Alternative
Livy: an open-source, Apache-licensed REST web service that manages long-running Spark contexts in your cluster
40© Cloudera, Inc. All rights reserved.
Livy Architecture
[Diagram: a Client talks over HTTP to the Livy REST Server; inside the managed cluster, the Cluster Manager runs Context 1 and Context 2, each with its own Driver and Executors]
41© Cloudera, Inc. All rights reserved.
Case 1: Spark Application JAR Submission
• Enables Spark applications to be submitted without needing a Spark installation
• Basically a wrapper around spark-submit
% curl -XPOST localhost:8998/batches -d
'{
"file": "<path_to_file>",
"className": "com.foo.bar.."
...
}'
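The same batch submission could be issued from Java. This is a minimal sketch using the JDK 11+ HTTP client: it builds the POST request with the "file" and "className" fields shown above but does not send it. The helper class, the example jar path, and the class name are hypothetical, and the Content-Type header is an assumption about what the server expects.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class LivyBatchRequest {
    /** Builds (but does not send) a POST to the /batches endpoint. */
    static HttpRequest build(String livyUrl, String file, String className) {
        // Assumes the values need no JSON escaping; use a JSON library otherwise.
        String json = String.format("{\"file\": \"%s\", \"className\": \"%s\"}", file, className);
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://localhost:8998", "/path/to/app.jar", "com.example.Main");
        System.out.println(req.method() + " " + req.uri());
        // To submit: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```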
42© Cloudera, Inc. All rights reserved.
How do you retrieve results?
43© Cloudera, Inc. All rights reserved.
Case 2: Fine grained Job submission
• Programmatic submission of Spark jobs to a long-running application
• A thin Java (and Scala) client available for easier integration
• Provides automatic serialization/deserialization
• Enables Web/Mobile applications to use Spark as a backend
44© Cloudera, Inc. All rights reserved.
Case 2: Example
// Create the Livy client
LivyClient client = new LivyClientBuilder(false)
    .setURI(new URI("<uri>"))
    .setAll(<config>)
    .build();
// JobHandle allows monitoring of the submitted job
JobHandle<Long> handle = client.submit(new YourJob());
// Block until the result is returned or the timeout expires
Long result = handle.get(TIMEOUT, TimeUnit.SECONDS);
// Close connections
client.stop();
45© Cloudera, Inc. All rights reserved.
Case 2: Example
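import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

private static class YourJob implements Job<Long> {
  @Override
  public Long call(JobContext jc) {
    // Arrays.asList(1, ...) yields List<Integer>, matching the RDD element type
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
    JavaRDD<Integer> rdd = jc.sc().parallelize(list);
    // count() returns a long, boxed to Long on return
    return rdd.count();
  }
}
// Job interface to implement (provided by Livy)
public interface Job<T> extends Serializable {
  T call(JobContext jc) throws Exception;
}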
46© Cloudera, Inc. All rights reserved.
Contributions Welcome!
• http://livy.io/
• Code: https://github.com/cloudera/livy
• JIRA: https://issues.cloudera.org/browse/LIVY
• Users: http://groups.google.com/a/cloudera.org/group/livy-user
• Dev: http://groups.google.com/a/cloudera.org/group/livy-dev
47© Cloudera, Inc. All rights reserved.
Thank you

More Related Content

What's hot

Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...HostedbyConfluent
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep diveDataWorks Summit
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Databricks
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 

What's hot (20)

Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 

Similar to Effective Spark on Multi-Tenant Clusters

Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark OperationsCloudera, Inc.
 
Building Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkBuilding Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkJeremy Beard
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduJeremy Beard
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionCloudera, Inc.
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionCloudera, Inc.
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2DataWorks Summit
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production
Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in productionBreaking Spark: Top 5 mistakes to avoid when using Apache Spark in production
Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in productionNeelesh Srinivas Salian
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseCloudera, Inc.
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
 
OpenStack for devops environment
OpenStack for devops environment OpenStack for devops environment
OpenStack for devops environment Orgad Kimchi
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 

Similar to Effective Spark on Multi-Tenant Clusters (20)

Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Building Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkBuilding Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache Spark
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to Production
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production
Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in productionBreaking Spark: Top 5 mistakes to avoid when using Apache Spark in production
Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production
 
Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
OpenStack for devops environment
OpenStack for devops environment OpenStack for devops environment
OpenStack for devops environment
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger



Effective Spark on Multi-Tenant Clusters

  • 1. 1© Cloudera, Inc. All rights reserved. Effective Spark on Multi-Tenant Clusters Kostas Sakellis
  • 2. Me • Spark Tech Lead Manager at Cloudera • Contributed to Apache Spark • Previously, stint on Cloudera Manager
  • 3. Challenges • Predictable execution time of Spark jobs • Prevent starvation • Optimal cluster utilization • Secure data access • Configuration management
  • 4. Spark on YARN
  • 5. Why YARN? • Spark supports pluggable cluster managers • local, Standalone, YARN and Mesos • YARN contains a proper resource manager • Enables multi-platform jobs • Spark on YARN is mature with an active community
  • 6. Running an application: spark-submit --master yarn-cluster --executor-memory 2g --num-executors 3 --executor-cores 2 <your-class>
  • 7. System Architecture (diagram labels: host-a.mydomain.com, host-b.mydomain.com, host-c.mydomain.com; Resource Manager; Node Managers; Container; App Master; Drivers; executors Exec1, Exec2, Exec3)
  • 8. Gotchas • Ensure compatible YARN configuration • yarn.nodemanager.resource.[memory-mb|cpu-vcores] • yarn.scheduler.maximum-allocation-[vcores|mb] • ... • Remember overhead memory • spark.yarn.executor.memoryOverhead • Default of 10% of executor memory since Spark 1.4
  • 9. Otherwise… • Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
  • 11. System Architecture (diagram labels: host-a.mydomain.com, host-b.mydomain.com, host-c.mydomain.com; Resource Manager; Node Managers; three Drivers and their executors)
  • 12. How do we share a common resource? Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
  • 13. Resource Management • YARN can create resource queues • Priorities can be set per queue • Preemption is also available • Fixed in Spark 1.6 (SPARK-8167) • yarn.scheduler.fair.preemption
  • 14. Running an application: spark-submit --master yarn-cluster --queue my-special-queue --executor-memory 2g --num-executors 3 --executor-cores 2 <your-class>
  • 15. How about locality? Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg and https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg
  • 16. Task Scheduling (diagram labels: Driver with Spark Context, Jobs, Stages, DAG Scheduler, Task Scheduler; Executors with Cores, Tasks, Shuffle)
  • 17. Locality (diagram labels: host-a.mydomain.com, host-b.mydomain.com, host-c.mydomain.com; Resource Manager; Node Managers; files hdfs://x and hdfs://y; three HDFS block sets: x:B1 x:B2 y:B1 y:B3, then x:B3 x:B2 y:B2 y:B3, then x:B3 x:B1 y:B1 y:B2; Driver, Exec1, Exec2)
  • 18. Spark creates executors before executing code!
  • 19. Underutilized Clusters Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
  • 20. Dynamic Allocation • Spark applications scale the number of executors based on load • Removes need for: --num-executors • Idle executors get killed • First supported in CDH 5.4 • Ideal for: • Long ETL jobs with large shuffles • Shell applications: hive and spark shell
  • 21. Task Scheduling (diagram labels: Driver with Spark Context, Jobs, Stages, DAG Scheduler, Task Scheduler; RM; Node Managers on host-a.mydomain.com, host-b.mydomain.com, host-c.mydomain.com; Exec1, Exec2, Exec3; Tasks)
  • 22. Dynamic Allocation Configuration • Many knobs • spark.dynamicAllocation.enabled • spark.dynamicAllocation.[min|max|initial]Executors • spark.dynamicAllocation.executorIdleTimeout • spark.dynamicAllocation.cachedExecutorIdleTimeout • ... • --num-executors will disable dynamic allocation
  • 23. Dynamic Allocation Limitations • Still required to specify cores • --executor-cores • Memory • --executor-memory • Includes JVM overhead • Caching • spark.dynamicAllocation.cachedExecutorIdleTimeout
  • 24. The Future of Dynamic Allocation • Only "task size" needed: --task-size • Eliminates • --executor-cores • --num-executors • --executor-memory • Leads to better cluster utilization
  • 25. Dynamic Allocation respects locality!
  • 26. Security, oh no! Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
  • 27. Security • Shared resources -> Shared data • Security has many facets • Encryption • Authentication • Authorization • Encryption is interesting for multi-tenant clusters
  • 28. Encryption • Who's looking at the data?
  • 29. Data Flow in Spark (diagram labels: Spark Submit, Driver, two Executors, UI, Disks; channels: Control Plane, File Distribution, Shuffle Blocks, Spilled/Shuffle Blocks)
  • 30. Prior to Spark 1.6 • Different channel, different method • Control plane: SSL • File distribution: SSL • Shuffle blocks: SASL encryption • User UI / REST API: no encryption • Spilled/shuffle blocks: use eCryptfs (or equivalent)
  • 31. What is wrong with SSL?
  • 32. Why not SSL? • SSL can be hard to set up • Need certificates readable on every node • Sharing certificates not as secure • Hard to have per-user certificate
  • 33. Spark 1.6 • Standardize around a common transport library • Replaces Akka RPC (SPARK-6028) • Replaces HTTP file service (SPARK-11140) • Uses Netty transport library with SASL encryption • But... • WebUI still has no encryption • Shuffle / spilled blocks still require FS-level encryption • SASL in JVM restricted to 3DES: not very strong, and slow
  • 34. Spark 2.0 • REPL class distribution using transport lib (SPARK-11563) • HTTPS support for WebUI (SPARK-2750) • Encrypting spilled blocks is almost available (SPARK-5682) • Depends on third-party Chimera library for encryption • Work is being done to add Chimera to Apache Commons • Future: • Use Chimera to encrypt over-the-wire data
  • 35. Gateways: launching Spark Application Courtesy of:
  • 36. Spark Gateway (diagram labels: Bob; SSH; gateway-a.mydomain.com with Client, Client Configs, and Spark Install; Random Ports; Resource Manager; Node Managers on host-b.mydomain.com and host-c.mydomain.com; Drivers and executors)
  • 37. Gateway Considerations • Gateway hosts actively managed by administrators • Updates to client configurations and Spark installs • Users need to tunnel into network • Difficult to put users behind firewall • YARN allows different Spark versions • spark.yarn.jar or spark.yarn.archive • Shared Spark services make this difficult
  • 38. Shared Services (diagram labels: same as the Spark Gateway diagram, plus shared services (S) and the History Service)
  • 39. Alternative: Livy • An open source, Apache-licensed REST web service that manages long-running Spark contexts in your cluster
  • 40. Livy Architecture (diagram labels: Client; HTTP; REST Server; Cluster Manager; The Managed Cluster; Context 1 and Context 2, each with a Driver and two Executors)
  • 41. Case 1: Spark Application JAR Submission • Enables Spark applications to be submitted without needing a Spark installation • Basically a wrapper around spark-submit • % curl -X POST localhost:8998/batches -d '{ "file": "<path_to_file>", "className": "com.foo.bar..", ... }'
  • 42. How do you retrieve results?
  • 43. Case 2: Fine-grained Job Submission • Programmatic submission of Spark jobs to a long-running application • A thin Java (and Scala) client available for easier integration • Provides automatic serialization/deserialization • Enables web/mobile applications to use Spark as a backend
  • 44. Case 2: Example
      // Create Livy client
      LivyClient client = new LivyClientBuilder(false)
          .setURI(new URI("<uri>"))
          .setAll(<config>)
          .build();
      // JobHandle allows monitoring of jobs
      JobHandle<Long> handle = client.submit(new YourJob());
      // Block until results are returned
      handle.get(TIMEOUT, TimeUnit.SECONDS);
      // Close connections
      client.stop();
  • 45. Case 2: Example
      private static class YourJob implements Job<Long> {
        @Override
        public Long call(JobContext jc) {
          // Arrays.asList on int literals yields a List<Integer>
          List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
          JavaRDD<Integer> rdd = jc.sc().parallelize(list);
          return rdd.count();
        }
      }
      // Job interface to implement
      public interface Job<T> extends Serializable {
        T call(JobContext jc) throws Exception;
      }
  • 46. Contributions Welcome! • http://livy.io/ • Code: https://github.com/cloudera/livy • JIRA: https://issues.cloudera.org/browse/LIVY • Users: http://groups.google.com/a/cloudera.org/group/livy-user • Dev: http://groups.google.com/a/cloudera.org/group/livy-dev
  • 47. Thank you
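
The dynamic allocation slides list the relevant properties but never show them together on a command line. The following is a minimal sketch, not the speaker's own tooling: the class `DynamicAllocationArgs` and its `build` method are hypothetical helpers that assemble a spark-submit argument list. It assumes the external shuffle service (`spark.shuffle.service.enabled`) is running on the NodeManagers, which dynamic allocation on YARN normally requires, and it leaves out `--num-executors` because the slides note that flag disables dynamic allocation.

```java
import java.util.ArrayList;
import java.util.List;

public final class DynamicAllocationArgs {

    /** Builds spark-submit arguments that enable dynamic allocation (hypothetical helper). */
    public static List<String> build(String queue, int minExecutors, int maxExecutors) {
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add("yarn-cluster");
        args.add("--queue");
        args.add(queue);
        conf(args, "spark.dynamicAllocation.enabled", "true");
        // Assumption: the external shuffle service is deployed, so shuffle
        // files stay readable after an idle executor is removed.
        conf(args, "spark.shuffle.service.enabled", "true");
        conf(args, "spark.dynamicAllocation.minExecutors", Integer.toString(minExecutors));
        conf(args, "spark.dynamicAllocation.maxExecutors", Integer.toString(maxExecutors));
        // Deliberately no --num-executors: per the slides it disables dynamic allocation.
        return args;
    }

    /** Appends one "--conf key=value" pair. */
    private static void conf(List<String> args, String key, String value) {
        args.add("--conf");
        args.add(key + "=" + value);
    }

    public static void main(String[] unused) {
        // The caller would still append cores, memory, class, and JAR arguments.
        System.out.println(String.join(" ", build("my-special-queue", 1, 10)));
    }
}
```

Cores and memory per executor still have to be supplied separately, which matches the limitations slide.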

Editor's Notes

  1. This shows up in the YARN NodeManager logs
  2. Allow multiple groups to access shared resources while ensuring some dedicated share of the resource
  3. Allow multiple groups to access shared resources while ensuring some dedicated share of the resource
  4. Spark makes building a proof of concept with a subset of data relatively easy.
  5. Every connection in the previous slide can transmit sensitive data: input data transmitted via broadcast variables, computed data during shuffles, data in serialized tasks, and files uploaded with the job. How do we prevent other users from seeing this data?
  6. Spark makes building a proof of concept with a subset of data relatively easy.
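
As a rough illustration of note 1 and the overhead-memory gotcha, the container size Spark asks YARN for can be sketched as executor memory plus overhead. The class and method names below are hypothetical. The 10% factor comes from the slides, and the 384 MB floor is an assumption based on the Spark 1.x default for `spark.yarn.executor.memoryOverhead`. YARN may additionally round the request up to its own allocation increment, which this sketch ignores.

```java
public final class ContainerSizing {
    // Assumed Spark 1.4-era defaults: 10% of executor memory, at least 384 MB.
    static final double OVERHEAD_FACTOR = 0.10;
    static final long OVERHEAD_MIN_MB = 384;

    /** Overhead added on top of --executor-memory when none is set explicitly. */
    public static long overheadMb(long executorMemoryMb) {
        return Math.max((long) (OVERHEAD_FACTOR * executorMemoryMb), OVERHEAD_MIN_MB);
    }

    /** Total memory requested from YARN for one executor container. */
    public static long containerRequestMb(long executorMemoryMb) {
        return executorMemoryMb + overheadMb(executorMemoryMb);
    }

    /** True if the request fits under yarn.scheduler.maximum-allocation-mb. */
    public static boolean fits(long executorMemoryMb, long maxAllocationMb) {
        return containerRequestMb(executorMemoryMb) <= maxAllocationMb;
    }

    public static void main(String[] args) {
        // --executor-memory 2g, as in the spark-submit slides
        System.out.println("request MB = " + containerRequestMb(2048));
        System.out.println("fits in 2048 MB max? " + fits(2048, 2048));
    }
}
```

Under these assumptions a 2g executor needs more than 2 GB from YARN, so a maximum allocation of exactly 2 GB would be too small for it.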