SlideShare a Scribd company logo
1 of 50
Download to read offline
● Data science at Cloudera 
● Recently lead Apache Spark development at 
Cloudera 
● Before that, committing on Apache YARN 
and MapReduce 
● Hadoop project management committee
com.esotericsoftware.kryo. 
KryoException: Unable to find class: 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC 
$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on 
host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by 
zero 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) 
[...] 
Driver stacktrace: 
at org.apache.spark.scheduler.DAGScheduler. 
org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages 
(DAGScheduler.scala:1033) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1. 
apply(DAGScheduler.scala:1017) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1. 
apply(DAGScheduler.scala:1015) 
[...]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on 
host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by 
zero 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) 
[...]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on 
host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by 
zero 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) 
[...]
val file = sc.textFile("hdfs://...") 
file.filter(_.startsWith(“banana”)) 
.count()
Job 
Stage 
Task Task 
Task Task 
Stage 
Task Task 
Task Task
val rdd1 = sc.textFile(“hdfs://...”) 
.map(someFunc) 
.filter(filterFunc) 
textFile map 
filter
val rdd2 = sc.hadoopFile(“hdfs: 
//...”) 
.groupByKey() 
.map(someOtherFunc) 
hadoopFile groupByKey map
val rdd3 = rdd1.join(rdd2) 
.map(someFunc) 
join map
rdd3.collect()
textFile map filter 
hadoop 
group 
File ByKey 
map 
join map
textFile map filter 
hadoop 
group 
File ByKey 
map 
join map
Stage 
Task Task 
Task Task
Stage 
Task Task 
Task Task
Stage 
Task Task 
Task Task
org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on 
host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by 
zero 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) 
scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) 
[...]
14/04/22 11:59:58 ERROR executor.Executor: Exception in task ID 286 6 
java.io.IOException: Filesystem closed 
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565 ) 
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:64 8) 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706 ) 
at java.io.DataInputStream.read(DataInputStream.java:100 ) 
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:20 9) 
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173 ) 
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:20 6) 
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:4 5) 
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164 ) 
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149 ) 
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71 ) 
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:2 7) 
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327 ) 
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388 ) 
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388 ) 
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327 ) 
at scala.collection.Iterator$class.foreach(Iterator.scala:727 ) 
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157 ) 
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:16 1) 
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:10 2) 
at org.apache.spark.scheduler.Task.run(Task.scala:53 ) 
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:21 1) 
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:4 2) 
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:4 1) 
at java.security.AccessController.doPrivileged(Native Method ) 
at javax.security.auth.Subject.doAs(Subject.java:415 ) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:140 8) 
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:4 1) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176 ) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:114 5) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:61 5) 
at java.lang.Thread.run(Thread.java:724)
ResourceManager 
NodeManager NodeManager 
Container Container 
Application 
Master 
Container 
Client
ResourceManager 
Client 
NodeManager NodeManager 
Container 
Map Task 
Container 
Application 
Master 
Container 
Reduce Task
ResourceManager 
Client 
NodeManager NodeManager 
Container 
Map Task 
Container 
Application 
Master 
Container 
Reduce Task
ResourceManager 
Client 
NodeManager NodeManager 
Container 
Map Task 
Container 
Application 
Master 
Container 
Reduce Task
Container [pid=63375, 
containerID=container_1388158490598_0001_01_00 
0003] is running beyond physical memory 
limits. Current usage: 2.1 GB of 2 GB physical 
memory used; 2.8 GB of 4.2 GB virtual memory 
used. Killing container.
yarn.nodemanager.resource.memory-mb 
Executor container 
spark.yarn.executor.memoryOverhead 
spark.executor.memory 
spark.shuffle.memoryFraction 
spark.storage.memoryFraction
ExternalAppend 
OnlyMap 
Block 
Block 
deserialize 
deserialize
ExternalAppend 
OnlyMap 
key1 -> values 
key2 -> values 
key3 -> values
ExternalAppend 
OnlyMap 
key1 -> values 
key2 -> values 
key3 -> values
ExternalAppend 
OnlyMap 
Sort & Spill 
key1 -> values 
key2 -> values 
key3 -> values
rdd.reduceByKey(reduceFunc, 
numPartitions=1000)
java.io.FileNotFoundException: 
/dn6/spark/local/spark-local- 
20140610134115- 
2cee/30/merged_shuffle_0_368_14 (Too many 
open files)
Task 
Task 
Write stuff 
key1 -> values 
key2 -> values 
key3 -> values 
out 
key1 -> values 
key2 -> values 
key3 -> values Task
Task 
Task 
Write stuff 
key1 -> values 
key2 -> values 
key3 -> values 
out 
key1 -> values 
key2 -> values 
key3 -> values Task
Partition 1 
File 
Partition 2 
File 
Partition 3 
File 
Records
Records 
Buffer
Single file 
Buffer 
Sort & Spill 
Partition 1 
Records 
Partition 2 
Records 
Partition 3 
Records 
Index file
conf.set(“spark.shuffle.manager”, 
SORT)
● No 
● Distributed systems are complicated
Why your Spark job is failing

More Related Content

What's hot

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovMaksud Ibrahimov
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Awesome Traefik - Ingress Controller for Kubernetes - Swapnasagar Pradhan
Awesome Traefik - Ingress Controller for Kubernetes - Swapnasagar PradhanAwesome Traefik - Ingress Controller for Kubernetes - Swapnasagar Pradhan
Awesome Traefik - Ingress Controller for Kubernetes - Swapnasagar PradhanAjeet Singh Raina
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Chris Fregly
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkKazuaki Ishizaki
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Apache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceSijie Guo
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak PerformanceTodd Palino
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips confluent
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 

What's hot (20)

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Awesome Traefik - Ingress Controller for Kubernetes - Swapnasagar Pradhan
Awesome Traefik - Ingress Controller for Kubernetes - Swapnasagar PradhanAwesome Traefik - Ingress Controller for Kubernetes - Swapnasagar Pradhan
Awesome Traefik - Ingress Controller for Kubernetes - Swapnasagar Pradhan
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Apache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage Service
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 

Viewers also liked

Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Josef A. Habdank
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageSandeep Patil
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...gethue
 
Dynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDataWorks Summit
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterDataWorks Summit
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 

Viewers also liked (16)

Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better Storage
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
 
Dynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark Application
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu Kasinathan
 
SocSciBot(01 Mar2010) - Korean Manual
SocSciBot(01 Mar2010) - Korean ManualSocSciBot(01 Mar2010) - Korean Manual
SocSciBot(01 Mar2010) - Korean Manual
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Proxy Servers
Proxy ServersProxy Servers
Proxy Servers
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Proxy Server
Proxy ServerProxy Server
Proxy Server
 

Similar to Why your Spark job is failing

Why is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of ClouderaWhy is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of ClouderaData Con LA
 
Why is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of ClouderaWhy is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of ClouderaJack Gudenkauf
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingCloudera, Inc.
 
把鐵路開進視窗裡
把鐵路開進視窗裡把鐵路開進視窗裡
把鐵路開進視窗裡Wei Jen Lu
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogVadim Semenov
 
Openshift operator insight
Openshift operator insightOpenshift operator insight
Openshift operator insightRyan ZhangCheng
 
A jar-nORM-ous Task
A jar-nORM-ous TaskA jar-nORM-ous Task
A jar-nORM-ous TaskErin Dees
 
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarHomologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarDatabricks
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Databricks
 
Real World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS ApplicationReal World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS ApplicationBen Hall
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetAchieve Internet
 
introduction-infra-as-a-code using terraform
introduction-infra-as-a-code using terraformintroduction-infra-as-a-code using terraform
introduction-infra-as-a-code using terraformniyof97
 
Innovation and Security in Ruby on Rails
Innovation and Security in Ruby on RailsInnovation and Security in Ruby on Rails
Innovation and Security in Ruby on Railstielefeld
 
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xincaidezhi655
 
Scaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container ServiceScaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container ServiceBen Hall
 
Troubleshooting Tips from a Docker Support Engineer
Troubleshooting Tips from a Docker Support EngineerTroubleshooting Tips from a Docker Support Engineer
Troubleshooting Tips from a Docker Support EngineerJeff Anderson
 
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, Docker
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, DockerTroubleshooting Tips from a Docker Support Engineer - Jeff Anderson, Docker
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, DockerDocker, Inc.
 
Replatforming Legacy Packaged Applications: Block-by-Block with Minecraft
Replatforming Legacy Packaged Applications: Block-by-Block with MinecraftReplatforming Legacy Packaged Applications: Block-by-Block with Minecraft
Replatforming Legacy Packaged Applications: Block-by-Block with MinecraftVMware Tanzu
 
Porting Rails Apps to High Availability Systems
Porting Rails Apps to High Availability SystemsPorting Rails Apps to High Availability Systems
Porting Rails Apps to High Availability SystemsMarcelo Pinheiro
 
Symfony 2 (PHP Quebec 2009)
Symfony 2 (PHP Quebec 2009)Symfony 2 (PHP Quebec 2009)
Symfony 2 (PHP Quebec 2009)Fabien Potencier
 

Similar to Why your Spark job is failing (20)

Why is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of ClouderaWhy is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of Cloudera
 
Why is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of ClouderaWhy is My Spark Job Failing? by Sandy Ryza of Cloudera
Why is My Spark Job Failing? by Sandy Ryza of Cloudera
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is Failing
 
把鐵路開進視窗裡
把鐵路開進視窗裡把鐵路開進視窗裡
把鐵路開進視窗裡
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 
Openshift operator insight
Openshift operator insightOpenshift operator insight
Openshift operator insight
 
A jar-nORM-ous Task
A jar-nORM-ous TaskA jar-nORM-ous Task
A jar-nORM-ous Task
 
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarHomologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
Real World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS ApplicationReal World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS Application
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and Puppet
 
introduction-infra-as-a-code using terraform
introduction-infra-as-a-code using terraformintroduction-infra-as-a-code using terraform
introduction-infra-as-a-code using terraform
 
Innovation and Security in Ruby on Rails
Innovation and Security in Ruby on RailsInnovation and Security in Ruby on Rails
Innovation and Security in Ruby on Rails
 
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xin
 
Scaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container ServiceScaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container Service
 
Troubleshooting Tips from a Docker Support Engineer
Troubleshooting Tips from a Docker Support EngineerTroubleshooting Tips from a Docker Support Engineer
Troubleshooting Tips from a Docker Support Engineer
 
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, Docker
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, DockerTroubleshooting Tips from a Docker Support Engineer - Jeff Anderson, Docker
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, Docker
 
Replatforming Legacy Packaged Applications: Block-by-Block with Minecraft
Replatforming Legacy Packaged Applications: Block-by-Block with MinecraftReplatforming Legacy Packaged Applications: Block-by-Block with Minecraft
Replatforming Legacy Packaged Applications: Block-by-Block with Minecraft
 
Porting Rails Apps to High Availability Systems
Porting Rails Apps to High Availability SystemsPorting Rails Apps to High Availability Systems
Porting Rails Apps to High Availability Systems
 
Symfony 2 (PHP Quebec 2009)
Symfony 2 (PHP Quebec 2009)Symfony 2 (PHP Quebec 2009)
Symfony 2 (PHP Quebec 2009)
 

Recently uploaded

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Recently uploaded (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Why your Spark job is failing

  • 1.
  • 2. ● Data science at Cloudera ● Recently lead Apache Spark development at Cloudera ● Before that, committing on Apache YARN and MapReduce ● Hadoop project management committee
  • 3. com.esotericsoftware.kryo. KryoException: Unable to find class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC $$iwC$$iwC$$anonfun$4$$anonfun$apply$3
  • 4.
  • 5.
  • 6.
  • 7. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...] Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler. org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages (DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1. apply(DAGScheduler.scala:1017) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1. apply(DAGScheduler.scala:1015) [...]
  • 8. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...]
  • 9. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...]
  • 10. val file = sc.textFile("hdfs://...") file.filter(_.startsWith(“banana”)) .count()
  • 11. Job Stage Task Task Task Task Stage Task Task Task Task
  • 12.
  • 13. val rdd1 = sc.textFile(“hdfs://...”) .map(someFunc) .filter(filterFunc) textFile map filter
  • 14. val rdd2 = sc.hadoopFile(“hdfs: //...”) .groupByKey() .map(someOtherFunc) hadoopFile groupByKey map
  • 15. val rdd3 = rdd1.join(rdd2) .map(someFunc) join map
  • 17. textFile map filter hadoop group File ByKey map join map
  • 18. textFile map filter hadoop group File ByKey map join map
  • 19. Stage Task Task Task Task
  • 20. Stage Task Task Task Task
  • 21. Stage Task Task Task Task
  • 22. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...]
  • 23. 14/04/22 11:59:58 ERROR executor.Executor: Exception in task ID 286 6 java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565 ) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:64 8) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706 ) at java.io.DataInputStream.read(DataInputStream.java:100 ) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:20 9) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173 ) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:20 6) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:4 5) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164 ) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149 ) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71 ) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:2 7) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327 ) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388 ) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388 ) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327 ) at scala.collection.Iterator$class.foreach(Iterator.scala:727 ) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157 ) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:16 1) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:10 2) at org.apache.spark.scheduler.Task.run(Task.scala:53 ) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:21 1) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:4 2) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:4 1) at java.security.AccessController.doPrivileged(Native Method ) at javax.security.auth.Subject.doAs(Subject.java:415 ) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:140 8) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:4 1) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176 ) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:114 5) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:61 5) at java.lang.Thread.run(Thread.java:724)
  • 24.
  • 25. ResourceManager NodeManager NodeManager Container Container Application Master Container Client
  • 26. ResourceManager Client NodeManager NodeManager Container Map Task Container Application Master Container Reduce Task
  • 27. ResourceManager Client NodeManager NodeManager Container Map Task Container Application Master Container Reduce Task
  • 28. ResourceManager Client NodeManager NodeManager Container Map Task Container Application Master Container Reduce Task
  • 29. Container [pid=63375, containerID=container_1388158490598_0001_01_00 0003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
  • 30. yarn.nodemanager.resource.memory-mb Executor container spark.yarn.executor.memoryOverhead spark.executor.memory spark.shuffle.memoryFraction spark.storage.memoryFraction
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. ExternalAppend OnlyMap Block Block deserialize deserialize
  • 37. ExternalAppend OnlyMap key1 -> values key2 -> values key3 -> values
  • 38. ExternalAppend OnlyMap key1 -> values key2 -> values key3 -> values
  • 39. ExternalAppend OnlyMap Sort & Spill key1 -> values key2 -> values key3 -> values
  • 41.
  • 42. java.io.FileNotFoundException: /dn6/spark/local/spark-local- 20140610134115- 2cee/30/merged_shuffle_0_368_14 (Too many open files)
  • 43. Task Task Write stuff key1 -> values key2 -> values key3 -> values out key1 -> values key2 -> values key3 -> values Task
  • 44. Task Task Write stuff key1 -> values key2 -> values key3 -> values out key1 -> values key2 -> values key3 -> values Task
  • 45. Partition 1 File Partition 2 File Partition 3 File Records
  • 47. Single file Buffer Sort & Spill Partition 1 Records Partition 2 Records Partition 3 Records Index file
  • 49. ● No ● Distributed systems are complicated