SlideShare a Scribd company logo
1 of 34
Download to read offline
Date
Spark Job Server
Evan Chan and Kelvin Chu
Overview
• REST API for Spark jobs and contexts. Easily operate Spark from any
language or environment.
• Runs jobs in their own Contexts or share 1 context amongst jobs
• Great for sharing cached RDDs across jobs and low-latency jobs
• Works with Standalone, Mesos, any Spark config
• Jars, job history and config are persisted via a pluggable API
• Async and sync API, JSON job results
http://github.com/ooyala/spark-jobserver
Open Source!!
History
CONFIDENTIAL—DO NOT DISTRIBUTE 5
Founded in 2007

Commercially launched in 2009

300+ employees in Silicon Valley, LA, NYC, 

London, Paris, Tokyo, Sydney & Guadalajara 

Global footprint, 200M unique users,
110+
countries, and more than 6,000 websites

Over 1 billion videos played per month 
and 2
billion analytic events per day

25% of U.S. online viewers watch video 

powered by Ooyala
Ooyala, Inc.
Spark at Ooyala
• Started investing in Spark beginning of 2013

• Developers loved it, promise of a unifying platform

• 2 teams of developers building on Spark

• Actively contributing to the Spark community

• Largest Spark cluster has > 100 nodes

• Spark community very active, huge amount of interest
From raw logs to fast queries
Processing
C*

columnar
store
Raw Log
Files
Raw Log
Files
Raw Log
Files Spark
Spark
Spark
View 1
View 2
View 3
Spark
Shark
Predefined
queries
Ad-hoc
HiveQL
Our Spark/Shark/Cassandra Stack
Node1
Cassandra
SerDe
Spark
Worker
Shark
Node2
Cassandra
SerDe
Spark
Worker
Shark
Node3
Cassandra
SerDe
Spark
Worker
Shark
Spark Master Job Server
WhyWe Needed a Job Server
• Our vision for Spark is as a multi-team big data service
• What gets repeated by every team:
• Bastion box for running Hadoop/Spark jobs
• Deploys and process monitoring
• Tracking and serializing job status, progress, and job results
• Job validation
• No easy way to kill jobs
• Polyglot technology stack - Ruby scripts run jobs, Go services
ExampleWorkflow
Creating a Job Server Project
✤ sbt assembly -> fat jar -> upload to job server!
✤ "provided" is used. Don’t want SBT assembly to include the
whole job server jar.!
✤ Java projects should be possible too
resolvers += "Ooyala Bintray" at "http://dl.bintray.com/ooyala/maven"
!
libraryDependencies += "ooyala.cnd" % "job-server" % "0.3.1" % "provided"
✤ In your build.sbt, add this
Example Job Server Job
/**!
* A super-simple Spark job example that implements the SparkJob trait and!
* can be submitted to the job server.!
*/!
object WordCountExample extends SparkJob {!
override def validate(sc: SparkContext, config: Config): SparkJobValidation = {!
Try(config.getString(“input.string”))!
.map(x => SparkJobValid)!
.getOrElse(SparkJobInvalid(“No input.string”))!
}!
!
override def runJob(sc: SparkContext, config: Config): Any = {!
val dd = sc.parallelize(config.getString(“input.string”).split(" ").toSeq)!
dd.map((_, 1)).reduceByKey(_ + _).collect().toMap!
}!
}!
What’s Different?
• Job does not create Context, Job Server does
• Decide when I run the job: in own context, or in pre-created context
• Upload new jobs to diagnose your RDD issues:
• POST /contexts/newContext
• POST /jobs .... context=newContext
• Upload a new diagnostic jar... POST /jars/newDiag
• Run diagnostic jar to dump into on cached RDDs
Submitting and Running a Job
✦ curl --data-binary @../target/mydemo.jar localhost:8090/jars/demo
OK[11:32 PM] ~
!
✦ curl -d "input.string = A lazy dog jumped mean dog" 'localhost:8090/jobs?
appName=demo&classPath=WordCountExample&sync=true'
{
"status": "OK",
"RESULT": {
"lazy": 1,
"jumped": 1,
"A": 1,
"mean": 1,
"dog": 2
}
}
Retrieve Job Statuses
~/s/jobserver (evan-working-1 ↩=) curl 'localhost:8090/jobs?limit=2'
[{
"duration": "77.744 secs",
"classPath": "ooyala.cnd.CreateMaterializedView",
"startTime": "2013-11-26T20:13:09.071Z",
"context": "8b7059dd-ooyala.cnd.CreateMaterializedView",
"status": "FINISHED",
"jobId": "9982f961-aaaa-4195-88c2-962eae9b08d9"
}, {
"duration": "58.067 secs",
"classPath": "ooyala.cnd.CreateMaterializedView",
"startTime": "2013-11-26T20:22:03.257Z",
"context": "d0a5ebdc-ooyala.cnd.CreateMaterializedView",
"status": "FINISHED",
"jobId": "e9317383-6a67-41c4-8291-9c140b6d8459"
}]
Use Case: Fast Query Jobs
Spark as a Query Engine
✤ Goal: spark jobs that run in under a second and answers queries
on shared RDD data!
✤ Query params passed in as job config!
✤ Need to minimize context creation overhead!
✤ Thus many jobs sharing the same SparkContext!
✤ On-heap RDD caching means no serialization loss!
✤ Need to consider concurrent jobs (fair scheduling)
LOW-LATENCY QUERY JOBS
RDDLoad Data
Query
Job
Spark

Executors
Cassandra
REST Job Server
Query
Job
Query
Result
Query
Result
new SparkContext
Create
query
context
Load
some
data
Sharing Data Between Jobs
✤ RDD Caching!
✤ Benefit: no need to serialize data. Especially useful for indexes etc.!
✤ Job server provides a NamedRdds trait for thread-safe CRUD of
cached RDDs by name!
✤ (Compare to SparkContext’s API which uses an integer ID and
is not thread safe)!
✤ For example, at Ooyala a number of fields are multiplexed into the
RDD name: timestamp:customerID:granularity
Data Concurrency
✤ Single writer, multiple readers!
✤ Managing multiple updates to RDDs!
✤ Cache keeps track of which RDDs being updated!
✤ Example: thread A spark job creates RDD “A” at t0!
✤ thread B fetches RDD “A” at t1 > t0!
✤ Both threads A and B, using NamedRdds, will get the RDD at
time t2 when thread A finishes creating the RDD “A”
UsingTachyon
Pros Cons
Off-heap storage: No GC
ByteBuffer API - need to
pay deserialization cost
Can be shared across
multiple processes
Data can survive process
loss
Backed by HDFS
Does not support random
access writes
Architecture
Completely Async Design
✤ http://spray.io - probably the fastest JVM HTTP
microframework!
✤ Akka Actor based, non blocking!
✤ Futures used to manage individual jobs. (Note that
Spark is using Scala futures to manage job stages now)!
✤ Single JVM for now, but easy to distribute later via
remote Actors / Akka Cluster
Async Actor Flow
Spray web
API
Request
actor
Local
Supervisor
Job
Manager
Job 1
Future
Job 2
Future
Job Status
Actor
Job Result
Actor
Message flow fully documented
Production Usage
Metadata Store
✤ JarInfo, JobInfo, ConfigInfo!
✤ JobSqlDAO. Store metadata to SQL database by JDBC interface.!
✤ Easily configured by spark.sqldao.jdbc.url!
✤ jdbc:mysql://dbserver:3306/jobserverdb
✤ Multiple Job Servers can share the same MySQL.!
✤ Jars uploaded once but accessible by all servers.!
✤ The default will be JobSqlDAO and H2.!
✤ Single H2 DB file. Serialization and deserialization are handled by H2.
Deployment and Metrics
✤ spark-jobserver repo comes with a full suite of tests
and deploy scripts:!
✤ server_deploy.sh for regular server pushes!
✤ server_package.sh for Mesos and Chronos .tar.gz!
✤ /metricz route for codahale-metrics monitoring!
✤ /healthz route for health check0o
Challenges and Lessons
• Spark is based around contexts - we need a Job Server oriented around
logical jobs
• Running multiple SparkContexts in the same process
• Global use of System properties makes it impossible to start multiple
contexts at same time (but see pull request...)
• Have to be careful with SparkEnv
• Dynamic jar and class loading is tricky
• Manage threads carefully - each context uses lots of threads
FutureWork
Future Plans
✤ Spark-contrib project list. So this and other projects
can gain visibility! (SPARK-1283)!
✤ HA mode using Akka Cluster or Mesos!
✤ HA and Hot Failover for Spark Drivers/Contexts!
✤ REST API for job progress!
✤ Swagger API documentation
HA and Hot Failover for Jobs
Job
Server 1
Job
Server 2
Active
Job
Context
HDFS
Standby
Job
Context
Gossip
Checkpoint
✤ Job context dies:!
✤ Job server 2
notices and spins
up standby
context, restores
checkpoint
Thanks for your contributions!
✤ All of these were community contributed:!
✤ index.html main page!
✤ saving and retrieving job configuration!
✤ Your contributions are very welcome on Github!
Thank you!
And Everybody is Hiring!!

More Related Content

What's hot

Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformLegacy Typesafe (now Lightbend)
 
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarHomologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarDatabricks
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaDatabricks
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesLightbend
 
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobAkka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobLightbend
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesAlexis Seigneurin
 
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...confluent
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Actor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseActor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseAlexander Lukyanchikov
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsDataWorks Summit/Hadoop Summit
 
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache KafkaExploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache KafkaLightbend
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark Summit
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 

What's hot (20)

Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
 
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarHomologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
 
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobAkka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
 
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka Streams
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Actor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseActor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java Enterprise
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
Debugging Apache Spark
Debugging Apache SparkDebugging Apache Spark
Debugging Apache Spark
 
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache KafkaExploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 

Viewers also liked

Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanSpark Summit
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetupjlacefie
 
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014gethue
 
MongoDB very basic (Japanese) / MongoDB基礎の基礎
MongoDB very basic (Japanese) / MongoDB基礎の基礎MongoDB very basic (Japanese) / MongoDB基礎の基礎
MongoDB very basic (Japanese) / MongoDB基礎の基礎Naruhiko Ogasawara
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecLoïc Descotte
 
業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法Yoshitaka Mori
 
2014 11-20 Machine Learning with Apache Spark 勉強会資料
2014 11-20 Machine Learning with Apache Spark 勉強会資料2014 11-20 Machine Learning with Apache Spark 勉強会資料
2014 11-20 Machine Learning with Apache Spark 勉強会資料Recruit Technologies
 
業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法Co-graph Inc.
 
[コグラフ]spss modelerによるデータ加工入門
[コグラフ]spss modelerによるデータ加工入門[コグラフ]spss modelerによるデータ加工入門
[コグラフ]spss modelerによるデータ加工入門Co-graph Inc.
 
Elasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみたElasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみたRyoji Kurosawa
 
Casual Compression on MongoDB
Casual Compression on MongoDBCasual Compression on MongoDB
Casual Compression on MongoDBmoai kids
 
MongoDBではじめるカジュアルなタイムラインシステム
MongoDBではじめるカジュアルなタイムラインシステムMongoDBではじめるカジュアルなタイムラインシステム
MongoDBではじめるカジュアルなタイムラインシステムHitoshi Asai
 
Consul: Service-oriented at Scale
Consul: Service-oriented at ScaleConsul: Service-oriented at Scale
Consul: Service-oriented at ScaleC4Media
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Sciencesarith divakar
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Spark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingSpark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingDemi Ben-Ari
 
Pixie dust overview
Pixie dust overviewPixie dust overview
Pixie dust overviewDavid Taieb
 
[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho
[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho
[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés RianchoCODE BLUE
 

Viewers also liked (20)

Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
MongoDB3.2の紹介
MongoDB3.2の紹介MongoDB3.2の紹介
MongoDB3.2の紹介
 
Scala in practice
Scala in practiceScala in practice
Scala in practice
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
 
MongoDB very basic (Japanese) / MongoDB基礎の基礎
MongoDB very basic (Japanese) / MongoDB基礎の基礎MongoDB very basic (Japanese) / MongoDB基礎の基礎
MongoDB very basic (Japanese) / MongoDB基礎の基礎
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar Prokopec
 
業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法
 
2014 11-20 Machine Learning with Apache Spark 勉強会資料
2014 11-20 Machine Learning with Apache Spark 勉強会資料2014 11-20 Machine Learning with Apache Spark 勉強会資料
2014 11-20 Machine Learning with Apache Spark 勉強会資料
 
業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法業務システムにおけるMongoDB活用法
業務システムにおけるMongoDB活用法
 
[コグラフ]spss modelerによるデータ加工入門
[コグラフ]spss modelerによるデータ加工入門[コグラフ]spss modelerによるデータ加工入門
[コグラフ]spss modelerによるデータ加工入門
 
Elasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみたElasticsearchインデクシングのパフォーマンスを測ってみた
Elasticsearchインデクシングのパフォーマンスを測ってみた
 
Casual Compression on MongoDB
Casual Compression on MongoDBCasual Compression on MongoDB
Casual Compression on MongoDB
 
MongoDBではじめるカジュアルなタイムラインシステム
MongoDBではじめるカジュアルなタイムラインシステムMongoDBではじめるカジュアルなタイムラインシステム
MongoDBではじめるカジュアルなタイムラインシステム
 
Consul: Service-oriented at Scale
Consul: Service-oriented at ScaleConsul: Service-oriented at Scale
Consul: Service-oriented at Scale
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Spark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingSpark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computing
 
Pixie dust overview
Pixie dust overviewPixie dust overview
Pixie dust overview
 
[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho
[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho
[CB16] 難解なウェブアプリケーションの脆弱性 by Andrés Riancho
 

Similar to Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)

Intro to node.js - Ran Mizrahi (27/8/2014)
Intro to node.js - Ran Mizrahi (27/8/2014)Intro to node.js - Ran Mizrahi (27/8/2014)
Intro to node.js - Ran Mizrahi (27/8/2014)Ran Mizrahi
 
Intro to node.js - Ran Mizrahi (28/8/14)
Intro to node.js - Ran Mizrahi (28/8/14)Intro to node.js - Ran Mizrahi (28/8/14)
Intro to node.js - Ran Mizrahi (28/8/14)Ran Mizrahi
 
Solid And Sustainable Development in Scala
Solid And Sustainable Development in ScalaSolid And Sustainable Development in Scala
Solid And Sustainable Development in ScalaKazuhiro Sera
 
Spark and scala reference architecture
Spark and scala reference architectureSpark and scala reference architecture
Spark and scala reference architectureAdrian Tanase
 
Solid and Sustainable Development in Scala
Solid and Sustainable Development in ScalaSolid and Sustainable Development in Scala
Solid and Sustainable Development in Scalascalaconfjp
 
What is Mean Stack Development ?
What is Mean Stack Development ?What is Mean Stack Development ?
What is Mean Stack Development ?Balajihope
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraDataStax Academy
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...Tim Vaillancourt
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Databricks
 
Building production websites with Node.js on the Microsoft stack
Building production websites with Node.js on the Microsoft stackBuilding production websites with Node.js on the Microsoft stack
Building production websites with Node.js on the Microsoft stackCellarTracker
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the SurfaceJosi Aranda
 
Harnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with GroovyHarnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with GroovySteve Pember
 

Similar to Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14) (20)

Intro to node.js - Ran Mizrahi (27/8/2014)
Intro to node.js - Ran Mizrahi (27/8/2014)Intro to node.js - Ran Mizrahi (27/8/2014)
Intro to node.js - Ran Mizrahi (27/8/2014)
 
Intro to node.js - Ran Mizrahi (28/8/14)
Intro to node.js - Ran Mizrahi (28/8/14)Intro to node.js - Ran Mizrahi (28/8/14)
Intro to node.js - Ran Mizrahi (28/8/14)
 
Solid And Sustainable Development in Scala
Solid And Sustainable Development in ScalaSolid And Sustainable Development in Scala
Solid And Sustainable Development in Scala
 
Spark and scala reference architecture
Spark and scala reference architectureSpark and scala reference architecture
Spark and scala reference architecture
 
Solid and Sustainable Development in Scala
Solid and Sustainable Development in ScalaSolid and Sustainable Development in Scala
Solid and Sustainable Development in Scala
 
What is Mean Stack Development ?
What is Mean Stack Development ?What is Mean Stack Development ?
What is Mean Stack Development ?
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
 
Apache spark
Apache sparkApache spark
Apache spark
 
ArangoDB
ArangoDBArangoDB
ArangoDB
 
Mean stack
Mean stackMean stack
Mean stack
 
Oracle application container cloud back end integration using node final
Oracle application container cloud back end integration using node finalOracle application container cloud back end integration using node final
Oracle application container cloud back end integration using node final
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
 
Intro to Sails.js
Intro to Sails.jsIntro to Sails.js
Intro to Sails.js
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Building production websites with Node.js on the Microsoft stack
Building production websites with Node.js on the Microsoft stackBuilding production websites with Node.js on the Microsoft stack
Building production websites with Node.js on the Microsoft stack
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
Harnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with GroovyHarnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with Groovy
 

More from Evan Chan

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesEvan Chan
 
Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019Evan Chan
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleEvan Chan
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web ServiceEvan Chan
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkEvan Chan
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and SparkEvan Chan
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkEvan Chan
 

More from Evan Chan (15)

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and Kubernetes
 
Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and Shark
 

Recently uploaded

the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)

  • 1. Date Spark Job Server Evan Chan and Kelvin Chu
  • 2. Overview • REST API for Spark jobs and contexts. Easily operate Spark from any language or environment. • Runs jobs in their own Contexts or share 1 context amongst jobs • Great for sharing cached RDDs across jobs and low-latency jobs • Works with Standalone, Mesos, any Spark config • Jars, job history and config are persisted via a pluggable API • Async and sync API, JSON job results
  • 5. CONFIDENTIAL—DO NOT DISTRIBUTE 5 Founded in 2007 Commercially launched in 2009 300+ employees in Silicon Valley, LA, NYC, 
 London, Paris, Tokyo, Sydney & Guadalajara Global footprint, 200M unique users,
110+ countries, and more than 6,000 websites Over 1 billion videos played per month 
and 2 billion analytic events per day 25% of U.S. online viewers watch video 
 powered by Ooyala Ooyala, Inc.
  • 6. Spark at Ooyala • Started investing in Spark beginning of 2013 • Developers loved it, promise of a unifying platform • 2 teams of developers building on Spark • Actively contributing to the Spark community • Largest Spark cluster has > 100 nodes • Spark community very active, huge amount of interest
  • 7. From raw logs to fast queries Processing C*
 columnar store Raw Log Files Raw Log Files Raw Log Files Spark Spark Spark View 1 View 2 View 3 Spark Shark Predefined queries Ad-hoc HiveQL
  • 9. WhyWe Needed a Job Server • Our vision for Spark is as a multi-team big data service • What gets repeated by every team: • Bastion box for running Hadoop/Spark jobs • Deploys and process monitoring • Tracking and serializing job status, progress, and job results • Job validation • No easy way to kill jobs • Polyglot technology stack - Ruby scripts run jobs, Go services
  • 11. Creating a Job Server Project ✤ sbt assembly -> fat jar -> upload to job server! ✤ "provided" is used. Don’t want SBT assembly to include the whole job server jar.! ✤ Java projects should be possible too resolvers += "Ooyala Bintray" at "http://dl.bintray.com/ooyala/maven" ! libraryDependencies += "ooyala.cnd" % "job-server" % "0.3.1" % "provided" ✤ In your build.sbt, add this
  • 12. Example Job Server Job /**! * A super-simple Spark job example that implements the SparkJob trait and! * can be submitted to the job server.! */! object WordCountExample extends SparkJob {! override def validate(sc: SparkContext, config: Config): SparkJobValidation = {! Try(config.getString(“input.string”))! .map(x => SparkJobValid)! .getOrElse(SparkJobInvalid(“No input.string”))! }! ! override def runJob(sc: SparkContext, config: Config): Any = {! val dd = sc.parallelize(config.getString(“input.string”).split(" ").toSeq)! dd.map((_, 1)).reduceByKey(_ + _).collect().toMap! }! }!
  • 13. What’s Different? • Job does not create Context, Job Server does • Decide when I run the job: in own context, or in pre-created context • Upload new jobs to diagnose your RDD issues: • POST /contexts/newContext • POST /jobs .... context=newContext • Upload a new diagnostic jar... POST /jars/newDiag • Run diagnostic jar to dump into on cached RDDs
  • 14. Submitting and Running a Job ✦ curl --data-binary @../target/mydemo.jar localhost:8090/jars/demo OK[11:32 PM] ~ ! ✦ curl -d "input.string = A lazy dog jumped mean dog" 'localhost:8090/jobs? appName=demo&classPath=WordCountExample&sync=true' { "status": "OK", "RESULT": { "lazy": 1, "jumped": 1, "A": 1, "mean": 1, "dog": 2 } }
  • 15. Retrieve Job Statuses ~/s/jobserver (evan-working-1 ↩=) curl 'localhost:8090/jobs?limit=2' [{ "duration": "77.744 secs", "classPath": "ooyala.cnd.CreateMaterializedView", "startTime": "2013-11-26T20:13:09.071Z", "context": "8b7059dd-ooyala.cnd.CreateMaterializedView", "status": "FINISHED", "jobId": "9982f961-aaaa-4195-88c2-962eae9b08d9" }, { "duration": "58.067 secs", "classPath": "ooyala.cnd.CreateMaterializedView", "startTime": "2013-11-26T20:22:03.257Z", "context": "d0a5ebdc-ooyala.cnd.CreateMaterializedView", "status": "FINISHED", "jobId": "e9317383-6a67-41c4-8291-9c140b6d8459" }]
  • 16. Use Case: Fast Query Jobs
  • 17. Spark as a Query Engine ✤ Goal: spark jobs that run in under a second and answers queries on shared RDD data! ✤ Query params passed in as job config! ✤ Need to minimize context creation overhead! ✤ Thus many jobs sharing the same SparkContext! ✤ On-heap RDD caching means no serialization loss! ✤ Need to consider concurrent jobs (fair scheduling)
  • 18. LOW-LATENCY QUERY JOBS RDDLoad Data Query Job Spark
 Executors Cassandra REST Job Server Query Job Query Result Query Result new SparkContext Create query context Load some data
  • 19. Sharing Data Between Jobs ✤ RDD Caching! ✤ Benefit: no need to serialize data. Especially useful for indexes etc.! ✤ Job server provides a NamedRdds trait for thread-safe CRUD of cached RDDs by name! ✤ (Compare to SparkContext’s API which uses an integer ID and is not thread safe)! ✤ For example, at Ooyala a number of fields are multiplexed into the RDD name: timestamp:customerID:granularity
  • 20. Data Concurrency ✤ Single writer, multiple readers! ✤ Managing multiple updates to RDDs! ✤ Cache keeps track of which RDDs being updated! ✤ Example: thread A spark job creates RDD “A” at t0! ✤ thread B fetches RDD “A” at t1 > t0! ✤ Both threads A and B, using NamedRdds, will get the RDD at time t2 when thread A finishes creating the RDD “A”
  • 21. UsingTachyon Pros Cons Off-heap storage: No GC ByteBuffer API - need to pay deserialization cost Can be shared across multiple processes Data can survive process loss Backed by HDFS Does not support random access writes
  • 23. Completely Async Design ✤ http://spray.io - probably the fastest JVM HTTP microframework! ✤ Akka Actor based, non blocking! ✤ Futures used to manage individual jobs. (Note that Spark is using Scala futures to manage job stages now)! ✤ Single JVM for now, but easy to distribute later via remote Actors / Akka Cluster
  • 24. Async Actor Flow Spray web API Request actor Local Supervisor Job Manager Job 1 Future Job 2 Future Job Status Actor Job Result Actor
  • 25. Message flow fully documented
  • 27. Metadata Store ✤ JarInfo, JobInfo, ConfigInfo! ✤ JobSqlDAO. Store metadata to SQL database by JDBC interface.! ✤ Easily configured by spark.sqldao.jdbc.url! ✤ jdbc:mysql://dbserver:3306/jobserverdb ✤ Multiple Job Servers can share the same MySQL.! ✤ Jars uploaded once but accessible by all servers.! ✤ The default will be JobSqlDAO and H2.! ✤ Single H2 DB file. Serialization and deserialization are handled by H2.
  • 28. Deployment and Metrics ✤ spark-jobserver repo comes with a full suite of tests and deploy scripts:! ✤ server_deploy.sh for regular server pushes! ✤ server_package.sh for Mesos and Chronos .tar.gz! ✤ /metricz route for codahale-metrics monitoring! ✤ /healthz route for health check0o
  • 29. Challenges and Lessons • Spark is based around contexts - we need a Job Server oriented around logical jobs • Running multiple SparkContexts in the same process • Global use of System properties makes it impossible to start multiple contexts at same time (but see pull request...) • Have to be careful with SparkEnv • Dynamic jar and class loading is tricky • Manage threads carefully - each context uses lots of threads
  • 31. Future Plans ✤ Spark-contrib project list. So this and other projects can gain visibility! (SPARK-1283)! ✤ HA mode using Akka Cluster or Mesos! ✤ HA and Hot Failover for Spark Drivers/Contexts! ✤ REST API for job progress! ✤ Swagger API documentation
  • 32. HA and Hot Failover for Jobs Job Server 1 Job Server 2 Active Job Context HDFS Standby Job Context Gossip Checkpoint ✤ Job context dies:! ✤ Job server 2 notices and spins up standby context, restores checkpoint
  • 33. Thanks for your contributions! ✤ All of these were community contributed:! ✤ index.html main page! ✤ saving and retrieving job configuration! ✤ Your contributions are very welcome on Github!