Distributed Systems from
Scratch - Part 1
Motivation and Introduction to Apache Mesos
https://github.com/phatak-dev/distributedsystems
● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consults in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Idea
● Motivation
● Architecture of existing big data systems
● What do we want to build?
● Introduction to Apache Mesos
● Distributed Shell
● Function API
● Custom executor
Idea
“What does it take to build a
distributed processing system
like Spark?”
Motivation
● First version of Spark only had 1600 lines of Scala code
● Had all the basic pieces of RDD and the ability to run as a
distributed system using Mesos
● Recreating the same code with step by step
understanding
● Ample time in hand
Distributed systems from 30000ft
Distributed Storage(HDFS/S3)
Distributed Cluster management
(YARN/Mesos)
Distributed Processing Systems
(Spark/MapReduce)
Data Applications
Standardization of frameworks
● Building a distributed processing system is like building
a web framework
● We already have excellent underlying frameworks such as
YARN and Mesos for cluster management and HDFS for
distributed storage
● We can build on these frameworks rather than trying to
do everything from scratch
● Most third generation systems, such as Spark and Flink, do the
same
Conventional wisdom
● To build a distributed system you need to read complex
papers
● Understand the details of how distribution is done using
different protocols
● Need to care about the complexities of concurrency,
locking etc.
● Need to do everything from scratch
Modern wisdom
● Read Spark code to understand how to build a
distributed processing system
● Use Apache Mesos and YARN to handle tedious cluster
resource management
● Use Akka to do distributed concurrency
● Use excellent proven frameworks rather than inventing your
own
Why this talk in a Spark meetup?
[Diagram: the stack and the earlier sessions that covered each layer, top down]
● Applications - Experience sharing
● APIs - Introduction sessions
● Data abstraction (RDD/Dataframe) - Anatomy sessions
● Spark runtime - Anatomy sessions
● YARN/Mesos - Spark on YARN
Top down approach
● We started by discussing the Spark APIs in
introductory sessions such as Spark batch and Spark streaming
● Once we understood the basic APIs, we
discussed different abstraction layers like RDD and
Dataframe in our anatomy sessions
● We also talked about parts of the Spark runtime, such as data
sources, in one of our anatomy sessions
● In the last meetup we discussed cluster management in
the Spark on YARN session
Bottom up approach
● Start at the cluster management layer using Mesos and
YARN
● Build
○ Runtime
○ Abstractions
○ APIs
● Build applications using our own abstractions and
runtime
● Use all we learnt in our top down approach
Design
● Heavily influenced by the way Apache Spark is built
● A lot of the code and design comes from the Spark code
● No dependency on Spark itself
● Only implements very basic distributed processing
pieces
● Make it work on Apache Mesos and Apache YARN
● Process oriented not data oriented
Spark at its birth - 2010
● Only 1600 lines of Scala code
● Used Apache Mesos for cluster management
● Used Mesos messaging API for concurrency
management (no Akka)
● Used Scala functions as the processing abstraction rather
than a DAG
● No optimizations
Steps to get there
● Learn Apache Mesos
● Implement a simple hello world on Mesos
● Implement a simple function oriented API on Mesos
● Support third party libraries
● Support shuffle
● Support aggregations and counters
● Implement similar functionality on YARN
Apache Mesos
● Apache Mesos is an open source cluster manager
● It "provides efficient resource isolation and sharing
across distributed applications, or frameworks"
● Built at UC Berkeley
● YARN ideas are inspired by Mesos
● Written in C++
● Uses Linux cgroups (the kernel feature Docker also builds on)
for resource isolation
Why Mesos?
● Abstracts resource management away from the processing
application
● Handles cluster setup and management
● With the help of ZooKeeper, can provide master fault
tolerance
● Modular and simple API
● Supports different distributed processing systems on the
same cluster
● Provides APIs in multiple languages, such as C++ and Java
Architecture of Mesos
[Diagram: one Mesos master above three Mesos slaves. Frameworks (Hadoop
scheduler, Spark scheduler, custom framework) sit on the master side; their
executors (Hadoop executor, Spark executor, custom executor) run on the slaves.]
Architecture of Mesos
● Mesos master - Single master node of the Mesos
cluster. Entry point to any Mesos application.
● Mesos slaves - Each machine in the cluster runs a Mesos
slave, which is responsible for running tasks
● Framework - Distributed application built using the Apache
Mesos API
○ Scheduler - Entry point to the framework. Responsible
for launching tasks
○ Executor - Runs the actual tasks on Mesos slaves
Starting Mesos
● Starting master
bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos
● Starting slave
bin/mesos-slave.sh --master=127.0.0.1:5050
● Accessing UI
http://127.0.0.1:5050
● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
Hello world on Mesos
● Run a simple shell command on each Mesos slave
● We create our own framework which is capable of
running shell commands
● Our framework needs the following three
components
○ Client
○ Scheduler
○ Executor
Client
● Code that submits the tasks to the framework
● Task is an abstraction used by Mesos to indicate any
piece of work which takes some resources.
● It’s similar to the driver program in Spark
● It creates an instance of the framework and submits it to
the Mesos driver
● Mesos uses protocol buffers for serialization
● Example code : DistributedShell.scala
Scheduler
● Every framework on Apache Mesos should implement
the Scheduler interface
● Scheduler is the entry point for our custom framework
● It’s similar to SparkContext
● We need to override
○ resourceOffers
● It acts like the ApplicationMaster in YARN
Offers
● Each resource in Mesos is advertised as an offer
● Whenever a resource (disk, memory or CPU) is available,
Mesos offers it to the frameworks running on it
● A framework can accept the offer and use it for running
its own tasks
● Once execution is done, it can release that resource so
that Mesos can offer it to other frameworks
● Quite different from the YARN model, where the application
requests resources instead of receiving offers
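The offer cycle above can be mimicked with a small, self-contained Scala model. This is only a sketch: `Offer`, `TaskRequest`, `Launch` and `OfferModel.schedule` are hypothetical names invented for illustration and are not part of the Mesos API, and the "first offer that fits" policy is an assumption, not what any particular framework does.

```scala
// Toy model of offer-based scheduling. All types here are hypothetical
// stand-ins; none of them come from the Mesos libraries.
final case class Offer(id: String, slaveId: String, cpus: Double, memMb: Double)
final case class TaskRequest(name: String, cpus: Double, memMb: Double)
final case class Launch(task: String, offerId: String, slaveId: String)

object OfferModel {
  /** Places each pending task on the first offer that still has enough CPU
    * and memory, shrinking that offer as it goes. Returns the launches and
    * the tasks that no offer could satisfy, both in input order. */
  def schedule(offers: List[Offer],
               pending: List[TaskRequest]): (List[Launch], List[TaskRequest]) = {
    val start = (offers, List.empty[Launch], List.empty[TaskRequest])
    val (_, launched, unplaced) = pending.foldLeft(start) {
      case ((free, done, waiting), task) =>
        free.indexWhere(o => o.cpus >= task.cpus && o.memMb >= task.memMb) match {
          case -1 =>
            // No offer is large enough: keep the task for a later round of offers.
            (free, done, task :: waiting)
          case i =>
            val o = free(i)
            val shrunk = o.copy(cpus = o.cpus - task.cpus, memMb = o.memMb - task.memMb)
            (free.updated(i, shrunk), Launch(task.name, o.id, o.slaveId) :: done, waiting)
        }
    }
    (launched.reverse, unplaced.reverse)
  }
}
```

With a 1 CPU offer and a 2 CPU offer, and three tasks asking for 1, 1 and 4 CPUs, this sketch launches the first two tasks and leaves the third waiting for a bigger offer.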
Executor
● Once a framework receives the offer, it has to specify
the executor which actually runs a piece of code on the worker
nodes
● Executor sets up the environment to run each task given by
the client
● Scheduler uses this executor to run each task
● In our distributed shell example, we use the default
executor provided by Mesos
Task
● Task is an abstraction used by Mesos to indicate any
piece of work which takes some resources.
● It’s the basic unit of computation on Mesos
● It has
○ Id
○ Offer (resources)
○ Executor
○ Slave Id - the machine on which it has to run
Scala Scheduler example
Running hello world
● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar \
-Djava.library.path=$MESOS_HOME/src/.libs \
com.madhukaraphatak.mesos.helloworld.DistributedShell "/bin/echo hello"
● Mesos needs its native library (*.so files) on the Java library
path to connect to the Mesos cluster
● Once execution is done, we can look at all the tasks that ran
for a given framework in the Mesos UI
● Let’s look at the ones for our distributed shell application
Custom executor
● In the last example, we ran shell commands
● What if we want to run some custom code written in
Java or Scala?
● We need to define our own executor which sets up the
environment to run the code rather than using the built-in
command executor
● Executors are how Mesos supports frameworks written in
different languages on the same cluster
Defining function task API
● We are going to define an abstraction of tasks which
wraps a simple Scala function
● This allows us to run any given pure Scala function on a large
cluster
● This is how Spark started to support distributed
processing for its RDD in the initial implementation
● This task extends Serializable, which allows us to
send the function over the network
● Example : Task.scala
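A minimal sketch of such a function task, assuming plain Java serialization. The names `Task`, `FunctionTask`, `TaskSerde` and `FunctionTaskDemo` are hypothetical and chosen for illustration; the Task.scala in the repository may be shaped differently.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical task abstraction: a unit of work that can be shipped as bytes.
trait Task[T] extends Serializable {
  def run(): T
}

// Wraps an ordinary Scala function so it can travel as a Task.
class FunctionTask[T](body: () => T) extends Task[T] {
  override def run(): T = body()
}

object TaskSerde {
  // Turns a task into bytes, e.g. to place in a task's data payload.
  def serialize[T](task: Task[T]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(task) finally out.close()
    buffer.toByteArray
  }

  // Rebuilds the task on the receiving side.
  def deserialize[T](bytes: Array[Byte]): Task[T] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[Task[T]] finally in.close()
  }
}

object FunctionTaskDemo {
  // Serializes a task, reads it back and runs the copy, as a remote node would.
  def roundTrip(): Int = {
    val task = new FunctionTask[Int](() => (1 to 10).sum)
    TaskSerde.deserialize[Int](TaskSerde.serialize(task)).run()
  }
}
```

Calling `FunctionTaskDemo.roundTrip()` returns 55. Everything happens inside one JVM here; in a real cluster the receiving side also needs the same classes on its classpath.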
Task scheduler
● Similar to the earlier scheduler but uses a custom executor
rather than the default one
● Creates the TaskInfo object which contains
○ Offer
○ Executor
○ Serialized function as data
● getExecutorInfo uses a custom script to launch our own
TaskExecutor
● Example : TaskScheduler.scala
Task executor
● Task executor is our custom executor which is capable
of running our function tasks
● It creates an instance of the Mesos executor and overrides
launchTask
● It deserializes the task from the task info object which
was sent by the task scheduler
● Once it deserializes the object, it runs that function on
that machine
● Example : TaskExecutor.scala
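The executor side can be sketched without the Mesos libraries. The code below is an assumption-laden model, not the TaskExecutor.scala from the repository: `ExecutorSketch`, `toBytes` and this `launchTask` are hypothetical, and the status strings only borrow the names of Mesos task states.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.{Failure, Success, Try}

object ExecutorSketch {
  // Scheduler side (hypothetical): serialize a function into the task payload.
  def toBytes(fn: () => Any): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(fn) finally out.close()
    buffer.toByteArray
  }

  // Executor side (hypothetical): acknowledge the launch, rebuild the function
  // from the payload, run it, and report how it ended.
  def launchTask(payload: Array[Byte]): List[String] = {
    val outcome = Try {
      val in = new ObjectInputStream(new ByteArrayInputStream(payload))
      val fn = try in.readObject().asInstanceOf[() => Any] finally in.close()
      fn()
    }
    outcome match {
      case Success(_) => List("TASK_RUNNING", "TASK_FINISHED")
      case Failure(_) => List("TASK_RUNNING", "TASK_FAILED")
    }
  }

  // One task that succeeds and one that throws.
  def demo(): (List[String], List[String]) = {
    val ok = launchTask(toBytes(() => 6 * 7))
    val bad = launchTask(toBytes(() => throw new IllegalStateException("boom")))
    (ok, bad)
  }
}
```

Wrapping the run in `Try` is a design choice of this sketch so that a failing function becomes a failed status instead of taking the executor process down.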
CustomTasks
● Once we have everything in place, we can run any Scala
function in a distributed manner.
● We can create different kinds of Scala functions and wrap
them in our function task abstraction
● In our client, we create multiple tasks and submit them to the
task scheduler
● Observe that the API also supports closures
● Example : CustomTasks.scala
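To see why closures work, here is a small sketch (not the CustomTasks.scala from the repository) in which several functions capture a local value and still produce the right result after a serialization round trip. `ClosureDemo` and `ship` are hypothetical names, and the round trip stays inside one JVM.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ClosureDemo {
  // Simulates shipping a function to another machine: Java-serialize it and
  // read the copy back.
  private def ship(fn: () => Int): () => Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(fn) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[() => Int] finally in.close()
  }

  def run(): List[Int] = {
    val factor = 10 // local value captured by every closure below
    // Three tasks, each capturing its own n plus the shared factor.
    val tasks: List[() => Int] = List(1, 2, 3).map(n => () => n * factor)
    // Run the shipped copies, not the originals.
    tasks.map(task => ship(task)())
  }
}
```

`ClosureDemo.run()` returns `List(10, 20, 30)`: the captured values travel with each function. A closure that captures something non-serializable would fail at the `writeObject` step instead.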
Running custom executor
● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar \
-Djava.library.path=$MESOS_HOME/src/.libs \
com.madhukaraphatak.mesos.customexecutor.CustomTasks localhost:5050 \
/home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resources/run-executor.sh
● We are passing the script which sets up the environment to launch our custom
executor
● In our example, we are using the local file system. You can use HDFS for the
same
References
● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
● http://blog.madhukaraphatak.com/mesos-helloworld-scala/
● http://blog.madhukaraphatak.com/custom-mesos-executor-scala/

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 

Recently uploaded (20)

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 

Building Distributed Systems from Scratch - Part 1

  • 1. Distributed Systems from Scratch - Part 1 Motivation and Introduction to Apache Mesos https://github.com/phatak-dev/distributedsystems
  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Idea ● Motivation ● Architecture of existing big data system ● What we want to build? ● Introduction to Apache Mesos ● Distributed Shell ● Function API ● Custom executor
  • 4. Idea “What does it take to build a distributed processing system like Spark?”
  • 5. Motivation ● The first version of Spark had only 1600 lines of Scala code ● It had all the basic pieces of RDD and the ability to run as a distributed system using Mesos ● Recreating the same code with step-by-step understanding ● Ample time in hand
  • 6. Distributed systems from 30000ft Distributed Storage(HDFS/S3) Distributed Cluster management (YARN/Mesos) Distributed Processing Systems (Spark/MapReduce) Data Applications
  • 7. Standardization of frameworks ● Building a distributed processing system is like building a web framework ● We already have excellent underlying frameworks like YARN and Mesos for cluster management and HDFS for distributed storage ● We can build on these frameworks rather than trying to do everything from scratch ● Most third-generation systems like Spark and Flink do the same
  • 8. Conventional wisdom ● To build a distributed system you need to read complex papers ● Understand the details of how distribution is done using different protocols ● Need to care about the complexities of concurrency, locking, etc. ● Need to do everything from scratch
  • 9. Modern wisdom ● Read the Spark code to understand how to build a distributed processing system ● Use Apache Mesos and YARN to handle tedious cluster resource management ● Use Akka for distributed concurrency ● Use excellent, proven frameworks rather than inventing your own
  • 10. Why this talk in Spark meetup? YARN/Mesos Applications Experience sharing Introduction sessions Anatomy Sessions Spark on YARN Spark Runtime Data abstraction( RDD/ Dataframe) API’s Top down approach
  • 11. Top down approach ● We started by discussing Spark APIs in introductory sessions like Spark batch and Spark streaming ● Once we understood the basic APIs, we discussed different abstraction layers like RDD and Dataframe in our anatomy sessions ● We also talked about parts of the Spark runtime, like data sources, in one of our anatomy sessions ● In the last meetup we discussed cluster management in the session Spark on YARN
  • 12. Bottom up approach ● Start at the cluster management layer using Mesos and YARN ● Build ○ Runtime ○ Abstractions ○ APIs ● Build an application using our own abstractions and runtime ● Use all we learnt in our top down approach
  • 13. Design ● Heavily influenced by the way Apache Spark is built ● A lot of the code and design comes from the Spark code ● No dependency on Spark itself ● Only implements very basic distributed processing pieces ● Make it work on Apache Mesos and Apache YARN ● Process oriented, not data oriented
  • 14. Spark at its birth - 2010 ● Only 1600 lines of Scala code ● Used Apache Mesos for cluster management ● Used the Mesos messaging API for concurrency management (no Akka) ● Used Scala functions as the processing abstraction rather than a DAG ● No optimizations
  • 15. Steps to get there ● Learn Apache Mesos ● Implement a simple hello world on Mesos ● Implement a simple function-oriented API on Mesos ● Support third party libraries ● Support shuffle ● Support aggregations and counters ● Implement similar functionality on YARN
  • 16. Apache Mesos ● Apache Mesos is an open source cluster manager ● It "provides efficient resource isolation and sharing across distributed applications, or frameworks" ● Built at UC Berkeley ● YARN ideas are inspired by Mesos ● Written in C++ ● Uses Linux cgroups (the same kernel feature Docker relies on) for resource isolation
  • 17. Why Mesos? ● Abstracts resource management away from the processing application ● Handles cluster setup and management ● With the help of ZooKeeper, can provide master fault tolerance ● Modular and simple API ● Supports different distributed processing systems on the same cluster ● Provides APIs in multiple languages like C++ and Java
  • 18. Architecture of Mesos Mesos Master Mesos slave Mesos slave Mesos slave Hadoop Scheduler Spark Scheduler Hadoop Executor Spark Executor Custom Framework Custom executor Frameworks
  • 19. Architecture of Mesos ● Mesos master - Single master node of the Mesos cluster. Entry point to any Mesos application. ● Mesos slaves - Each machine in the cluster runs a Mesos slave, which is responsible for running tasks ● Framework - Distributed application built using the Apache Mesos API ○ Scheduler - Entry point to the framework. Responsible for launching tasks ○ Executor - Runs the actual tasks on Mesos slaves
  • 20. Starting Mesos ● Starting master: bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos ● Starting slave: bin/mesos-slave.sh --master=127.0.0.1:5050 ● Accessing UI: http://127.0.0.1:5050 ● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
  • 21. Hello world on Mesos ● Run a simple shell command on each Mesos slave ● We create our own framework which is capable of running shell commands ● Our framework should have the following three components ○ Client ○ Scheduler ○ Executor
  • 22. Client ● Code that submits the tasks to the framework ● Task is an abstraction used by Mesos to indicate any piece of work which takes some resources ● It’s similar to the driver program in Spark ● It creates an instance of the framework and submits it to the Mesos driver ● Mesos uses protocol buffers for serialization ● Example code: DistributedShell.scala
  • 23. Scheduler ● Every framework in Apache Mesos should extend the Scheduler interface ● The scheduler is the entry point for our custom framework ● It’s similar to SparkContext ● We need to override ○ resourceOffers ● It acts like the Application Master in YARN
  • 24. Offers ● Each resource in Mesos is presented as an offer ● Whenever a resource (disk, memory or CPU) is available, Mesos offers it to all the frameworks running on it ● A framework can accept the offer and use it for running its own tasks ● Once execution is done, it can release that resource so that Mesos can offer it to other frameworks ● Quite different from the YARN model
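The offer cycle described on this slide can be sketched with a small, self-contained model. This is only an illustration under stated assumptions: `Offer`, `TaskRequest` and `placeTasks` are hypothetical names invented here, not the Mesos API (real offers are protocol buffer messages delivered to the scheduler's `resourceOffers` callback), and the greedy first-fit placement is just one possible policy.

```scala
object OfferModelSketch {
  // Simplified stand-ins: real Mesos offers and tasks are protocol buffer messages.
  final case class Offer(id: String, slaveId: String, cpus: Double, memMb: Double)
  final case class TaskRequest(name: String, cpus: Double, memMb: Double)

  // For each offer in turn, accept as many waiting tasks as fit into it.
  // Returns the task names placed per offer id, plus the tasks still waiting.
  def placeTasks(
      offers: List[Offer],
      pending: List[TaskRequest]
  ): (Map[String, List[String]], List[TaskRequest]) =
    offers.foldLeft((Map.empty[String, List[String]], pending)) {
      case ((placed, waiting), offer) =>
        // Walk the waiting tasks, tracking what is left of this offer.
        val (accepted, rejected, _, _) =
          waiting.foldLeft((List.empty[TaskRequest], List.empty[TaskRequest], offer.cpus, offer.memMb)) {
            case ((acc, rej, cpus, mem), task) =>
              if (task.cpus <= cpus && task.memMb <= mem)
                (task :: acc, rej, cpus - task.cpus, mem - task.memMb)
              else
                (acc, task :: rej, cpus, mem)
          }
        val updated =
          if (accepted.isEmpty) placed
          else placed + (offer.id -> accepted.reverse.map(_.name))
        (updated, rejected.reverse)
    }

  // Two offers, four identical one-CPU tasks: three fit, one keeps waiting.
  def demo(): (Map[String, List[String]], List[TaskRequest]) = {
    val offers = List(Offer("offer-1", "slave-1", 2.0, 1024.0), Offer("offer-2", "slave-2", 1.0, 512.0))
    val pending = List("a", "b", "c", "d").map(name => TaskRequest(name, 1.0, 512.0))
    placeTasks(offers, pending)
  }
}
```

In this model a task that does not fit simply waits for a later offer, which mirrors the idea that the framework, not the cluster manager, decides what to run on the resources it is offered.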
  • 25. Executor ● Once a framework receives an offer, it has to specify the executor which actually runs a piece of code on worker nodes ● The executor sets up the environment to run each task given by the client ● The scheduler uses this executor to run each task ● In our distributed shell example, we use the default executor provided by Mesos
  • 26. Task ● Task is an abstraction used by Mesos to indicate any piece of work which takes some resources ● It’s the basic unit of computation on Mesos ● It has ○ Id ○ Offer (resources) ○ Executor ○ Slave Id - machine on which it has to run
  • 28. Running hello world ● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.helloworld.DistributedShell "/bin/echo hello" ● Mesos needs its native library (*.so) files on the Java library path to connect to the Mesos cluster ● Once execution is done, we can look at all the tasks run for a given framework in the Mesos UI ● Let’s look at the ones for our distributed shell application
  • 29. Custom executor ● In the last example, we ran shell commands ● What if we want to run some custom code written in Java/Scala? ● We need to define our own executor which sets up the environment to run the code, rather than using the built-in command executor ● Executors are how Mesos supports frameworks written in different languages on the same cluster
  • 30. Defining function task API ● We are going to define a task abstraction which wraps a simple Scala function ● This allows us to run any given pure Scala function on a large cluster ● This is how Spark started to support distributed processing for its RDD in the initial implementation ● This task will extend Serializable, which allows us to serialize the function over the network ● Example: Task.scala
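A minimal sketch of such a function task follows, assuming plain Java serialization. The names `FunctionTask`, `serialize` and `deserialize` are hypothetical and chosen for illustration; the actual Task.scala in the repository may be shaped differently.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object FunctionTaskSketch {
  // Hypothetical task abstraction: wraps a zero-argument Scala function.
  final class FunctionTask[T](body: () => T) extends Serializable {
    def run(): T = body()
  }

  // Turn a task into bytes, as a scheduler would before shipping it over the network.
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Rebuild the task from bytes, as the receiving side would.
  def deserialize[T](bytes: Array[Byte]): FunctionTask[T] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[FunctionTask[T]] finally in.close()
  }

  // Serialize a task, read it back, and run the recovered copy.
  def roundTripSum(): Int = {
    val task = new FunctionTask[Int](() => (1 to 10).sum)
    deserialize[Int](serialize(task)).run()
  }
}
```

Here both ends run in the same JVM. On a real cluster the receiving machine also needs the task's classes on its classpath, which is one reason the application jar is shipped along with the executor.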
  • 31. Task scheduler ● Similar to the earlier scheduler but uses a custom executor rather than the default one ● Creates the TaskInfo object which contains ○ Offer ○ Executor ○ Serialized function as data ● getExecutorInfo uses a custom script to launch our own TaskExecutor ● TaskScheduler.scala
  • 32. Task executor ● Task executor is our custom executor which is capable of running our function tasks ● It creates an instance of the Mesos executor and overrides launchTask ● It deserializes the task from the TaskInfo object which was sent by the task scheduler ● Once it deserializes the object, it runs that function on that machine ● Example: TaskExecutor.scala
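The deserialize-then-run step can be sketched without the Mesos library. Everything below is an assumption made for illustration: `launchTask` here is a plain function that takes the raw payload bytes (in Mesos the real callback receives a driver and a `TaskInfo`), and `Finished` / `Failed` are hypothetical stand-ins for the task status an executor would report back.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.{Failure, Success, Try}

object TaskExecutorSketch {
  // Hypothetical function task, as in the function task API slide.
  final class FunctionTask[T](body: () => T) extends Serializable {
    def run(): T = body()
  }

  // Hypothetical outcome type standing in for a Mesos task status update.
  sealed trait Outcome
  final case class Finished(result: Any) extends Outcome
  final case class Failed(reason: String) extends Outcome

  // Scheduler side: serialize the task into the payload bytes.
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Executor side: deserialize the payload, run the function, report the outcome.
  def launchTask(data: Array[Byte]): Outcome =
    Try {
      val in = new ObjectInputStream(new ByteArrayInputStream(data))
      try in.readObject().asInstanceOf[FunctionTask[Any]].run() finally in.close()
    } match {
      case Success(value) => Finished(value)
      case Failure(error) => Failed(error.toString)
    }

  // One valid payload and one corrupt payload.
  def demo(): (Outcome, Outcome) = {
    val ok = launchTask(toBytes(new FunctionTask[String](() => "hello".toUpperCase)))
    val bad = launchTask(Array[Byte](1, 2, 3)) // not a serialized object
    (ok, bad)
  }
}
```

Wrapping the whole step in `Try` means a corrupt payload or a throwing function becomes a reported failure instead of crashing the executor process.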
  • 33. CustomTasks ● Once we have everything in place, we can run any Scala function in a distributed manner ● We can create different kinds of Scala functions and wrap them inside our function task abstraction ● In our client, we create multiple tasks and submit them to the task scheduler ● Observe that the API also supports closures ● Example: CustomTasks.scala
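To illustrate the point about closures, here is a hedged sketch in which several tasks capture a local value and survive a serialization round trip. `FunctionTask` and `roundTrip` are hypothetical names, and the round trip happens inside one JVM rather than across machines, so this is not the code in CustomTasks.scala.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ClosureTasksSketch {
  // Hypothetical function task wrapping a zero-argument Scala function.
  final class FunctionTask[T](body: () => T) extends Serializable {
    def run(): T = body()
  }

  // Serialize and immediately deserialize, standing in for a trip to another machine.
  def roundTrip[T](task: FunctionTask[T]): FunctionTask[T] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(task) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[FunctionTask[T]] finally in.close()
  }

  def runAll(): List[Int] = {
    val factor = 10 // local value captured by each closure below
    val tasks = List(1, 2, 3).map(n => new FunctionTask[Int](() => n * factor))
    // The captured values travel inside the serialized closure.
    tasks.map(task => roundTrip(task).run())
  }
}
```

The captured values are serialized together with the function, so anything a closure refers to must itself be serializable. Capturing a field of a non-serializable enclosing class is a common way for this to fail at runtime.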
  • 34. Running custom executor ● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.customexecutor.CustomTasks localhost:5050 /home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resources/run-executor.sh ● We are passing the script which sets up the environment to launch our custom executor ● In our example, we are using the local file system. You can use HDFS for the same