Boulder/Denver BigData 2013: Cluster Computing with Mesos and Cascading

Boulder/Denver BigData, 2013-09-25:
Cluster Computing
with Apache Mesos and Cascading
Paco Nathan @pacoid
Chief Scientist, Mesosphere.io

Cluster Computing
with Apache Mesos and Cascading:
1. Enterprise Data Workflows
2. Lingual and Pattern Examples
3. An Evolution of Cluster Computing
Boulder, 2013-09-25

Enterprise Data Workflows
middleware for Big Data applications is evolving,
with commercial examples that include:
• Concurrent: Cascading, Lingual, Pattern, etc.
• Actian: ParAccel Big Data Analytics Platform
• Continuum Analytics: Anaconda supporting IPython Notebook, Pandas, Augustus, etc.
[diagram: data sources → ETL → data prep → predictive model → end uses]

Anatomy of an Enterprise app
definition of a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
ANSI SQL for ETL

J2EE for business logic

SAS for predictive models

SAS for predictive models, ANSI SQL for ETL: most of the licensing costs…

J2EE for business logic: most of the project costs…

Cascading allows multiple departments to combine their workflow components
into an integrated app – one among many, typically – based on 100% open source:
• source taps for Cassandra, JDBC, Splunk, etc.
• Lingual: DW → ANSI SQL
• Pattern: SAS, R, etc. → PMML
• business logic in Java, Clojure, Scala, etc.
• sink taps for Memcached, HBase, MongoDB, etc.
a compiler sees it all…
one connected DAG:
• optimization
• troubleshooting
• exception handling
• notifications
cascading.org

Lingual: DW → ANSI SQL
FlowDef flowDef = FlowDef.flowDef()
  .setName( "etl" )
  .addSource( "example.employee", emplTap )
  .addSource( "example.sales", salesTap )
  .addSink( "results", resultsTap );

SQLPlanner sqlPlanner = new SQLPlanner()
  .setSql( sqlStatement );

flowDef.addAssemblyPlanner( sqlPlanner );
cascading.org

Pattern: SAS, R, etc. → PMML
FlowDef flowDef = FlowDef.flowDef()
  .setName( "classifier" )
  .addSource( "input", inputTap )
  .addSink( "classify", classifyTap );

PMMLPlanner pmmlPlanner = new PMMLPlanner()
  .setPMMLInput( new File( pmmlModel ) )
  .retainOnlyActiveIncomingFields();

flowDef.addAssemblyPlanner( pmmlPlanner );

Cascading – functional programming
Key insight: MapReduce is based on functional programming
– back to LISP in 1970s. Apache Hadoop use cases are
mostly about data pipelines, which are functional in nature.
to ease staffing problems as “Main Street” Enterprise firms
began to embrace Hadoop, Cascading was introduced
in late 2007, as a new Java API to implement functional
programming for large-scale data workflows:
• leverages JVM and Java-based tools without any
need to create new languages
• allows programmers who have J2EE expertise
to leverage the economics of Hadoop clusters
Edgar Codd alluded to this (DSLs for structuring data)
in his original paper about the relational model

Cascading – functional programming
• Twitter, eBay, LinkedIn, Nokia, YieldBot, uSwitch, etc.,
have invested in open source projects atop Cascading –
used for their large-scale production deployments
• new case studies for Cascading apps are mostly based on
domain-specific languages (DSLs) in JVM languages which
emphasize functional programming:
Cascalog in Clojure (2010)
Scalding in Scala (2012)
github.com/nathanmarz/cascalog/wiki
github.com/twitter/scalding/wiki
Why Adopting the Declarative Programming Practices Will Improve Your Return from Technology
Dan Woods, 2013-04-17 Forbes
forbes.com/sites/danwoods/2013/04/17/why-adopting-the-declarative-programming-practices-will-improve-your-return-from-technology/

Functional Programming for Big Data
WordCount with token scrubbing…
Apache Hive: 52 lines HQL + 8 lines Python (UDF)
compared to
Scalding: 18 lines Scala/Cascading
functional programming languages help reduce
software engineering costs at scale, over time

Cascading – deployments
• case studies: Climate Corp, Twitter, Etsy,
Williams-Sonoma, uSwitch, Airbnb, Nokia,
YieldBot, Square, Harvard, Factual, etc.
• use cases: ETL, marketing funnel, anti-fraud,
social media, retail pricing, search analytics,
recommenders, eCRM, utility grids, telecom,
genomics, climatology, agronomics, etc.

Workflow Abstraction – pattern language
Cascading uses a “plumbing” metaphor in Java
to define workflows out of familiar elements:
Pipes, Taps, Tuple Flows, Filters, Joins, Traps, etc.
[flow diagram: Document Collection → Tokenize → Scrub token → HashJoin Left (RHS: Stop Word List → Regex token) → GroupBy token → Count → Word Count; M marks map steps, R marks reduce steps]
data is represented as flows of tuples
operations in the flows bring functional
programming aspects into Java
A Pattern Language
Christopher Alexander, et al.
amazon.com/dp/0195019199
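The plumbing metaphor can be illustrated without Cascading at all. The following is a minimal single-machine sketch in plain Java streams, not Cascading's API: the class name `PlumbingSketch`, the sample sentence, and the stop-word set are hypothetical, and a simple set lookup stands in for the stop-word join.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration: each stream stage plays the role of one pipe segment. */
class PlumbingSketch {

    /** Tokenize, scrub, drop stop words, then group and count the remaining tokens. */
    static Map<String, Long> wordCount(String text, Set<String> stopWords) {
        return Arrays.stream(text.split("[ \\[\\](),.]"))      // Tokenize
            .map(token -> token.trim().toLowerCase())           // Scrub token
            .filter(token -> !token.isEmpty())                  // discard empty tokens
            .filter(token -> !stopWords.contains(token))        // stand-in for the stop-word join
            .collect(Collectors.groupingBy(                     // GroupBy token
                Function.identity(), TreeMap::new, Collectors.counting())); // Count
    }

    public static void main(String[] args) {
        // Sample text and stop words are made up for this sketch.
        String text = "A rain shadow is a dry area, in the lee of a mountain.";
        Set<String> stopWords = Set.of("a", "is", "in", "the", "of");
        System.out.println(wordCount(text, stopWords));
    }
}
```

Each stage consumes a stream of values and produces another, which is the property that lets a planner treat the whole chain as one connected flow.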

Workflow Abstraction – literate programming
Cascading workflows generate their own visual
documentation: flow diagrams
in formal terms, flow diagrams leverage a methodology
called literate programming
provides intuitive, visual representations for apps –
great for cross-team collaboration
Literate Programming
Don Knuth
literateprogramming.com

Workflow Abstraction – business process
following the essence of literate programming, Cascading
workflows provide statements of business process
this recalls a sense of business process management
for Enterprise apps (think BPM/BPEL for Big Data)
Cascading creates a separation of concerns between
business process and implementation details (Hadoop, etc.)
this is especially apparent in large-scale Cascalog apps:
“Specify what you require, not how to achieve it.”
by virtue of the pattern language, the flow planner then
determines how to translate business process into efficient,
parallel jobs at scale

The Ubiquitous Word Count
Definition:
count how often each word appears
in a collection of text documents
this simple program provides an excellent test case
for parallel processing:
• requires a minimal amount of code
• demonstrates use of both symbolic and numeric values
• shows a dependency graph of tuples as an abstraction
• is not many steps away from useful search indexing
• serves as a “Hello World” for Hadoop apps
a distributed computing framework that runs Word Count
efficiently in parallel at scale can handle much larger
and more interesting compute problems
void map (String doc_id, String text):
  for each word w in segment(text):
    emit(w, "1");
void reduce (String word, Iterator group):
  int count = 0;
  for each pc in group:
    count += Int(pc);
  emit(word, String(count));
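To make the map and reduce pseudocode concrete, here is a minimal single-process sketch in plain Java. It is an illustration only, not Hadoop or Cascading code: the class name `WordCountSketch` and the sample documents are hypothetical, and lowercasing plus splitting on non-word characters is an assumed stand-in for `segment(text)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process sketch of the map and reduce phases of word count. */
class WordCountSketch {

    /** "map" phase: emit one (word, 1) pair per token in the text. */
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Assumption: tokens are lowercased runs of word characters.
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                pairs.add(Map.entry(w, 1));
            }
        }
        return pairs;
    }

    /** "shuffle" and "reduce" phases: group the pairs by word and sum each group. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Made-up documents; in a cluster each one could be mapped on a different node.
        for (String doc : List.of("rain shadow", "rain in the lee")) {
            pairs.addAll(map(doc));
        }
        System.out.println(reduce(pairs)); // prints {in=1, lee=1, rain=2, shadow=1, the=1}
    }
}
```

Because `map` looks at one document at a time and `reduce` looks at one word's group at a time, both phases can be spread across machines, which is what makes this program a useful test case.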

WordCount – conceptual flow diagram
[flow diagram: Document Collection → Tokenize → GroupBy token → Count → Word Count; 1 map (M), 1 reduce (R)]
18 lines code: gist.github.com/3900702
cascading.org/category/impatient

WordCount – Cascading app in Java
String docPath = args[ 0 ];
String wcPath = args[ 1 ];
Properties properties = new Properties();
AppProps.setApplicationJarClass( properties, Main.class );
HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );
// create source and sink taps
Tap docTap = new Hfs( new TextDelimited( true, "\t" ), docPath );
Tap wcTap = new Hfs( new TextDelimited( true, "\t" ), wcPath );
// specify a regex to split "document" text lines into token stream
Fields token = new Fields( "token" );
Fields text = new Fields( "text" );
RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
// only returns "token"
Pipe docPipe = new Each( "token", text, splitter, Fields.RESULTS );
// determine the word counts
Pipe wcPipe = new Pipe( "wc", docPipe );
wcPipe = new GroupBy( wcPipe, token );
wcPipe = new Every( wcPipe, Fields.ALL, new Count(), Fields.ALL );
// connect the taps, pipes, etc., into a flow
FlowDef flowDef = FlowDef.flowDef().setName( "wc" )
.addSource( docPipe, docTap )
.addTailSink( wcPipe, wcTap );
// write a DOT file and run the flow
Flow wcFlow = flowConnector.connect( flowDef );
wcFlow.writeDOT( "dot/wc.dot" );
wcFlow.complete();

mapreduce
Every('wc')[Count[decl:'count']]
Hfs['TextDelimited[[UNKNOWN]->['token', 'count']]']['output/wc']']
GroupBy('wc')[by:['token']]
Each('token')[RegexSplitGenerator[decl:'token'][args:1]]
Hfs['TextDelimited[['doc_id', 'text']->[ALL]]']['data/rain.txt']']
[head]
[tail]
[{2}:'token', 'count']
[{1}:'token']
[{2}:'doc_id', 'text']
[{2}:'doc_id', 'text']
wc[{1}:'token']
[{1}:'token']
[{1}:'token']
[{1}:'token']
WordCount – generated flow diagram

(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))
(defmapcatop split [line]
  "reads in a line of string and splits it by regex"
  (s/split line #"[\[\]\(\),.\s]+"))
(defn -main [in out & args]
  (?<- (hfs-delimited out)
       [?word ?count]
       ((hfs-delimited in :skip-header? true) _ ?line)
       (split ?line :> ?word)
       (c/count ?count)))
; Paul Lam
; github.com/Quantisan/Impatient
WordCount – Cascalog / Clojure

github.com/nathanmarz/cascalog/wiki
• implements Datalog in Clojure, with predicates backed
by Cascading – for a highly declarative language
• run ad-hoc queries from the Clojure REPL –
approx. 10:1 code reduction compared with SQL
• composable subqueries, used for test-driven development
(TDD) practices at scale
• Leiningen build: simple, no surprises, in Clojure itself
• more new deployments than other Cascading DSLs –
Climate Corp is largest use case: 90% Clojure/Cascalog
• has a learning curve, limited number of Clojure developers
• aggregators are the magic, and those take effort to learn
WordCount – Cascalog / Clojure

import com.twitter.scalding._

class WordCount(args : Args) extends Job(args) {
  Tsv(args("doc"),
      ('doc_id, 'text),
      skipHeader = true)
    .read
    .flatMap('text -> 'token) {
      text : String => text.split("[ \\[\\]\\(\\),.]")
    }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))
}
WordCount – Scalding / Scala

github.com/twitter/scalding/wiki
• extends the Scala collections API so that distributed lists
become “pipes” backed by Cascading
• code is compact, easy to understand
• nearly 1:1 between elements of conceptual flow diagram
and function calls
• extensive libraries are available for linear algebra, abstract
algebra, machine learning – e.g., Matrix API, Algebird, etc.
• significant investments by Twitter, Etsy, eBay, etc.
• great for data services at scale
• less learning curve than Cascalog
WordCount – Scalding / Scala

A Thought Exercise
Consider that when a company like Caterpillar moves
into data science, they won’t be building the world’s
next search engine or social network
They will be optimizing supply chain, optimizing fuel
costs, automating data feedback loops integrated
into their equipment…
Operations Research –
crunching amazing amounts of data
$50B company, in a $250B market segment
Upcoming: tractors as drones –
guided by complex, distributed data apps

Two Avenues to the App Layer…
[chart axes: scale ➞, complexity ➞]
Enterprise: must contend with
complexity at scale every day…
incumbents extend current practices and
infrastructure investments – using J2EE,
ANSI SQL, SAS, etc. – to migrate
workflows onto Apache Hadoop while
leveraging existing staff
Start-ups: crave complexity and
scale to become viable…
new ventures move into Enterprise space
to compete using relatively lean staff,
while leveraging sophisticated engineering
practices, e.g., Cascalog and Scalding

Lingual – ANSI SQL
[diagram: a Hadoop Cluster running a Data Workflow, with source taps, sink taps, and a trap tap; labels: customer profile DBs, Customer Prefs, logs, Cache, Customers, Support, Web App, Reporting, Analytics Cubes, Modeling, PMML]
• collab with Optiq – industry-proven code base
• ANSI SQL parser/optimizer atop Cascading
flow planner
• JDBC driver to integrate into existing
tools and app servers
• relational catalog over a collection
of unstructured data
• SQL shell prompt to run queries
• enable analysts without retraining
on Hadoop, etc.
• transparency for Support, Ops,
Finance, et al.
a language for queries – not a database,
but ANSI SQL as a DSL for workflows

Lingual – CSV data in local file system
cascading.org/lingual

Lingual – shell prompt, catalog

Lingual – queries

# load the JDBC package
library(RJDBC)

# set up the driver
drv <- JDBC("cascading.lingual.jdbc.Driver",
  "~/src/concur/lingual/lingual-local/build/libs/lingual-local-1.0.0-wip-dev-jdbc.jar")

# set up a database connection to a local repository
connection <- dbConnect(drv,
  "jdbc:lingual:local;catalog=~/src/concur/lingual/lingual-examples/tables;schema=EMPLOYEES")

# query the repository: in this case the MySQL sample database (CSV files)
df <- dbGetQuery(connection,
  "SELECT * FROM EMPLOYEES.EMPLOYEES WHERE FIRST_NAME = 'Gina'")
head(df)

# use R functions to summarize and visualize part of the data
df$hire_age <- as.integer(as.Date(df$HIRE_DATE) - as.Date(df$BIRTH_DATE)) / 365.25
summary(df$hire_age)
library(ggplot2)
m <- ggplot(df, aes(x=hire_age))
m <- m + ggtitle("Age at hire, people named Gina")
m + geom_histogram(binwidth=1, aes(y=..density.., fill=..count..)) + geom_density()
Lingual – connecting Hadoop and R

> summary(df$hire_age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
20.86 27.89 31.70 31.61 35.01 43.92
Lingual – connecting Hadoop and R

Pattern – model scoring
• migrate workloads: SAS, Teradata, etc.,
exporting predictive models as PMML
• great open source tools – R, Weka,
KNIME, Matlab, RapidMiner, etc.
• integrate with other libraries –
Matrix API, etc.
• leverage PMML as another kind
of DSL
cascading.org/pattern

PMML – standard
• established XML standard for predictive model markup
• organized by Data Mining Group (DMG), since 1997
http://dmg.org/
• members: IBM, SAS, Visa, NASA, Equifax, Microstrategy,
Microsoft, etc.
• PMML concepts for metadata, ensembles, etc., translate
directly into Cascading tuple flows
“PMML is the leading standard for statistical and data mining models and
supported by over 20 vendors and organizations. With PMML, it is easy
to develop a model on one system using one application and deploy the
model on another system using another application.”
wikipedia.org/wiki/Predictive_Model_Markup_Language

PMML – model coverage
• Association Rules: AssociationModel element
• Cluster Models: ClusteringModel element
• Decision Trees: TreeModel element
• Naïve Bayes Classifiers: NaiveBayesModel element
• Neural Networks: NeuralNetwork element
• Regression: RegressionModel and GeneralRegressionModel elements
• Rulesets: RuleSetModel element
• Sequences: SequenceModel element
• Support Vector Machines: SupportVectorMachineModel element
• Text Models: TextModel element
• Time Series: TimeSeriesModel element
ibm.com/developerworks/industry/library/ind-PMML2/

## load the libraries used below

library(randomForest)
library(pmml)
library(XML)

## train a RandomForest model

f <- as.formula("as.factor(label) ~ .")
fit <- randomForest(f, data_train, ntree=50)

## test the model on the holdout test set

print(fit$importance)
print(fit)

predicted <- predict(fit, data)
data$predicted <- predicted
confuse <- table(pred = predicted, true = data[,1])
print(confuse)

## export predicted labels to TSV

write.table(data, file=paste(dat_folder, "sample.tsv", sep="/"),
  quote=FALSE, sep="\t", row.names=FALSE)

## export RF model to PMML

saveXML(pmml(fit), file=paste(dat_folder, "sample.rf.xml", sep="/"))
Pattern – create a model in R

public static void main( String[] args ) throws RuntimeException {
String inputPath = args[ 0 ];
String classifyPath = args[ 1 ];
// set up the config properties
Properties properties = new Properties();
AppProps.setApplicationJarClass( properties, Main.class );
HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );
// create source and sink taps
Tap inputTap = new Hfs( new TextDelimited( true, "\t" ), inputPath );
Tap classifyTap = new Hfs( new TextDelimited( true, "\t" ), classifyPath );
// handle command line options
OptionParser optParser = new OptionParser();
optParser.accepts( "pmml" ).withRequiredArg();
OptionSet options = optParser.parse( args );

// connect the taps, pipes, etc., into a flow
FlowDef flowDef = FlowDef.flowDef().setName( "classify" )
.addSource( "input", inputTap )
.addSink( "classify", classifyTap );

if( options.hasArgument( "pmml" ) ) {
String pmmlPath = (String) options.valuesOf( "pmml" ).get( 0 );
PMMLPlanner pmmlPlanner = new PMMLPlanner()
.setPMMLInput( new File( pmmlPath ) )
.retainOnlyActiveIncomingFields()
.setDefaultPredictedField( new Fields( "predict", Double.class ) ); // default value if missing from the model
flowDef.addAssemblyPlanner( pmmlPlanner );
}

// write a DOT file and run the flow
Flow classifyFlow = flowConnector.connect( flowDef );
classifyFlow.writeDOT( "dot/classify.dot" );
classifyFlow.complete();
}
Pattern – score a model, within an app

Q3 1997: inflection point
four independent teams were working toward horizontal
scale-out of workflows based on commodity hardware
this effort prepared the way for huge Internet successes
in the 1997 holiday season… AMZN, EBAY, Inktomi
(YHOO Search), then GOOG
MapReduce and the Apache Hadoop open source stack
emerged from this period

Circa 1996: pre-inflection point
[diagram: Stakeholder, Product, Engineering, BI Analysts, and Customers around a Web App and an RDBMS; labels: transactions, strategy, requirements, optimized code, SQL Query, result sets, Excel pivot tables, PowerPoint slide decks]

Circa 1996: pre-inflection point
“throw it over the wall”

Circa 2001: post- big ecommerce successes
[diagram: Stakeholder, Product, Engineering, UX, and Customers around Web Apps, Middleware, and an RDBMS; labels: customer transactions, servlets, models, recommenders + classifiers, Algorithmic Modeling, Logs, event history, aggregation, dashboards, DW, ETL, SQL Query, result sets]

Circa 2001: post- big ecommerce successes
“data products”

Circa 2013: clusters everywhere
[diagram: Use Cases Across Topologies; roles: Data Scientist, App Dev, Ops, Domain Expert, Prod, Eng, Customers; components: Workflow, Planner, Cluster Scheduler, RDBMS, DW, Log Events, In-Memory Data Grid, Hadoop, etc., Web Apps, Mobile, etc., Data Products; labels: near time, batch, services, transactions, content, social interactions, History, s/w dev, data science, discovery + modeling, dashboard metrics, business process, optimized capacity, taps, introduced capability, existing SDLC]

Circa 2013: clusters everywhere
“optimize topologies”

Primary Sources
Amazon
“Early Amazon: Splitting the website” – Greg Linden
glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
eBay
“The eBay Architecture” – Randy Shoup, Dan Pritchett
addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html
addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
Inktomi (YHOO Search)
“Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff)
youtu.be/E91oEn1bnXM
Google
“Underneath the Covers at Google” – Jeff Dean (0:06:54 ff)
youtu.be/qsan-GQaeyk
perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
MIT Media Lab
“Social Information Filtering for Music Recommendation” – Pattie Maes
pubs.media.mit.edu/pubs/papers/32paper.ps
ted.com/speakers/pattie_maes.html

Cluster Computing’s Dirty Little Secret
many of us make a good living by leveraging high ROI
apps based on clusters, and so execs agree to build
out more data centers…
clusters for Hadoop/HBase, for Storm, for MySQL,
for Memcached, for Cassandra, for Nginx, etc.
this becomes expensive!
a single class of workloads on a given cluster is simpler
to manage, but terrible for utilization… various notions
of “cloud” help…
Cloudera, Hortonworks, probably EMC soon: sell a notion
of “Hadoop as OS”
All your workloads are belong to us
[photo: Google Data Center, Fox News, ~2002]

Three Laws, or more?
meanwhile, architectures evolve toward much, much larger data…
pistoncloud.com/ ...
Rich Freitas, IBM Research
Q:
what disruptions in topologies+algorithms could this imply?
given there’s no such thing as RAM anymore…

regardless of how architectures change,
death and taxes will endure:
servers fail, data must move

The Modern Kernel: Top Linux Contributors…

Beyond Hadoop
Hadoop – an open source solution for fault-tolerant parallel
processing of batch jobs at scale, based on commodity
hardware… however, other priorities have emerged for the
analytics lifecycle:
• apps require integration beyond Hadoop
• multiple topologies, mixed workloads, multi-tenancy
• higher utilization
• lower latency
• highly available, long-running services
• more than “Just JVM” – e.g., Python growth
keep in mind the priority for multi-disciplinary efforts,
to break down even more silos – well beyond the
de facto “priesthood” of data engineering

Beyond Hadoop
Google has been doing data center computing for years,
to address the complexities of large-scale data workflows:
• leveraging the modern kernel: isolation in lieu of VMs
• “most (>80%) jobs are batch jobs, but the majority
of resources (55–80%) are allocated to service jobs”
• mixed workloads, multi-tenancy
• relatively high utilization rates
• JVM? not so much…
• reality: scheduling batch is simple;
scheduling services is hard/expensive

“Return of the Borg”
Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon
Cade Metz
wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso, Urs Hölzle
research.google.com/pubs/pub35290.html
2011 GAFS Omega
John Wilkes, et al.
youtu.be/0ZFMlO98Jkc

“Return of the Borg”
Omega: flexible, scalable schedulers for large compute clusters
Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes
eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf

Mesos – definitions
a common substrate for cluster computing
heterogeneous assets in your data center or cloud
made available as a homogeneous set of resources
• top-level Apache project
• scalability to 10,000s of nodes
• obviates the need for virtual machines
• isolation (pluggable) for CPU, RAM, I/O, FS, etc.
• fault-tolerant replicated master using ZooKeeper
• multi-resource scheduling (memory and CPU aware)
• APIs in C++, Java, Python
• web UI for inspecting cluster state
• available for Linux, OpenSolaris, Mac OSX
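The "multi-resource scheduling" bullet can be illustrated with a small calculation. The sketch below assumes a Dominant Resource Fairness (DRF) style policy, which is how Mesos allocation is commonly described but is not stated on this slide. The class name `DominantShareSketch`, the framework names, and all resource figures are hypothetical, and this is not the Mesos API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a DRF-style choice between frameworks. */
class DominantShareSketch {

    /** Dominant share: the larger of a framework's CPU fraction and memory fraction. */
    static double dominantShare(double cpus, double memGb, double totalCpus, double totalMemGb) {
        return Math.max(cpus / totalCpus, memGb / totalMemGb);
    }

    /** Pick the framework with the smallest dominant share to receive the next offer. */
    static String nextToOffer(Map<String, double[]> usage, double totalCpus, double totalMemGb) {
        String best = null;
        double bestShare = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : usage.entrySet()) {
            // Each value holds {cpus in use, memory in use (GB)}.
            double share = dominantShare(e.getValue()[0], e.getValue()[1], totalCpus, totalMemGb);
            if (share < bestShare) {
                bestShare = share;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up cluster of 9 CPUs and 18 GB RAM, shared by two frameworks.
        Map<String, double[]> usage = new LinkedHashMap<>();
        usage.put("hadoop", new double[] {2.0, 8.0}); // memory-heavy: dominant share 8/18
        usage.put("storm", new double[] {3.0, 3.0});  // CPU-heavy: dominant share 3/9
        System.out.println(nextToOffer(usage, 9.0, 18.0)); // prints storm
    }
}
```

Comparing frameworks by their dominant resource, rather than by CPU or memory alone, is what lets CPU-heavy and memory-heavy workloads share one pool fairly.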

Mesos – architecture
[architecture diagram; layer labels: Workloads, Apps, Frameworks, Kernel, DFS, Cluster]
• workloads: services, batch (Ruby, Python, JVM, C++)
• shown running on Mesos: Chronos, Marathon, Storm, Kafka, JBoss, Django, Rails, Shark, Impala, Scalding, Spark, Hadoop, MPI, MySQL
• DFS: distributed file system
• Cluster: distributed resources: CPU, RAM, I/O, FS, rack locality, etc.

Mesos – architecture
given use of Mesos as a Data Center OS kernel…
• Chronos provides complex scheduling capabilities,
much like a distributed Unix “cron”
• Marathon provides highly-available long-running
services, much like a distributed Unix “init.d”
• next time you need to build a distributed app,
consider using these as building blocks
a major lesson learned from Spark:
• leveraging these kinds of building blocks,
one can rebuild Hadoop 100x faster,
in much less code

Mesos – data center OS stack
Apps: HADOOP, STORM, CHRONOS, RAILS, JBOSS
OS: TELEMETRY, CAPACITY PLANNING, GUI, SECURITY, SMARTER SCHEDULING
Kernel: MESOS

Prior Practice: Dedicated Servers
DATACENTER
• low utilization rates
• longer time to ramp up new services

Prior Practice: Virtualization
DATACENTER PROVISIONED VMS
• even more machines to manage
• substantial performance decrease due to virtualization
• VM licensing costs

Prior Practice: Static Partitioning
DATACENTER STATIC PARTITIONING
• even more machines to manage
• substantial performance decrease due to virtualization
• VM licensing costs
• static partitioning limits elasticity

Mesos: One Large Pool Of Resources
[diagram: DATACENTER managed by MESOS as a single pool]
“We wanted people to be able to program
for the data center just like they program
for their laptop.”
Ben Hindman

What are the costs of Virtualization?
benchmark type      OpenVZ improvement
mixed workloads     210%-300%
LAMP (related)      38%-200%
I/O throughput      200%-500%
response time       order of magnitude, more pronounced at higher loads

What are the costs of Single Tenancy?
[charts: CPU load (0%-100%) over time t for RAILS, MEMCACHED, and HADOOP on separate clusters, compared with the COMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP) on one cluster]
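The cost of single tenancy comes down to simple arithmetic: dedicated clusters must each be sized for their own peak, while a shared cluster only needs to cover the peak of the combined load. The sketch below uses made-up load figures (in "servers busy" per time step), not data from the charts, and the class name `TenancySketch` is hypothetical.

```java
/** Hypothetical arithmetic comparing dedicated clusters with one shared cluster. */
class TenancySketch {

    /** Largest value in a load series. */
    static double peak(double[] load) {
        double max = 0.0;
        for (double v : load) {
            max = Math.max(max, v);
        }
        return max;
    }

    /** Average value of a load series. */
    static double mean(double[] load) {
        double sum = 0.0;
        for (double v : load) {
            sum += v;
        }
        return sum / load.length;
    }

    /** Element-wise sum of several load series of equal length. */
    static double[] combine(double[]... loads) {
        double[] total = new double[loads[0].length];
        for (double[] load : loads) {
            for (int i = 0; i < total.length; i++) {
                total[i] += load[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up loads: web traffic peaks mid-day, batch jobs peak off-hours.
        double[] rails = {2, 6, 8, 5, 2, 1};
        double[] memcached = {1, 3, 4, 3, 1, 1};
        double[] hadoop = {8, 2, 1, 2, 7, 9};

        double[] combined = combine(rails, memcached, hadoop);
        double dedicatedCapacity = peak(rails) + peak(memcached) + peak(hadoop);
        double sharedCapacity = peak(combined);

        System.out.printf("dedicated: %.0f servers, utilization %.0f%%%n",
            dedicatedCapacity, 100.0 * mean(combined) / dedicatedCapacity);
        System.out.printf("shared:    %.0f servers, utilization %.0f%%%n",
            sharedCapacity, 100.0 * mean(combined) / sharedCapacity);
    }
}
```

With these illustrative numbers the dedicated clusters need 21 servers for an average demand of 11, while the shared cluster needs 13, because the workloads peak at different times.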

Example: Docker on Mesos
[diagram: Mesos master servers (M) and Mesos slave servers (S); Marathon on the master; docker containers on the slaves; Docker Registry (index.docker.io) and an optional Local Docker Registry]
Marathon can launch and monitor
service containers from one or
more Docker registries, using
the Docker executor for Mesos
mesosphere.io/2013/09/26/docker-on-mesos/

Arguments for Data Center Computing
rather than running several specialized clusters, each
at relatively low utilization rates, instead run many
mixed workloads
obvious benefits are realized in terms of:
• scalability, elasticity, fault tolerance, performance, utilization
• reduced equipment capex, Ops overhead, etc.
• reduced licensing, eliminating need for VMs or
potential vendor lock-in
subtle benefits – arguably, more important for Enterprise IT:
• reduced time for engineers to ramp up new services at scale
• reduced latency between batch and services, enabling new
high-ROI use cases
• enables Dev/Test apps to run safely on a Production cluster

Opposite Ends of the Spectrum, One Substrate
Built-in /
bare metal
Hypervisors
Solaris Zones
Linux CGroups

Opposite Ends of the Spectrum, One Substrate
Request /
Response
Batch

Case Study: Twitter (bare metal / on premise)
“Mesos is the cornerstone of our elastic compute infrastructure –
it’s how we build all our new services and is critical for Twitter’s
continued success at scale. It’s one of the primary keys to our
data center efficiency.”
Chris Fry, SVP Engineering
blog.twitter.com/2013/mesos-graduates-from-apache-incubation
• key services run in production: analytics, typeahead, ads
• Twitter engineers rely on Mesos to build all new services
• instead of thinking about static machines, engineers think
about resources like CPU, memory and disk
• allows services to scale and leverage a shared pool of
servers across data centers efficiently
• reduces the time between prototyping and launching

Case Study: Airbnb (fungible cloud infrastructure)
“We think we might be pushing data science in the field of travel
more so than anyone has ever done before… a smaller number
of engineers can have higher impact through automation on
Mesos.”
Mike Curtis, VP Engineering
gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data-driven...
• improves resource management and efficiency
• helps advance engineering strategy of building small teams
that can move fast
• key to letting engineers make the most of AWS-based
infrastructure beyond just Hadoop
• allowed company to migrate off Elastic MapReduce
• enables use of Hadoop along with Chronos, Spark, Storm, etc.

Media Coverage
Play Framework Grid Deployment with Mesos
James Ward, Flo Leibert, et al.
Typesafe blog (2013-09-19)
typesafe.com/blog/play-framework-grid...
Mesosphere Launches Marathon Framework
Adrian Bridgwater
Dr. Dobbs (2013-09-18)
drdobbs.com/open-source/mesosphere...
New open source tech Marathon wants to make your data center run like Google’s
Derrick Harris
GigaOM (2013-09-04)
gigaom.com/2013/09/04/...
Running batch and long-running, highly available service jobs on the same cluster
Ben Lorica
O’Reilly (2013-09-01)
strata.oreilly.com/2013/09/...

Resources
Apache Mesos Project
mesos.apache.org
Mesosphere
mesosphere.io
Tutorial
mesosphere.io/2013/08/01/...
Documentation
mesos.apache.org/documentation
2011 USENIX Research Paper
usenix.org/legacy/event/nsdi11/tech/full_papers/Hindman_new.pdf
Collected Notes/Archives
goo.gl/jPtTP

Cluster Computing
with Apache Mesos and Cascading:
1. Enterprise Data Workflows
2. Lingual and Pattern Examples
3. An Evolution of Cluster Computing
SUMMARY…
Boulder, 2013-09-25

Circa 2013: clusters everywhere – Four-Part Harmony

1. End Use Cases, the drivers

2. A new kind of team process

3. Abstraction layer as optimizing
middleware, e.g., Cascading

4. Data Center OS, e.g., Mesos

Enterprise Data Workflows with Cascading
O’Reilly, 2013
shop.oreilly.com/product/0636920028536.do
monthly newsletter for updates, events,
conference summaries, etc.:
liber118.com/pxn/
