A meetup talk/demo (2013-08-07) about the Mesos open source project (http://mesos.apache.org/) by Tobi Knaup and Paco Nathan of Mesosphere (http://mesosphe.re/), sponsored by GeekAustin (http://mesos-austin.eventbrite.com/).
GeekAustin: What’s So Exciting About Mesos?
1. Tobi Knaup @superguenter
Paco Nathan @pacoid
“GeekAustin:
What’s So Exciting About Mesos?”
Licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
Tuesday, 13 August 13
2. “What’s so exciting about Mesos?”
• What is Apache Mesos?
• Case Studies
• History: How did we get here?
• Screen Shots
• Demo, Q&A
mesos.apache.org
3. Mesos – definitions
a common substrate for cluster computing
heterogeneous assets in your data center or cloud made
available as a homogeneous set of resources
• Fault-tolerant replicated master using ZooKeeper
• Scalability to 10,000s of nodes
• Isolation between tasks with Linux Containers
• Multi-resource scheduling (memory and CPU aware)
• Java, Python, and C++ APIs for developing new parallel
applications
• Web UI for viewing cluster state
• Obviates the need for virtual machines
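To make "multi-resource scheduling (memory and CPU aware)" concrete, here is a minimal sketch in plain Python. It does not use the Mesos API and is not how the Mesos allocator works internally; it only shows the idea of checking CPU and memory together before placing a task. The names `Node`, `Task`, and `place`, the first-fit strategy, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A machine with its remaining free resources (hypothetical model)."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A unit of work with its resource requirements (hypothetical model)."""
    name: str
    cpus: float
    mem_mb: int


def place(tasks: List[Task], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """First-fit placement: a task lands on the first node that has
    enough free CPU *and* enough free memory; otherwise it stays unplaced."""
    placement: Dict[str, Optional[str]] = {}
    for task in tasks:
        placement[task.name] = None
        for node in nodes:
            if node.cpus >= task.cpus and node.mem_mb >= task.mem_mb:
                # Reserve both resources on the chosen node.
                node.cpus -= task.cpus
                node.mem_mb -= task.mem_mb
                placement[task.name] = node.name
                break
    return placement


# Illustrative cluster and workload (made-up numbers).
nodes = [Node("node-a", cpus=4, mem_mb=8192), Node("node-b", cpus=8, mem_mb=4096)]
tasks = [
    Task("web", cpus=2, mem_mb=2048),
    Task("cache", cpus=1, mem_mb=6144),
    Task("batch", cpus=6, mem_mb=2048),
    Task("big", cpus=4, mem_mb=8192),
]
result = place(tasks, nodes)
print(result)
```

In this sketch the memory-heavy "cache" task and the CPU-heavy "batch" task end up on different nodes, and "big" stays unplaced because no single node has both resources free at once. A scheduler that looked at only one resource would miss that.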
4. Mesos – background
• Available for Linux, Mac OS X, OpenSolaris
• Developed by UC Berkeley / AMP Lab, Twitter, Airbnb,
Mesosphere, etc.
• Deployments at Twitter, Airbnb, InsideVault, Vimeo,
UCSF, UC Berkeley, etc.
6. “Return of the Borg”
Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon
Cade Metz
wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos
“We wanted people to be able to program
for the data center just like they program
for their laptop.”
Ben Hindman
7. “Return of the Borg”
Consider that Google is generations ahead of
Hadoop, etc., with much improved ROI on its
data centers…
Borg serves as the data center “secret sauce”,
with Omega as its next evolution:
2011 GAFS Omega
John Wilkes, et al.
youtu.be/0ZFMlO98Jkc
8. Industry Issues:
• Most software developers tend to think about
computing resources in terms of individual hosts
• Clusters are simply considered as collections of
hosts
• Typically, those machines get divided into smaller
virtual machines to allow for fine-grained resource
allocation
• On the one hand, this practice leads to more
complexity, due to the number of systems that
must be managed
• On the other hand, it results in less efficiency: the
hypervisor becomes a black box which the host
operating system cannot schedule intelligently
9. Mesos – benefits
• scale to 10,000s of nodes using fast, event-driven C++ impl
• maximize utilization rates, minimize latency for data updates
• combine batch, real-time, and long-lived services on the same
nodes and share resources
• reshape clusters on the fly based on app history and workload
requirements
• run multiple Hadoop versions, Spark, MPI, Heroku, HAProxy, etc.,
on the same cluster
• build new distributed frameworks without reinventing low-level
facilities
• enable new kinds of apps, which combine frameworks with
lower latency
• hire top talent out of Google, while providing a familiar data center
environment
10. STATE OF THE ART
Provision VMs on public cloud or physical servers
DATACENTER
11. STATE OF THE ART
PROVISIONED VMS
Provision VMs on public cloud or physical servers
DATACENTER
12. STATE OF THE ART
PROVISIONED VMS
Use Chef/Puppet to setup & launch Hadoop
DATACENTER
13. STATE OF THE ART
STATICALLY PARTITIONED SERVICES
Use Chef/Puppet to setup & launch Hadoop
DATACENTER
14. STATE OF THE ART
STATICALLY PARTITIONED SERVICES
Use Chef/Puppet to setup & launch JBoss
DATACENTER
16. STATE OF THE ART
STATICALLY PARTITIONED SERVICES
Manually resize Hadoop
DATACENTER
18. STATE OF THE ART
STATICALLY PARTITIONED SERVICES
It is difficult to deploy new frameworks (provision, setup, install, resize)
Static partitioning leads to low utilization and prevents elasticity
DATACENTER
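The utilization claim above can be illustrated with a little arithmetic. The sketch below is plain Python under stated assumptions: the partition sizes, the demand figures, and the function names `static_utilization` and `pooled_utilization` are all hypothetical, and capacity is counted in whole nodes for simplicity.

```python
from typing import Dict


def static_utilization(partitions: Dict[str, int], demand: Dict[str, int]) -> float:
    """Fraction of nodes in use when each framework can only use
    the nodes in its own fixed partition."""
    used = sum(min(size, demand.get(name, 0)) for name, size in partitions.items())
    return used / sum(partitions.values())


def pooled_utilization(total_nodes: int, demand: Dict[str, int]) -> float:
    """Fraction of nodes in use when all frameworks draw from one shared pool."""
    used = min(total_nodes, sum(demand.values()))
    return used / total_nodes


# Made-up example: a 10-node data center split 6/4 between two frameworks,
# at a moment when Hadoop wants 9 nodes and JBoss only needs 1.
partitions = {"hadoop": 6, "jboss": 4}
demand = {"hadoop": 9, "jboss": 1}

static = static_utilization(partitions, demand)
pooled = pooled_utilization(sum(partitions.values()), demand)
print(f"static partitions: {static:.0%}, shared pool: {pooled:.0%}")
```

With these numbers three JBoss nodes sit idle while Hadoop is short three nodes. Under static partitioning, fixing that means a manual resize; with one pool, the idle capacity is simply available.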
19. ONE LARGE POOL OF RESOURCES
DATACENTER
MESOS
20. VALUE PROPOSITION - EASY DEVELOPMENT OF APPS
CHRONOS SPARK HADOOP DPARK MPI
JVM (JAVA, SCALA, CLOJURE, JRUBY)
MESOS
PYTHON C++
21. MESOSPHERE CLOUD OS STACK
HADOOP STORM CHRONOS RAILS JBOSS
TELEMETRY
Kernel
OS
Apps
MESOS
CAPACITY PLANNING / GUI / SECURITY / SMARTER SCHEDULING
22. Example: Balance Utilization Curves
[Charts: RAILS CPU LOAD, MEMCACHED CPU LOAD, and HADOOP CPU LOAD, each plotted from 0% to 100% over time t; a fourth chart shows the COMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP) as stacked curves.]
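The point of balancing utilization curves can be sketched numerically. The load series below are invented for illustration (CPU cores in use per time step, not data from the talk), and the variable names are assumptions. The idea: when workloads peak at different times, the peak of the combined curve is lower than the sum of the individual peaks, so a shared cluster can be smaller than three dedicated ones.

```python
# Hypothetical CPU demand in cores at six consecutive time steps.
rails = [20, 40, 80, 60, 30, 10]
memcached = [10, 30, 60, 50, 20, 10]
hadoop = [70, 40, 10, 20, 60, 80]  # batch work fills the off-peak hours

# Dedicated clusters must each be sized for their own peak.
dedicated_capacity = max(rails) + max(memcached) + max(hadoop)

# A shared cluster only has to cover the peak of the combined curve.
combined = [r + m + h for r, m, h in zip(rails, memcached, hadoop)]
shared_capacity = max(combined)

# Average utilization = core-steps used / core-steps provisioned.
total_demand = sum(combined)
steps = len(combined)
dedicated_utilization = total_demand / (dedicated_capacity * steps)
shared_utilization = total_demand / (shared_capacity * steps)

print(f"dedicated: {dedicated_capacity} cores, {dedicated_utilization:.0%} utilized")
print(f"shared:    {shared_capacity} cores, {shared_utilization:.0%} utilized")
```

The gain depends entirely on how complementary the curves are; workloads that all peak together would show little difference.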
24. “What’s so exciting about Mesos?”
• What is Apache Mesos?
• Case Studies
• History: How did we get here?
• Screen Shots
• Demo, Q&A
mesos.apache.org
25. Case Study: Twitter (bare metal / on-prem)
“Mesos is the cornerstone of our elastic compute infrastructure –
it’s how we build all our new services and is critical for Twitter’s
continued success at scale. It’s one of the primary keys to our
data center efficiency.”
Chris Fry, SVP Engineering
blog.twitter.com/2013/mesos-graduates-from-apache-incubation
• several key services run in production: analytics, typeahead, ads, etc.
• engineers rely on Mesos to build all our new services
• instead of thinking about static machines, engineers think about
resources like CPU, memory and disk
• allows services to scale and leverage a shared pool of servers across
data centers efficiently
• reduces the time between prototyping and launching new services
efficiently
26. Case Study: Airbnb (fungible cloud infra)
“We think we might be pushing data science in the field of travel
more so than anyone has ever done before… a smaller number
of engineers can have higher impact through automation on
Mesos.”
Mike Curtis, VP Engineering
gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data-driven-company
• improves resource management and efficiency
• helps advance engineering strategy of building small teams that can
move fast
• key to letting engineers make the most of AWS-based infrastructure
beyond just Hadoop
• allowed Airbnb to migrate off the Elastic MapReduce service
• enables use of Hadoop along with Chronos, Spark, Storm, etc.
27. TWO WORLDS - ONE SUBSTRATE
Built-in /
bare metal
Hypervisors
Solaris Zones
Linux CGroups
28. TWO WORLDS - ONE SUBSTRATE
Request /
Response
Batch
29. “What’s so exciting about Mesos?”
• What is Apache Mesos?
• Case Studies
• History: How did we get here?
• Screen Shots
• Demo, Q&A
mesos.apache.org
30. Q3 1997: inflection point
Four independent teams were working toward horizontal
scale-out of workflows based on commodity hardware
This effort prepared the way for huge Internet successes
in the 1997 holiday season… AMZN, EBAY, Inktomi
(YHOO Search), then GOOG
MapReduce and the Apache Hadoop open source stack
emerged from this
32. Circa 1996: pre-inflection point
“throw it over the wall”
[Diagram labels: Customers, Web App, transactions, RDBMS, SQL Query, result sets, BI Analysts, Excel pivot tables, PowerPoint slide decks, Stakeholder, Product, strategy, Engineering, requirements, optimized code]
34. Circa 2001: post- big ecommerce successes
“data products”
[Diagram labels: Customers, Web Apps, customer transactions, RDBMS, SQL Query, result sets, Logs, event history, aggregation, DW, ETL, Algorithmic Modeling, recommenders + classifiers, Middleware, servlets, models, dashboards, Product, Engineering, UX, Stakeholder]
36. Circa 2013: clusters everywhere
“optimize topologies”
[Diagram labels: Customers, Web Apps, Mobile, etc., transactions, content, social interactions, services, History, Data Products, RDBMS, Log Events, In-Memory Data Grid, Hadoop, etc., near time, batch, Workflow, Cluster Scheduler, Use Cases Across Topologies, DW, Prod, Eng, Planner, Ops, dashboard metrics, business process, optimized capacity, taps, s/w dev, data science, discovery + modeling, Data Scientist, App Dev, Ops, Domain Expert, introduced capability, existing SDLC]
37. Amazon
“Early Amazon: Splitting the website” – Greg Linden
glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
eBay
“The eBay Architecture” – Randy Shoup, Dan Pritchett
addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html
addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
Inktomi (YHOO Search)
“Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff)
youtu.be/E91oEn1bnXM
Google
“Underneath the Covers at Google” – Jeff Dean (0:06:54 ff)
youtu.be/qsan-GQaeyk
perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
MIT Media Lab
“Social Information Filtering for Music Recommendation” – Pattie Maes
pubs.media.mit.edu/pubs/papers/32paper.ps
ted.com/speakers/pattie_maes.html
Primary Sources
38. Current Challenge
Consider the datacenter as a computer…
We must rethink the way that we write, deploy, and
manage distributed applications
Early use cases for clustered computing tend to tolerate
having many separate clusters; however, more mature
enterprise use cases require ROI, hence higher utilization
rates
Managing the operational costs for large, distributed apps
becomes key
Mesos provides the means for this evolution
39. “What’s so exciting about Mesos?”
• What is Apache Mesos?
• Case Studies
• History: How did we get here?
• Screen Shots
• Demo, Q&A
mesos.apache.org