O'Reilly Media - Strata SC 2014
Apache Mesos is an open source cluster manager that provides efficient resource isolation for distributed frameworks—similar to Google’s “Borg” and “Omega” projects for warehouse scale computing. It is based on isolation features in the modern kernel: “cgroups” in Linux, “zones” in Solaris.
Google’s “Omega” research paper shows that while 80% of the jobs on a given cluster may be batch (e.g., MapReduce), 55-60% of cluster resources go toward services. The batch jobs on a cluster are the easy part—services are much more complex to schedule efficiently. However by mixing workloads, the overall problem of scheduling resources can be greatly improved.
Given the use of Mesos as the kernel for a “data center OS”, two additional open source components Chronos (like Unix “cron”) and Marathon (like Unix “init.d”) serve as the building blocks for creating distributed, fault-tolerant, highly-available apps at scale.
This talk will examine case studies of Mesos uses in production at scale: ranging from Twitter (100% on prem) to Airbnb (100% cloud), plus MediaCrossing, Categorize, HubSpot, etc. How have these organizations leveraged Mesos to build better, more scalable and efficient distributed apps? Lessons from the Mesos developer community show that one can port an existing framework with a wrapper in approximately 100 line of code. Moreover, an important lesson from Spark is that based on “data center OS” building blocks one can rewrite a distributed system much like Hadoop to be 100x faster within a relatively small amount of source code.
These case studies illustrate the obvious benefits over prior approaches based on virtualization: scalability, elasticity, fault-tolerance, high availability, improved utilization rates, etc. Less obvious outcomes also include: reduced time for engineers to ramp-up new services at scale; reduced latency between batch and services, enabling new high-ROI use cases; and enabling dev/test apps to run on a production cluster without disrupting operations.