ZFS & Zones:
Your Compute fell into
My Data!
Bryan Cantrill
SVP, Engineering
bryan@joyent.com
@bcantrill
The filesystem: Some prehistory

• When they were originally developed in the 1970s, filesystems were designed as an abstraction over a disk
• Over time, it became increasingly expensive to make bigger disks, and reliability suffered
• In the 1980s, both problems were solved by using many hard drives instead of just larger and larger drives: a redundant array of inexpensive disks (RAID)
• Even though filesystems were still relatively young at the time, it was deemed too complicated to rewrite them to accommodate the (new) notion of many disks
• This software problem was solved by introducing a new layer of software: the volume manager
The volume management divide

• Volume management abstracts many physical devices into single logical volumes, allowing filesystems to retain a one-to-one mapping with a device (a logical one)
• This gave rise to a problematic divide:
  • The volume manager understands multiple disks, but nothing of the higher-level semantics of the filesystem
  • The filesystem understands the higher-level semantics of the data, but has no understanding of the physical devices
• This divide became entrenched over the 1990s, and had devastating ramifications for reliability, performance and manageability
Volume management deficiencies

• Because the volume management layer had no notion of the transactional semantics of the filesystem, system failure induced excruciating file system checks
• Worse, the system was left with no protection against many variants of device-level data corruption:
  • The only failure the volume manager can reasonably detect is media failure that results in incorrect data on disk
  • This doesn’t account for phantom reads (i.e., the wrong disk block is read from), phantom writes (i.e., the wrong disk block is written to) or driver pathologies (e.g., memory errors)
• And because filesystems did not understand more than one device, device failure often meant filesystem failure
Volume management deficiencies

• Lacking visibility into the hardware layer, the filesystem could not effectively use the parallelism inherent in multiple disks, and could not effectively schedule I/O
• Spindles were underutilized (leaving bandwidth and/or IOPS on the table) or overutilized (thrashing the device and yielding pathological performance)
• Management was a nightmare: filesystems could not be expanded or shrunk, requiring every filesystem to know in advance its intended capacity
The ZFS revolution

• Starting in 2001, Sun began a revolutionary new software effort: to unify storage and eliminate the divide
• In this model, filesystems would lose their one-to-one association with devices: many filesystems would be multiplexed on many devices
• By starting with a clean sheet of paper, ZFS opened up vistas of innovation, and by its architecture was able to solve many otherwise intractable problems
• Sun shipped ZFS in 2005, and used it as the foundation of its enterprise storage products starting in 2008
• ZFS was open sourced in 2005; it remains the only open source enterprise-grade filesystem
ZFS advantages

• Copy-on-write design allows on-disk consistency to be always assured (eliminating file system check)
• Copy-on-write design allows constant-time snapshots in unlimited quantity, and writable clones!
• Filesystem architecture allows filesystems to be created instantly and expanded (or shrunk!) on the fly
• Integrated volume management allows for intelligent device behavior with respect to disk failure and recovery
• Adaptive replacement cache (ARC) allows for optimal use of DRAM, especially on high-DRAM systems
• Support for dedicated log and cache devices allows for optimal use of flash-based SSDs
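A rough sketch of how these properties surface at the command line, assuming the standard zpool and zfs utilities. The pool name "tank", the dataset names and the illumos-style device names are all hypothetical, and the script only prints each command unless DRY_RUN=0 is set, since the real commands need root privileges and spare disks.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pool "tank", the datasets and the device names below are hypothetical.
zfs_tour() {
    # One pool spans several devices: a two-way mirror, plus SSDs used
    # as a dedicated log device and as a cache device.
    run zpool create tank mirror c0t0d0 c0t1d0
    run zpool add tank log c0t2d0
    run zpool add tank cache c0t3d0

    # Filesystems are created instantly and draw space from the shared pool.
    run zfs create tank/projects
    # A quota can be raised or lowered later, on the fly.
    run zfs set quota=10G tank/projects

    # Take a snapshot, then make a writable clone of that snapshot.
    run zfs snapshot tank/projects@before-upgrade
    run zfs clone tank/projects@before-upgrade tank/projects-test
}

zfs_tour
```

Note that no filesystem is ever sized against a particular device here: every dataset shares the pool, which is the end of the one-to-one mapping described earlier.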
ZFS at Joyent

• Joyent was the earliest ZFS adopter, becoming (in 2005) the first production user of ZFS outside of Sun
• ZFS is one of the four foundational technologies of Joyent’s SmartOS, our illumos derivative
  • The other three foundational technologies in SmartOS are DTrace, Zones and KVM
  • Search “fork yeah illumos” for the (uncensored) history of OpenSolaris, illumos, SmartOS and derivatives
• Joyent has extended ZFS to provide better support for multi-tenant operation with I/O throttling
ZFS as the basis for object storage?

• As we began to think about building our own internet-facing object store in the fall of 2011, we naturally gravitated to ZFS...
• We view ZFS as our most foundational differentiator...
• Could we extend ZFS in some important way that would offer something interesting and compelling?
• Short answer: meh
Aside: Virtualization in the cloud

• Operating a public cloud has significant technological and business challenges:
  • From a technological perspective, must deliver highly elastic infrastructure with acceptable quality of service across a broad class of users and applications
  • From a business perspective, must drive utilization as high as possible while still satisfying customer expectations for quality of service
• These aspirations are in tension: multi-tenancy can significantly degrade quality of service
• The key enabling technology for multi-tenancy is virtualization, but where in the stack to virtualize?
Hardware-level virtualization?

• The historical answer (since the 1960s) has been to virtualize at the level of the hardware:
  • A virtual machine is presented upon which each tenant runs an operating system of their choosing
  • There are as many operating systems as tenants
• The historical motivation for hardware virtualization remains its advantage today: it can run entire legacy stacks unmodified
• However, hardware virtualization exacts a heavy toll: operating systems are not designed to share resources like DRAM, CPU, I/O devices or the network
• Hardware virtualization limits tenancy and inhibits performance!
Platform-level virtualization?

• Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization...
• ...but at the cost of dictating abstraction to the developer
  • This creates the “Google App Engine problem”: developers are in a straitjacket where toy programs are easy, but sophisticated apps are impossible
• Virtualizing at the application platform layer poses many other challenges:
  • Security, resource containment, language specificity, environment-specific engineering costs
Joyent’s solution: OS-level virtualization

• Virtualizing at the OS level hits the sweet spot:
  • Single OS (single kernel) allows for efficient use of hardware resources, and therefore allows load factors to be high
  • Disjoint instances are securely compartmentalized by the operating system
  • Gives customers what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software
  • Gives customers PaaS when the abstractions work for them, IaaS when they need more generality
• OS-level virtualization allows for high levels of tenancy without dictating abstraction or sacrificing efficiency
• Zones is a bullet-proof implementation of OS-level virtualization, and is the core abstraction in Joyent’s SmartOS
Idea: ZFS + Zones?
Manta: ZFS + Zones!

• Building a sophisticated distributed system on top of ZFS and zones, we have built Manta, an internet-facing object storage system offering in situ compute
• That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute
• The abstractions made available for computation are anything that can run on the OS...
• ...and as a reminder, the OS (Unix) was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
Aside: Unix

• When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems
• Instead of a sealed monolith, the operating system was a collection of small, easily understood programs
• First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv)
• Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs”, a castrated Multics
We were a bit oppressed by the big system mentality. Ken
wanted to do something simple. — Dennis Ritchie
Unix: Let there be light

• In 1969, Doug McIlroy had the idea of connecting different components:
    At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes
• This was the primordial pipe, but it took three years to persuade Thompson to adopt it:
    And one day I came up with a syntax for the shell that went along with the piping, and Ken said, “I’m going to do it!”
Unix: ...and there was light

And the next morning we had this
orgy of one-liners. — Doug McIlroy
The Unix philosophy

• The pipe, coupled with the small-system aesthetic, gave rise to the Unix philosophy, as articulated by Doug McIlroy:
  • Write programs that do one thing and do it well
  • Write programs to work together
  • Write programs that handle text streams, because that is a universal interface
• Four decades later, this philosophy remains the single most important revolution in software systems thinking!
Doug McIlroy v. Don Knuth: FIGHT!

• In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history:
    Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
• Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm
• Doug McIlroy’s solution shows the power of the Unix philosophy:
    tr -cs A-Za-z '\n' | tr A-Z a-z | \
    sort | uniq -c | sort -rn | sed ${1}q
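To try the pipeline above without writing a script file, it can be wrapped in a shell function. This is only a sketch: the function name wordfreq and the sample sentence are made up for illustration, and the count N is passed as the function's first argument in place of the script's ${1}.

```shell
# wordfreq N: read text on stdin, print the N most frequent words with counts.
wordfreq() {
    tr -cs A-Za-z '\n' |   # squeeze every run of non-letters into one newline
    tr A-Z a-z |           # fold everything to lower case
    sort |                 # bring identical words together
    uniq -c |              # collapse each run into "count word"
    sort -rn |             # highest counts first
    sed "${1}q"            # quit after N lines
}

# Prints the two most frequent words: "the" (3 times), then "and" (2 times).
printf 'the cat and the dog and the bird\n' | wordfreq 2
```

Each stage does one small job and speaks plain text to the next, so any stage can be swapped or inspected on its own.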
Big Data: History repeats itself?

• The original Google MapReduce paper (Dean et al., OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior:
    Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair
• But the solutions do not adhere to the Unix philosophy...
  • e.g., Appendix A of the OSDI ’04 paper has a 71 line word count in C++, with nary a wc in sight
• ...and nor do they make use of the substantial Unix foundation for data processing
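For comparison, the URL access frequency problem can be sketched with the same handful of standard tools. This is an illustration rather than anything from the paper: the function name url_counts is made up, and it assumes logs in Common Log Format, where the request path is the seventh whitespace-separated field.

```shell
# url_counts: read access-log lines on stdin, print "count URL", busiest first.
# Assumes Common Log Format; adjust the field number for other formats.
url_counts() {
    awk '{ print $7 }' |   # "map": emit one URL per request
    sort |                 # group identical URLs together
    uniq -c |              # "reduce": count each group
    sort -rn               # order by descending count
}

# Three hypothetical log lines: two hits on /index.html, one on /about.html.
url_counts <<'EOF'
10.0.0.1 - - [03/Jul/2013:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [03/Jul/2013:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 128
10.0.0.1 - - [03/Jul/2013:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512
EOF
```

On a single machine this is bounded by one sort; the point of the comparison is only that the map and reduce steps already have natural Unix spellings.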
Manta: Unix for Big Data

• Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge:
    mfind -t o /bcantrill/public/v7/usr/man | \
    mjob create -o -m "tr -cs A-Za-z '\n' | \
    tr A-Z a-z | sort | uniq -c" -r \
    "awk '{ x[\$2] += \$1 }
    END { for (w in x) { print x[w] \" \" w } }' | \
    sort -rn | sed ${1}q"
• This description is not only terse, it is high performing: data is left at rest, with the “map” phase doing heavy reduction of the data stream
• As such, Manta (like Unix) is not merely syntactic sugar; it converges compute and data in a new way
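The shape of the job above can be approximated on a single machine, which may help in reading the map and reduce strings. This is a local, serial sketch and not how Manta executes jobs: the function names and sample files are hypothetical, and plain files stand in for objects.

```shell
# map_phase: per-object word counts, mirroring the job's -m stage.
map_phase() {
    tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c
}

# reduce_phase N: merge "count word" lines from every map output and
# keep the N most frequent words, mirroring the job's -r stage.
reduce_phase() {
    awk '{ x[$2] += $1 } END { for (w in x) { print x[w] " " w } }' |
    sort -rn | sed "${1}q"
}

# local_job N FILE...: run the map phase once per file, then feed all
# map outputs into a single reduce.
local_job() {
    n=$1; shift
    for f in "$@"; do
        map_phase < "$f"
    done | reduce_phase "$n"
}

dir=$(mktemp -d)
printf 'to be or not to be\n' > "$dir/a.txt"
printf 'to err is human\n' > "$dir/b.txt"
local_job 1 "$dir"/a.txt "$dir"/b.txt   # prints "3 to"
rm -rf "$dir"
```

The map output is already one line per distinct word, so the reducer sees far less data than the objects contain; that is the "heavy reduction" the slide refers to.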
Manta: CAP tradeoffs

• Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads)
• Many more details: http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/
• Celebrity endorsement:
Manta: Other design principles

• Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash
• Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning
• Manta has full support for CORS headers
• Manta SDKs exist for node.js, Java, Ruby, Python
• Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00)
• “npm install manta” for command line interface
Manta and the future of big data

• We believe compute/data convergence to be the future of big data: stores of record must support computation as a first-class, in situ operation
• We believe that Unix is a natural way of expressing this computation, and that the OS is the right level at which to virtualize to support this securely
• We believe that ZFS is the only sane storage underpinning for such a system
• Manta will surely not be the only system to represent the confluence of these, but it is the first
• We are actively retooling our software stack in terms of Manta; Manta is changing the way we develop software!
Manta: More information

• Product page: http://joyent.com/products/manta
• node.js module: https://github.com/joyent/node-manta
• Manta documentation: http://apidocs.joyent.com/manta/
• IRC, e-mail, Twitter, etc.: #manta on freenode, manta@joyent.com, @mcavage, @dapsays, @yunongx, @joyent

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

ZFS & Zones: A Powerful Combination for Object Storage and In-Situ Compute

  • 1. ZFS & Zones: Your Compute fell into My Data! Bryan Cantrill SVP, Engineering bryan@joyent.com @bcantrill
  • 2. The filesystem: Some prehistory • When they were originally developed in the 1970s, filesystems were designed as an abstraction over a disk • Over time, it became increasingly expensive to make bigger disks, and reliability suffered • In the 1980s, both problems were solved by using many hard drives instead of just larger and larger drives: a redundant array of inexpensive disks (RAID) • Even though filesystems were still relatively young at the time, it was deemed too complicated to rewrite them to accommodate the (new) notion of many disks • This software problem was solved by introducing a new layer of software: the volume manager
  • 3. The volume management divide • Volume management abstracts many physical devices into single logical volumes, allowing filesystems to retain a one-to-one mapping with a device (a logical one) • This gave rise to a problematic divide: • The volume manager understands multiple disks, but nothing of the higher-level semantics of the filesystem • The filesystem understands the higher-level semantics of the data, but has no understanding of the physical devices • This divide became entrenched over the 1990s, and had devastating ramifications for reliability, performance and manageability
  • 4. Volume management deficiencies • Because the volume management layer had no notion of the transactional semantics of the filesystem, system failure induced excruciating file system checks • Worse, the system was left with no protection against many variants of device-level data corruption: • The only failure the volume manager can reasonably detect is media failure that results in incorrect data on disk • This doesn’t account for phantom reads (i.e., the wrong disk block is read from), phantom writes (i.e., the wrong disk block is written to) or driver pathologies (e.g., memory errors) • And because filesystems did not understand more than one device, device failure often meant filesystem failure
  • 5. Volume management deficiencies • Lacking visibility into the hardware layer, the filesystem could not effectively use the parallelism inherent in multiple disks, and could not effectively schedule I/O • Spindles were underutilized (leaving bandwidth and/or IOPS on the table) or overutilized (thrashing the device and yielding pathological performance) • Management was a nightmare: filesystems could not be expanded or shrunk, requiring every filesystem to know in advance its intended capacity
  • 6. The ZFS revolution • Starting in 2001, Sun began a revolutionary new software effort: to unify storage and eliminate the divide • In this model, filesystems would lose their one-to-one association with devices: many filesystems would be multiplexed on many devices • By starting with a clean sheet of paper, ZFS opened up vistas of innovation — and by its architecture was able to solve many otherwise intractable problems • Sun shipped ZFS in 2005, and used it as the foundation of its enterprise storage products starting in 2008 • ZFS was open sourced in 2005; it remains the only open source enterprise-grade filesystem
  • 7. ZFS advantages • Copy-on-write design allows on-disk consistency to be always assured (eliminating file system check) • Copy-on-write design allows constant-time snapshots in unlimited quantity — and writable clones! • Filesystem architecture allows filesystems to be created instantly and expanded — or shrunk! — on-the-fly • Integrated volume management allows for intelligent device behavior with respect to disk failure and recovery • Adaptive replacement cache (ARC) allows for optimal use of DRAM — especially on high DRAM systems • Support for dedicated log and cache devices allows for optimal use of flash-based SSDs
  • 8. ZFS at Joyent • Joyent was the earliest ZFS adopter, becoming (in 2005) the first production user of ZFS outside of Sun • ZFS is one of the four foundational technologies of Joyent’s SmartOS, our illumos derivative • The other three foundational technologies in SmartOS are DTrace, Zones and KVM • Search “fork yeah illumos” for the (uncensored) history of OpenSolaris, illumos, SmartOS and derivatives • Joyent has extended ZFS to provide better support for multi-tenant operation with I/O throttling
  • 9. ZFS as the basis for object storage? • As we began to think about building our own internet-facing object store in the fall of 2011, we naturally gravitated to ZFS... • We view ZFS as our most foundational differentiator... • Could we extend ZFS in some important way that would offer something interesting and compelling? • Short answer: meh
  • 10. Aside: Virtualization in the cloud • Operating a public cloud has significant technological and business challenges: • From a technological perspective, must deliver highly elastic infrastructure with acceptable quality of service across a broad class of users and applications • From a business perspective, must drive utilization as high as possible while still satisfying customer expectations for quality of service • These aspirations are in tension: multi-tenancy can significantly degrade quality of service • The key enabling technology for multi-tenancy is virtualization — but where in the stack to virtualize?
  • 11. Hardware-level virtualization? • The historical answer, since the 1960s, has been to virtualize at the level of the hardware: • A virtual machine is presented upon which each tenant runs an operating system of their choosing • There are as many operating systems as tenants • The historical motivation for hardware virtualization remains its advantage today: it can run entire legacy stacks unmodified • However, hardware virtualization exacts a heavy toll: operating systems are not designed to share resources like DRAM, CPU, I/O devices or the network • Hardware virtualization limits tenancy and inhibits performance!
  • 12. Platform-level virtualization? • Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization… • ...but at the cost of dictating abstraction to the developer • This creates the “Google App Engine problem”: developers are in a straitjacket where toy programs are easy, but sophisticated apps are impossible • Virtualizing at the application platform layer poses many other challenges: • Security, resource containment, language specificity, environment-specific engineering costs
  • 13. Joyent’s solution: OS-level virtualization • Virtualizing at the OS level hits the sweet spot: • Single OS (single kernel) allows for efficient use of hardware resources, and therefore allows load factors to be high • Disjoint instances are securely compartmentalized by the operating system • Gives customers what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software • Gives customers PaaS when the abstractions work for them, IaaS when they need more generality • OS-level virtualization allows for high levels of tenancy without dictating abstraction or sacrificing efficiency • Zones is a bullet-proof implementation of OS-level virtualization — and is the core abstraction in Joyent’s SmartOS
  • 14. Idea: ZFS + Zones?
  • 15. Manta: ZFS + Zones! • Building a sophisticated distributed system on top of ZFS and zones, we have built Manta, an internet-facing object storage system offering in situ compute • That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute • The abstractions made available for computation are anything that can run on the OS... • ...and as a reminder, the OS — Unix — was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
  • 16. Aside: Unix • When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems • Instead of a sealed monolith, the operating system was a collection of small, easily understood programs • First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv) • Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs” — a castrated Multics We were a bit oppressed by the big system mentality. Ken wanted to do something simple. — Dennis Ritchie
  • 17. Unix: Let there be light • In 1969, Doug McIlroy had the idea of connecting different components: At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes • This was the primordial pipe, but it took three years to persuade Thompson to adopt it: And one day I came up with a syntax for the shell that went along with the piping, and Ken said, “I’m going to do it!”
  • 18. Unix: ...and there was light And the next morning we had this orgy of one-liners. — Doug McIlroy
  • 19. The Unix philosophy • The pipe, coupled with the small-system aesthetic, gave rise to the Unix philosophy, as articulated by Doug McIlroy: • Write programs that do one thing and do it well • Write programs to work together • Write programs that handle text streams, because that is a universal interface • Four decades later, this philosophy remains the single most important revolution in software systems thinking!
  • 20. Doug McIlroy v. Don Knuth: FIGHT! • In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history: “Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.” • Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm • Doug McIlroy’s solution shows the power of the Unix philosophy: tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
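
To make each stage of McIlroy's pipeline concrete, here is a minimal sketch that wraps it in a shell function and runs it on a one-line sample. The function name `wordfreq` and the sample sentence are illustrative assumptions, not part of the original deck; the pipeline itself is the one quoted on the slide.

```shell
# wordfreq N: print the N most frequent words read from standard input.
# (Hypothetical wrapper name; the pipeline is McIlroy's.)
wordfreq() {
    tr -cs 'A-Za-z' '\n' |   # turn every run of non-letters into a single newline
        tr 'A-Z' 'a-z' |     # fold everything to lower case
        sort |               # bring identical words next to each other
        uniq -c |            # collapse duplicates, prefixing each word with its count
        sort -rn |           # order by count, largest first
        sed "${1}q"          # print the first N lines, then quit
}

# Sample input: "the" appears three times, "and" twice, the rest once.
printf 'the cat and the dog and the bird\n' | wordfreq 2
```

The exact padding in front of each count depends on the local `uniq` implementation, so downstream consumers should split on whitespace rather than rely on column positions.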
  • 21. Big Data: History repeats itself? • The original Google MapReduce paper (Dean et al., OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior: “Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair” • But the solutions do not adhere to the Unix philosophy... • ...and nor do they make use of the substantial Unix foundation for data processing • e.g., Appendix A of the OSDI ’04 paper has a 71 line word count in C++, with nary a wc in sight
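
For comparison, the URL access frequency problem can be sketched with the same Unix building blocks. This is an illustration only, not code from the deck or the paper: it assumes logs in Common Log Format, where the requested path is the seventh whitespace-separated field, and the function name `url_counts` is hypothetical.

```shell
# url_counts: read web server log lines on stdin, print "count URL" pairs,
# most requested first. Assumes Common Log Format (URL is field 7).
url_counts() {
    awk '{ print $7 }' |   # "map": emit the URL of each request
        sort |             # group identical URLs together
        uniq -c |          # "reduce": count each group
        sort -rn           # highest totals first
}

# Three sample requests: two for /index.html, one for /about.html.
printf '%s\n' \
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326' \
    '10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] "GET /about.html HTTP/1.0" 200 512' \
    '10.0.0.3 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326' |
    url_counts
```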
  • 22. Manta: Unix for Big Data • Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge: mfind -t o /bcantrill/public/v7/usr/man | mjob create -o -m "tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c" -r "awk '{ x[\$2] += \$1 } END { for (w in x) { print x[w] \" \" w } }' | sort -rn | sed ${1}q" • This description is not only terse, it is high performing: data is left at rest, with the “map” phase doing heavy reduction of the data stream • As such, Manta (like Unix) is not merely syntactic sugar; it converges compute and data in a new way
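
The split between the map and reduce commands in that job can be tried out locally. The sketch below is a purely local emulation under stated assumptions: it uses no Manta APIs, the function names (`map_phase`, `reduce_phase`, `local_job`) are hypothetical, and ordinary files stand in for objects. It shows why the reduce phase needs `awk`: each map task emits per-object counts, so the same word arrives several times and the partial counts must be summed before ranking.

```shell
# map_phase: per-object word counts (the -m command from the slide).
map_phase() {
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort | uniq -c
}

# reduce_phase N: merge the partial counts and keep the top N
# (the -r command from the slide).
reduce_phase() {
    awk '{ x[$2] += $1 } END { for (w in x) print x[w] " " w }' |
        sort -rn |
        sed "${1}q"
}

# local_job N FILE...: run map_phase once per file, then one reduce_phase
# over the concatenated map outputs.
local_job() {
    n=$1
    shift
    for f in "$@"; do
        map_phase < "$f"
    done | reduce_phase "$n"
}

# Two small "objects": "the" occurs 3 times overall, "cat" twice, "dog" once.
dir=$(mktemp -d)
printf 'the cat the\n' > "$dir/a.txt"
printf 'the dog cat\n' > "$dir/b.txt"
local_job 2 "$dir"/*.txt
rm -rf "$dir"
```

In this emulation the map tasks run one after another on a single machine; the point of the design on the slide is that the same two commands can instead run next to each object, so only the small per-object counts travel to the reducer.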
  • 23. Manta: CAP tradeoffs • Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads) • Many more details: http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/ • Celebrity endorsement:
  • 24. Manta: Other design principles • Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash • Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning • Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00) • Manta has full support for CORS headers • Manta SDKs exist for node.js, Java, Ruby, Python • “npm install manta” for command line interface
  • 25. Manta and the future of big data • We believe compute/data convergence to be the future of big data: stores of record must support computation as a first-class, in situ operation • We believe that Unix is a natural way of expressing this computation — and that the OS is the right level at which to virtualize to support this securely • We believe that ZFS is the only sane storage underpinning for such a system • Manta will surely not be the only system to represent the confluence of these — but it is the first • We are actively retooling our software stack in terms of Manta — Manta is changing the way we develop software!
  • 26. Manta: More information • Product page: http://joyent.com/products/manta • node.js module: https://github.com/joyent/node-manta • Manta documentation: http://apidocs.joyent.com/manta/ • IRC, e-mail, Twitter, etc.: #manta on freenode, manta@joyent.com, @mcavage, @dapsays, @yunongx, @joyent