SlideShare a Scribd company logo
1 of 37
Download to read offline
Big Data and Containers
Charles Smith
@charles_s_smith
Netflix / Lead the big data platform architecture team
Spend my time / Thinking how to make it easy/efficient to work with big data
University of Florida / PhD in Computer Science
Who am I?
“It is important that we know where we come from, because
if you do not know where you come from, then you don't
know where you are, and if you don't know where you are,
you don't know where you're going. And if you don't know
where you're going, you're probably going wrong.”
Terry Pratchett
Database Distributed Database Distributed Storage
Distributed Processing
???
Why do we care about containers?
Containers ~= Virtual Machines
Virtual Machines ~= Servers
Lightweight
fast to start
memory use
Secure
Process isolation
Data isolation
Portable
Composable
Reproducible
Everything old is new
Microservices and large architectures
Datastorage
(Cassandra, MySQL, MongoDB, etc..)
Operational
(Mesos, Kubernetes, etc...)
Discovery/Routing
What’s different about big data.
Data at rest
Data in motion
Customer Facing
Minimize latency
Maximize reliability
Data Analytics
Minimize I/O
Maximize processing
Ship computation to data
The questions you can answer aren’t predefined
Hive/Pig/MR
Presto
Metacat
Hive
Metastore
That doesn’t look very container-y
(or microservicy-y for that matter)
Datastorage - HDFS (Or in our case S3)
Operational - YARN
Containers - JVM
So what happens when you want to do something else?
But is that really the way we want to approach containers?
What’s different about big data.
Running many different short-lived processes
Running many different short-lived processes
Efficient container construction, allocation, and movement
Groups of processes having meaning
Groups of processes having meaning
How we observe processes needs to be holistic
Processes need to be scheduled by data locality
(And not just data locality for data at rest)
Processes need to be scheduled by data locality
(And not just data locality for data at rest)
A special case of affinity (although possibly over time)
but...
We do need a data discovery service.
(kind of… maybe… a namenode?)
SELECT
t.title_id,
t.title_desc,
SUM(v.view_secs)
FROM
view_history as v
join title_d as t on
v.title_id =
t.title_id
WHERE
v.view_dateint > 20150101
GROUP BY 1,2;
LOAD LOAD
JOIN
GROUP
Data
Discovery
Query Compiler
Query Planner
Metadata
DAG
Watcher
Bottom line
Containers provide process level security
The goal should be to minimize monoliths
This isn’t different from what we are doing already
Our languages are abstractions of composable-distributed processing
Different big data projects should share services
No matter what we do, joining is going to be a big problem
Questions?

More Related Content

Similar to Big data and containers

Data Structure and Types
Data Structure and TypesData Structure and Types
Data Structure and TypesAnjani Phuyal
 
Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)Pavlo Baron
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Gerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and InvestmentGerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and Investmentvijayk23x
 
zenoh: The Edge Data Fabric
zenoh: The Edge Data Fabriczenoh: The Edge Data Fabric
zenoh: The Edge Data FabricAngelo Corsaro
 
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowMigrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowAnant Corporation
 
Moving from a Relational Database to Cassandra: Why, Where, When, and How
Moving from a Relational Database to Cassandra: Why, Where, When, and HowMoving from a Relational Database to Cassandra: Why, Where, When, and How
Moving from a Relational Database to Cassandra: Why, Where, When, and HowAnant Corporation
 
Gerenciando recursos computacionais com Apache Mesos
Gerenciando recursos computacionais com Apache MesosGerenciando recursos computacionais com Apache Mesos
Gerenciando recursos computacionais com Apache Mesostdc-globalcode
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Mark Tabladillo
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta DataDigikrit
 
Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)Denodo
 
Overview of databases
Overview of databasesOverview of databases
Overview of databasesshaik faroq
 
Experimenting With Big Data
Experimenting With Big DataExperimenting With Big Data
Experimenting With Big DataNick Boucart
 
Big data presentation
Big data presentationBig data presentation
Big data presentationChinh Vo Wili
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Big Data DC - Analytics at Clearspring
Big Data DC - Analytics at ClearspringBig Data DC - Analytics at Clearspring
Big Data DC - Analytics at Clearspringabramsm
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudRightScale
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...Niraj Tolia
 

Similar to Big data and containers (20)

Data Structure and Types
Data Structure and TypesData Structure and Types
Data Structure and Types
 
Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)
 
Big Data
Big DataBig Data
Big Data
 
Gerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and InvestmentGerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and Investment
 
zenoh: The Edge Data Fabric
zenoh: The Edge Data Fabriczenoh: The Edge Data Fabric
zenoh: The Edge Data Fabric
 
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowMigrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and How
 
Moving from a Relational Database to Cassandra: Why, Where, When, and How
Moving from a Relational Database to Cassandra: Why, Where, When, and HowMoving from a Relational Database to Cassandra: Why, Where, When, and How
Moving from a Relational Database to Cassandra: Why, Where, When, and How
 
Gerenciando recursos computacionais com Apache Mesos
Gerenciando recursos computacionais com Apache MesosGerenciando recursos computacionais com Apache Mesos
Gerenciando recursos computacionais com Apache Mesos
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Microsoft Dryad
Microsoft DryadMicrosoft Dryad
Microsoft Dryad
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta Data
 
Data mining
Data miningData mining
Data mining
 
Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)
 
Overview of databases
Overview of databasesOverview of databases
Overview of databases
 
Experimenting With Big Data
Experimenting With Big DataExperimenting With Big Data
Experimenting With Big Data
 
Big data presentation
Big data presentationBig data presentation
Big data presentation
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Big Data DC - Analytics at Clearspring
Big Data DC - Analytics at ClearspringBig Data DC - Analytics at Clearspring
Big Data DC - Analytics at Clearspring
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
 

Recently uploaded

Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...ThinkInnovation
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 

Recently uploaded (16)

Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 

Big data and containers

Editor's Notes

  1. This is a good thing!
  2. Something that is ingrained at Netflix
  3. Decentralized
  4. Basically do I deploy and get resources?
  5. Think of it this way: Our content is data at rest, a bunch of encodings sitting on an open connect server somewhere. When someone wants to view something, the data is streamed to them, data in motion (for a huge chunk of the downstream bandwidth) And the actual viewing of the content is the visualization of the data. You can extend this pattern to other services. Don’t go overboard, but it is a useful way to think about data. Especially when the data starts to get big.
  6. But that isn’t really what we do.
  7. As a result the allocations need to be fast and scalable.
  8. As a result the allocations need to be fast and scalable.