Joshua Bloom, Ph.D.
CTO, Co-founder
PyData, Seattle, July 2015
A Systems View of Machine Learning
in science & industry
Gordon & Betty
Moore Foundation
Data-Driven Investigator
UC Berkeley, Astronomy
@profjsb
About Me…
http://research.google.com/pubs/pub43146.html
• Complex models erode abstraction boundaries
• Data dependencies cost more than code dependencies
• System-level Spaghetti
• Changing External World
“It may be surprising to the academic community to know that only a fraction of the code … is actually doing ‘machine learning’. A mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code.”
Algorithms
Software
Hardware
Project Staff
Consumers
Organization + Society
ML System Components
Agenda
- inside-out discussion of component parts & some interconnects
- presentation of some facilitating new tools
- impact on problem definition and teams
Some Algos/Models/Approaches Used in Practice
Linear/Logistic Regression, Naive Bayes, Decision Trees, SVMs, Bagging, Boosting, Decision Forests, Neural Nets, Deep Learning, Nearest Neighbors, Gaussian/Dirichlet Processes, Splines, Lasso, XGBoost, LDA/LSI, RNN, BOW, word2vec, …
Software Instantiations in the Python Ecosystem
Nguyen et al., CVPR 2015
All Models of Learning Have Flaws
http://hunch.net/?p=224
“It’s common to forget the flaws
of the model that you are most
familiar…while the flaws of new
models get exaggerated.”
- John Langford (2007, Microsoft research)
Concepts ≠ Statistics
Convolutional networks can be fooled.
Nguyen et al., CVPR 2015
The impact of dataset bias
Training/testing on biased datasets gives unrealistic results.
E.g.: Torralba and Efros, Unbiased Look at Dataset Bias, CVPR 2011.
Torralba/Efros11 via L. Bottou (ICML 2015)
Magritte, ICML, 1929
What are you optimizing for?
Component | What
Algorithm/Model | Learning rate, convexity, error bounds, scaling, …
+ Software/Hardware | Accuracy, memory usage, disk usage, CPU needs, time to learn, time to predict
+ Project Staff | Time to implement, people/resource costs, reliability, maintainability, experimentability
+ Consumers | Direct value, usability, explainability, actionability
+ Society | Indirect value
- multi-axis optimizations in a given component
- highly coupled optimization considerations between components
- a myopic view can be costly further up the stack
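One way to make "multi-axis optimization" concrete is a Pareto filter: keep only the candidates that no other candidate beats on every axis at once, and leave the final trade-off to the people who know the downstream costs. The sketch below is illustrative only; the `Candidate` fields (accuracy, prediction latency, memory) and all numbers are hypothetical, and real projects would add staff and consumer axes that are harder to quantify.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float      # higher is better
    latency_ms: float    # time to predict; lower is better
    memory_mb: float     # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on at least one."""
    no_worse = (a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
                and a.memory_mb <= b.memory_mb)
    strictly_better = (a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
                       or a.memory_mb < b.memory_mb)
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    models = [
        Candidate("logistic", 0.88, 0.2, 5.0),
        Candidate("forest", 0.91, 3.0, 250.0),
        Candidate("big_ensemble", 0.92, 40.0, 2000.0),
        Candidate("deep_forest", 0.89, 5.0, 300.0),  # dominated by "forest"
    ]
    print([m.name for m in pareto_front(models)])  # ['logistic', 'forest', 'big_ensemble']
```

Nothing on the front is "best"; choosing among the survivors is exactly the cross-component judgment the slide describes.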
Optimization Metric:
What’s the essence of what I care about?
Scalar proxies:
- RMSE
- RMSLE
- [adjusted] R²
- ...
R² = 0.91
RMSE = 692.3
Pearson R = 0.96
scatter
outliers
bias
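The scalar proxies above are easy to compute, and a small example shows why none of them is the whole story. A prediction that is perfectly correlated with the truth but systematically offset has Pearson R = 1, while RMSE and R² expose the bias. Below is a minimal stdlib sketch, assuming all values are greater than -1 so that RMSLE's log(1 + x) is defined. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root-mean-squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def rmsle(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root-mean-squared logarithmic error; assumes every value is > -1."""
    return math.sqrt(sum((math.log1p(b) - math.log1p(a)) ** 2
                         for a, b in zip(y, yhat)) / len(y))


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (negative if worse than the mean)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y: Sequence[float], yhat: Sequence[float], n_features: int) -> float:
    """R^2 penalized for the number of features the model used."""
    n = len(y)
    return 1.0 - (1.0 - r2(y, yhat)) * (n - 1) / (n - n_features - 1)


def pearson_r(x: Sequence[float], z: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sz = math.sqrt(sum((b - mz) ** 2 for b in z))
    return cov / (sx * sz)


if __name__ == "__main__":
    y = [float(v) for v in range(1, 11)]
    biased = [v + 5.0 for v in y]          # perfectly correlated, systematically off
    print(round(pearson_r(y, biased), 6))  # 1.0
    print(round(rmse(y, biased), 6))       # 5.0
    print(round(r2(y, biased), 3))         # -2.03
```

Pearson R measures only linear association, so it cannot see the constant offset. R² goes negative here because the biased predictions do worse than simply predicting the mean.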
which classifier is best?
depends...
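"Depends" usually means "depends on what the errors cost". In the sketch below, two hypothetical classifiers are evaluated on the same 1000 examples. The one with higher accuracy wins when false positives and false negatives cost the same, and it loses once a missed positive costs 20 times more. The confusion counts and cost ratios are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def expected_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Average misclassification cost per example under the given cost model."""
    n = c.tp + c.fp + c.fn + c.tn
    return (c.fp * cost_fp + c.fn * cost_fn) / n


# Hypothetical results on the same 1000-example test set.
cautious = Confusion(tp=90, fp=100, fn=10, tn=800)   # flags a lot: accuracy 0.89
precise = Confusion(tp=60, fp=10, fn=40, tn=890)     # rarely flags: accuracy 0.95

if __name__ == "__main__":
    for cost_fp, cost_fn in [(1.0, 1.0), (1.0, 20.0)]:
        name, _ = min([("cautious", cautious), ("precise", precise)],
                      key=lambda item: expected_cost(item[1], cost_fp, cost_fn))
        print(f"FP cost {cost_fp}, FN cost {cost_fn}: pick {name}")
    # FP cost 1.0, FN cost 1.0: pick precise
    # FP cost 1.0, FN cost 20.0: pick cautious
```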
[Figure: leaderboard scores for >$50k-prize and <$50k-prize competitions and for Netflix, with the winning metric and the best benchmark marked]
many teams get within ~a few % of optimum,
so which is easier to put into production?
Leaderboard data from Kaggle & Netflix
Optimization Metric
“We evaluated some of the new methods
offline but the additional accuracy gains
that we measured did not seem to justify the
engineering effort needed to bring them into
a production environment.”
Xavier Amatriain and Justin Basilico (April 2012)
On the Prize
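The Netflix observation suggests a simple selection rule. Treat every model within a small tolerance of the best offline score as equivalent, then choose the one that is cheapest to build and run. This is a sketch under stated assumptions: `score` is higher-is-better (for an error metric such as RMSE, which the Netflix Prize used, negate it first), `tolerance` is absolute, and `cost` is a hypothetical single number standing in for engineering and serving effort.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    score: float   # offline metric, higher is better
    cost: float    # hypothetical engineering + serving cost, arbitrary units


def pick_for_production(models: List[Model], tolerance: float = 0.01) -> Model:
    """Cheapest model whose score is within `tolerance` (absolute) of the best score."""
    best_score = max(m.score for m in models)
    good_enough = [m for m in models if m.score >= best_score - tolerance]
    return min(good_enough, key=lambda m: m.cost)


candidates = [
    Model("winning_blend", 0.900, 100.0),
    Model("single_gbm", 0.895, 10.0),
    Model("linear", 0.850, 1.0),
]

if __name__ == "__main__":
    print(pick_for_production(candidates).name)       # single_gbm
    print(pick_for_production(candidates, 0.0).name)  # winning_blend
```

The tolerance encodes how much offline accuracy the team will trade for simplicity. Setting it to zero reproduces leaderboard behavior.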
WiseFactory: automated feature extraction, learning, prediction, deployment
WiseTransfer: efficient manipulation of large objects
WiseDataSet
WiseML: high-productivity data science in Python
WiseAlgorithm
WindTunnel: detect drift in CPU, Mem, Accuracy, Statistics
Quality
Wrapping
High-Level API
Deployment &
Monitoring
C++ SDK
Core ML Stack at Wise.io
G. Blanco, D. Eads, J. Richards, P. Baines, H. Brink
Wise DataSet
[Diagram: three BaseVariableGroups and three InstanceGroups; RowSparse, RowMajor, ColSparse, MemMapped; HeterogeneousCache; AlgoRepo; Variable Mapper; Level Mapper]
• fast, highly memory-efficient
• heterogeneous
• distributed
Goal: easily surface algorithms (written in C++ to be cache-exploitative) to Python
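The diagram names a Variable Mapper and a Level Mapper but does not say how they work. A common way to make categorical data memory-efficient is to map each distinct level to a small integer code once and store only the codes in a compact typed array. The class below is a hypothetical Python illustration of that idea. It is not Wise.io's C++ implementation, and it borrows only the slide's label.

```python
from array import array
from typing import Dict, Iterable, List


class LevelMapper:
    """Hypothetical categorical encoder: string level <-> compact integer code."""

    def __init__(self) -> None:
        self._code_of: Dict[str, int] = {}
        self._levels: List[str] = []

    def encode(self, values: Iterable[str]) -> array:
        """Return a typed array of codes, assigning a new code the first time a level appears."""
        codes = array("i")  # C signed int per value instead of one Python str object per value
        for v in values:
            code = self._code_of.get(v)
            if code is None:
                code = len(self._levels)
                self._code_of[v] = code
                self._levels.append(v)
            codes.append(code)
        return codes

    def decode(self, codes: Iterable[int]) -> List[str]:
        """Map codes back to their original levels."""
        return [self._levels[c] for c in codes]

    @property
    def levels(self) -> List[str]:
        return list(self._levels)


if __name__ == "__main__":
    mapper = LevelMapper()
    channel = mapper.encode(["email", "phone", "email", "chat", "email"])
    print(list(channel))           # [0, 1, 0, 2, 0]
    print(mapper.decode(channel))  # ['email', 'phone', 'email', 'chat', 'email']
```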
WiseDataSets
[Diagram: Language-agnostic C++ Base Classes, Python-specific Derived Classes, and R-specific Derived Classes; components: Input, Output, Iterator, Processor, Array Processor, String Processor, Frame Processor, FrameBuilder, SeriesBuilder, StringBuilder]
• expose a flexible interface from Python to high-performance, Python-agnostic C++ code
• pass arbitrary data between layers using “Protocol Master” (like Protobufs)
• write C++ code generically for GraphLab, Spark, pandas, and Wise
WiseTransfer
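The WiseTransfer bullets describe passing data between layers through a schema-aware protocol "like Protobufs". The actual "Protocol Master" format is not shown, so the sketch below is a hypothetical stand-in. It uses a fixed 9-byte header (type code, element count) followed by raw little-endian values, a layout that C++ code could read without parsing text. The type codes and header layout are assumptions.

```python
import struct
import sys
from array import array
from typing import Iterable, Tuple

# Hypothetical type codes for the frame header (not a real protocol's values).
FLOAT64, INT64 = 1, 2
_TYPECODE = {FLOAT64: "d", INT64: "q"}
# Little-endian header with no padding: uint8 type code + uint64 element count = 9 bytes.
_HEADER = struct.Struct("<BQ")


def pack_column(kind: int, values: Iterable) -> bytes:
    """Serialize a homogeneous numeric column as header + raw little-endian payload."""
    payload = array(_TYPECODE[kind], values)
    if sys.byteorder == "big":
        payload.byteswap()  # normalize the wire format to little-endian
    return _HEADER.pack(kind, len(payload)) + payload.tobytes()


def unpack_column(buf: bytes) -> Tuple[int, array]:
    """Inverse of pack_column: read the header, then copy the payload into a typed array."""
    kind, count = _HEADER.unpack_from(buf, 0)
    payload = array(_TYPECODE[kind])
    start = _HEADER.size
    payload.frombytes(buf[start:start + count * payload.itemsize])
    if sys.byteorder == "big":
        payload.byteswap()
    return kind, payload


if __name__ == "__main__":
    frame = pack_column(FLOAT64, [1.5, 2.5, 3.0])
    kind, col = unpack_column(frame)
    print(kind, list(col))  # 1 [1.5, 2.5, 3.0]
    print(len(frame))       # 33: 9-byte header + 3 doubles of 8 bytes
```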
Datasets for Data Science Comparison
Dataset | Slicing Induces Copy | Immutable Columns | Query | Transfer Speed to Python | C++ SDK | Distributed | Memory Efficiency | Categorical Optimized | Sparse & Dense
Pandas DataFrame | Sequences | No | Yes | N/A | No | No | Medium | Medium | Yes
GraphLab SFrame | Yes | Yes | Yes | Low | Yes | Yes | High | No | Yes
Spark DataFrame | Yes | Yes | Yes | Very Low | No | Yes | Low | No | Yes
Dask | Yes | No | Yes | N/A | No | Yes | Medium | No | No
Blaze | No | No | Yes | N/A | No | Yes | Medium | No | No
Wise DataSet | Copy-on-write | No | Yes | Very High | Yes | Yes | Very High | High | Yes
See also: Rob Story, today
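The table lists copy-on-write slicing for Wise DataSet. The semantics can be sketched in pure Python with `memoryview`. Slicing returns a column that shares the parent's buffer, and whichever side writes first takes a private copy, so neither observes the other's writes. `CowColumn` is a hypothetical class for illustration. It copies conservatively, since both parent and slice copy on their first write after a slice, in exchange for very simple bookkeeping.

```python
from array import array


class CowColumn:
    """Hypothetical copy-on-write column of doubles: slicing is O(1) and shares memory."""

    def __init__(self, data=(), _view=None):
        if _view is None:
            self._view = memoryview(array("d", data))  # freshly allocated, private buffer
            self._owns = True
        else:
            self._view = _view                          # view into someone else's buffer
            self._owns = False

    def __len__(self) -> int:
        return len(self._view)

    def __getitem__(self, key):
        if isinstance(key, slice):
            self._owns = False  # buffer is now shared, so our next write must copy too
            return CowColumn(_view=self._view[key])
        return self._view[key]

    def __setitem__(self, index: int, value: float) -> None:
        if not self._owns:
            # First write to a shared buffer: take a private copy, then write.
            self._view = memoryview(array("d", self._view.tolist()))
            self._owns = True
        self._view[index] = float(value)

    def tolist(self):
        return self._view.tolist()


if __name__ == "__main__":
    base = CowColumn([1.0, 2.0, 3.0, 4.0])
    head = base[:2]       # shares memory with base, no copy yet
    head[0] = 99.0        # triggers a private copy of `head` only
    print(base.tolist())  # [1.0, 2.0, 3.0, 4.0]
    print(head.tolist())  # [99.0, 2.0]
```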
Enforcing (Weak) Contracts: Monitoring Deployments
Build DS workflow on test set, like the offline testing accuracy → deploy & start monitoring results → online, accuracy is worse than expected?
1. Bang head to find (subtle) overfitting in model
2. Retrain: with new data (mo’ data, better answers)
3. Concept Drift: if retraining doesn’t help, jigger the DS workflow
4. Maybe that’s ok: Prediction influenced outcome. Hold out some live.
What to do:
see also, Chris Harland’s talk yesterday; Mike Manapat, today
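Before banging your head, it is worth checking that the online drop is bigger than sampling noise. Item 4's live holdout also needs a stable way to decide which requests skip the model. This is a sketch under stated assumptions: online accuracy is measured on a labeled sample of live traffic, a two-proportion z-test is an adequate comparison, and hashing a request ID is an acceptable way to assign the holdout. The alert threshold of 3 is an assumed value, not a recommendation from the talk.

```python
import hashlib
import math


def accuracy_drop_z(correct_off: int, n_off: int, correct_on: int, n_on: int) -> float:
    """Two-proportion z statistic for (offline accuracy - online accuracy).

    Positive values mean online is worse. Undefined if the pooled accuracy is exactly 0 or 1.
    """
    p_off, p_on = correct_off / n_off, correct_on / n_on
    pooled = (correct_off + correct_on) / (n_off + n_on)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_off + 1 / n_on))
    return (p_off - p_on) / se


def in_live_holdout(request_id: str, fraction: float = 0.05) -> bool:
    """Stable assignment of a request to the no-model holdout (same ID -> same answer)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # roughly uniform in [0, 1)
    return bucket < fraction


if __name__ == "__main__":
    z = accuracy_drop_z(correct_off=9_200, n_off=10_000, correct_on=850, n_on=1_000)
    print(f"z = {z:.2f}")
    if z > 3.0:  # assumed alert threshold
        print("online accuracy is significantly worse than offline: alert a human")
```

Hashing rather than `random()` keeps a request in the same arm across retries and across processes, which makes the holdout comparison reproducible.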
Software Tests: unit tests, Regression Tests, Integration Tests
Of course you’re doing this…
ETL Testing: is my contract affected by the (changing) update?
Model Deployment Testing
@treycausey (yesterday)
some tools: Engarde, Hypothesis, Feature Forge
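Libraries named on the slide, such as Engarde, provide this kind of check for pandas DataFrames. To avoid misquoting their APIs, the sketch below shows the underlying idea with the standard library only. Each incoming batch is checked against a declared weak contract (required fields, types, ranges, allowed categorical levels) before it reaches the model. `CONTRACT` and its field names are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical weak contract for a batch of incoming records.
CONTRACT: Dict[str, Dict[str, Any]] = {
    "num_pages": {"type": int, "min": 0, "max": 1000},
    "channel": {"type": str, "allowed": {"email", "chat", "phone"}},
}


def check_contract(rows: List[Dict[str, Any]], contract=CONTRACT) -> List[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    problems = []
    for i, row in enumerate(rows):
        for field, rule in contract.items():
            if row.get(field) is None:
                problems.append(f"row {i}: missing {field!r}")
                continue
            value = row[field]
            if not isinstance(value, rule["type"]):
                problems.append(f"row {i}: {field!r} has type {type(value).__name__}")
                continue
            if "min" in rule and value < rule["min"]:
                problems.append(f"row {i}: {field!r}={value} below {rule['min']}")
            if "max" in rule and value > rule["max"]:
                problems.append(f"row {i}: {field!r}={value} above {rule['max']}")
            if "allowed" in rule and value not in rule["allowed"]:
                problems.append(f"row {i}: {field!r}={value!r} is not an allowed level")
    return problems


if __name__ == "__main__":
    batch = [{"num_pages": 12, "channel": "email"},
             {"num_pages": -3, "channel": "fax"},
             {"channel": "chat"}]
    for problem in check_contract(batch):
        print(problem)
```

Running such a check in CI against a sample of each upstream update turns "is my contract affected?" into a test that fails loudly.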
Enforcing (Weak) Contracts: Monitoring Deployments
1. Need to know when things are too different from before
2. Then alert a real human
3. Use automated tools to try to isolate the cause of change: data or code.
reproducibility
• every deployment & drift test given a unique hash
• generate data files & script with hash
• Perform sampling on known-good deployments
• Monitor RAM, CPU, accuracy metrics over time
• Probabilistic testing component of our continuous integration of ML, 10k++ tests
Wise “WindTunnel”
“Weak Contracts”, i.e. abstractions within components bleed through to other components
cf. Sculley …
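The reproducibility bullets combine two ideas: an identifier derived from what was deployed, and a comparison of live metrics against known-good deployments. The following is a hypothetical sketch of both, not WindTunnel's implementation. The hash covers the serialized model plus its config, and a metric is flagged when it falls more than `k` standard deviations from the known-good mean. `k = 4` and the metric names are assumptions. Flagged results go to a human, as step 2 says.

```python
import hashlib
import json
import statistics
from typing import Dict, List


def deployment_hash(model_bytes: bytes, config: Dict) -> str:
    """Stable short ID for a deployment: hash of the model artifact plus its config."""
    h = hashlib.sha256()
    h.update(model_bytes)
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))  # key order independent
    return h.hexdigest()[:12]


def drift_alerts(known_good: Dict[str, List[float]], current: Dict[str, float],
                 k: float = 4.0) -> List[str]:
    """Flag metrics outside mean +/- k * stdev of known-good samples (needs >= 2 samples)."""
    alerts = []
    for metric, history in known_good.items():
        mean = statistics.mean(history)
        spread = statistics.stdev(history)
        value = current[metric]
        if abs(value - mean) > k * spread:
            alerts.append(f"{metric}: {value} vs known-good {mean:.3g} ± {k}×{spread:.3g}")
    return alerts


if __name__ == "__main__":
    tag = deployment_hash(b"serialized-model", {"n_trees": 100, "max_depth": 8})
    history = {"ram_mb": [510, 495, 505, 502], "accuracy": [0.91, 0.92, 0.915, 0.918]}
    for alert in drift_alerts(history, {"ram_mb": 900, "accuracy": 0.917}):
        print(tag, alert)  # a human then decides: did the data or the code change?
```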
Example (via Bottou)
1. A smart programmer makes an inventive use of a trained object recognizer.
2. The object recognizer receives data that does not resemble the testing data and outputs nonsense.
3. The code of the smart programmer does not work.
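One cheap defence against this failure mode is to make the model refuse inputs that do not resemble its training data, instead of silently returning nonsense. The sketch below records per-feature min/max ranges at training time and returns `None` for inputs outside them. `GuardedModel` is hypothetical and the range envelope is deliberately crude. Density- or distance-based detectors are common, stronger alternatives.

```python
from typing import Callable, List, Optional, Sequence


class GuardedModel:
    """Wrap a predict function; refuse inputs outside the per-feature training ranges."""

    def __init__(self, predict: Callable[[Sequence[float]], float],
                 training_rows: List[Sequence[float]], slack: float = 0.0) -> None:
        columns = list(zip(*training_rows))  # transpose rows -> feature columns
        self._lo = [min(col) - slack for col in columns]
        self._hi = [max(col) + slack for col in columns]
        self._predict = predict

    def out_of_range(self, x: Sequence[float]) -> List[int]:
        """Indices of features that fall outside the training envelope."""
        if len(x) != len(self._lo):
            raise ValueError(f"expected {len(self._lo)} features, got {len(x)}")
        return [i for i, (v, lo, hi) in enumerate(zip(x, self._lo, self._hi))
                if not lo <= v <= hi]

    def predict(self, x: Sequence[float]) -> Optional[float]:
        """A prediction, or None meaning 'this input does not resemble the training data'."""
        if self.out_of_range(x):
            return None
        return self._predict(x)


if __name__ == "__main__":
    train = [[0.1, 10.0], [0.5, 20.0], [0.9, 15.0]]
    model = GuardedModel(lambda x: 2 * x[0] + 0.01 * x[1], train)
    print(model.predict([0.4, 12.0]))  # inside the envelope: a number
    print(model.predict([5.0, 12.0]))  # None: feature 0 never looked like this in training
```

Returning `None` forces the "smart programmer" downstream to handle the refusal explicitly, which makes the weak contract visible in the calling code.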
Platonic Form: Data, as we act like it is…
Plutonic Form: …as it is.
NLP (sparse): {broken: 3, “blue screen”: 2, ...}
computer vision (nested): {eyes: [{“location”: [21,13], “bounding”: [...]}]..}
metadata (dense): {num_pages: 12, channel: “email”...}
3rd party (missing/noisy): {author_klout: 34.0, ...}
timeseries (streaming): [2014-12-01T12:03:12, 2014-12-01T12:05:12]
Real Data != Benchmark Data
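Most learners want a flat numeric vector, so records like the ones above are usually flattened first. This is a minimal sketch with several illustrative choices, none of them a standard: dotted key paths become feature names, strings become indicator features, missing values are skipped, and timestamp lists are summarized by event count and mean gap.

```python
from datetime import datetime
from typing import Any, Dict, List


def flatten(record: Any, prefix: str = "") -> Dict[str, float]:
    """Flatten nested dicts/lists into a sparse {'a.b.0': value} feature dict.

    Numbers (and bools) become float features, strings become indicator features
    such as 'channel.email' = 1.0, and None (missing) produces no feature at all.
    """
    out: Dict[str, float] = {}
    if isinstance(record, dict):
        for key, value in record.items():
            out.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(record, (list, tuple)):
        for i, value in enumerate(record):
            out.update(flatten(value, f"{prefix}{i}."))
    elif isinstance(record, (bool, int, float)):
        out[prefix.rstrip(".")] = float(record)
    elif isinstance(record, str):
        out[prefix + record] = 1.0
    return out


def timeseries_features(stamps: List[str], name: str) -> Dict[str, float]:
    """Summarize ISO-8601 timestamps as an event count and mean gap in seconds."""
    times = sorted(datetime.fromisoformat(s) for s in stamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    features = {f"{name}.count": float(len(times))}
    if gaps:
        features[f"{name}.mean_gap_s"] = sum(gaps) / len(gaps)
    return features


if __name__ == "__main__":
    ticket = {
        "bow": {"broken": 3, "blue screen": 2},
        "vision": {"eyes": [{"location": [21, 13]}]},
        "meta": {"num_pages": 12, "channel": "email"},
        "third_party": {"author_klout": 34.0, "followers": None},
    }
    features = flatten(ticket)
    features.update(timeseries_features(["2014-12-01T12:03:12", "2014-12-01T12:05:12"], "events"))
    for key, value in sorted(features.items()):
        print(key, value)
```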
Seismology, Neuroscience (Klein et al.), Astronomy
http://mltsp.io
pip install mltsp
ML tsp. (Machine Learning Time-Series Platform)
R. Allen, M. Silver, F. Pérez, JSB, S. van der Walt, A. Creillin-Quick
Domain scientists (Astro, Seismo, Neuro); Funding bodies; Comp/Stat/Eng
An open-source web platform for distributed time-series analysis
• Selection of sophisticated feature extraction algorithms
• Distributed computation
• Sandboxed execution of custom code
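Feature extraction is the core step for time-series classification. The sketch below computes a few simple, commonly used summaries from possibly unevenly sampled (t, y) pairs, such as variable-star light curves. The feature set is an illustrative assumption and is not MLTSP's feature list or API.

```python
import math
import statistics
from typing import Dict, Sequence


def extract_features(t: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """A few simple features for a possibly unevenly sampled series (t, y)."""
    if len(t) != len(y) or len(t) < 3:
        raise ValueError("need at least 3 paired (t, y) samples")
    order = sorted(range(len(t)), key=lambda i: t[i])  # sort observations by time
    ts = [t[i] for i in order]
    ys = [y[i] for i in order]
    mean = statistics.mean(ys)
    pstd = statistics.pstdev(ys)
    med = statistics.median(ys)
    gaps = [b - a for a, b in zip(ts, ts[1:])]         # irregular cadence is fine
    # Population (biased) skewness; defined as 0.0 for a constant series.
    skew = (sum((v - mean) ** 3 for v in ys) / len(ys) / pstd ** 3) if pstd > 0 else 0.0
    return {
        "n_obs": float(len(ys)),
        "amplitude": (max(ys) - min(ys)) / 2.0,
        "std": statistics.stdev(ys),
        "skew": skew,
        "median_abs_dev": statistics.median([abs(v - med) for v in ys]),
        "median_cadence": statistics.median(gaps),
        "time_span": ts[-1] - ts[0],
    }


if __name__ == "__main__":
    t = [0.0, 0.7, 1.1, 2.9, 3.2, 5.0]                  # uneven sampling times
    y = [math.sin(2 * math.pi * ti / 2.5) for ti in t]  # period-2.5 signal
    for name, value in extract_features(t, y).items():
        print(f"{name:>15s} {value:.3f}")
```

In a platform like the one described, each such function is a unit that can be distributed across workers, and user-supplied extractors are the part that needs sandboxing.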
[Diagram: Flask; CLI (under development); REST (/learn, /upload); UI; Disco worker pool (W1, W2, …, Wn); datastore (DB)]
Demo!
MLTSP Continuous Integration
github.com/drone
github.com/mltsp/mltsp
• Pull request triggers webhook
• Test container with MLTSP
• Custom feature extractor sandbox worker
• Workers (Disco), via SSH
• Drone calls GitHub status API
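The last CI step reports the result back to the pull request through GitHub's commit status API (`POST /repos/{owner}/{repo}/statuses/{sha}` with `state`, `target_url`, `description`, and `context`). Drone does this itself. The stdlib sketch below only builds the request so the contract is visible. The token, SHA, and `context` value are placeholders, and the request is deliberately not sent.

```python
import json
import urllib.request

VALID_STATES = {"error", "failure", "pending", "success"}


def build_status_request(owner: str, repo: str, sha: str, state: str, token: str,
                         description: str, target_url: str = "",
                         context: str = "ci/tests") -> urllib.request.Request:
    """Build (but do not send) a GitHub commit-status request."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {sorted(VALID_STATES)}")
    body = {"state": state, "description": description, "context": context}
    if target_url:
        body["target_url"] = target_url  # link shown next to the status, e.g. the CI log
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"token {token}",
                 "Accept": "application/vnd.github+json",
                 "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_status_request("mltsp", "mltsp", "0123abc", "success",
                               token="<placeholder>", description="all tests passed")
    # urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
    print(req.get_method(), req.full_url)
```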
http://bigmacc.info
Results from MLTSP
The Astrophysical Journal Supplement Series, 203:32 (27pp), 2012 December
Published Work before MLTSP
MVP: Reproduce main results of a scientific paper
Probabilistic Classification of Variable Stars
Shivvers, JSB, Richards, MNRAS, 2014
106 “DEB” candidates; 12 new mass-radii
15 “RCB/DYP” candidates; 8 new discoveries
Triple # of Galactic DYPer Stars
Miller, Richards, JSB, .. ApJ 2012
5400 Spectroscopic Targets
Miller, JSB, Richards, .. ApJ 2015
Turn synoptic imagers into ~spectrographs
WISE SUPPORT
[Figure: a stack of incoming support tickets, each with From, Subject, Description, and Date fields]
[Diagram: Customer → Tier 1; Automated Response, Intelligent Routing, Recommended Response, Automated Reply; “From Complexity to Clarity”]
Wise Support
• >30% faster avg. response time
• more consistent answers
• faster scaling of support teams
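The diagram separates automated replies, recommended responses, and routing. This is one way to get augmentation rather than full automation out of the same classifier. In the hypothetical sketch below, the top class probability decides how much is automated. The thresholds (0.95 and 0.60) and the labels are assumptions. In practice they would be tuned on held-out tickets against the cost of a wrong automated reply.

```python
from typing import Dict, Tuple


def triage(probabilities: Dict[str, float], auto_threshold: float = 0.95,
           suggest_threshold: float = 0.60) -> Tuple[str, str]:
    """Decide how much to automate for one ticket, given class probabilities.

    Returns (action, label), where action is one of:
      'auto_reply' - confident enough to answer without a human,
      'recommend'  - draft a response for a human agent to review,
      'route'      - only pick the queue; a human writes the answer.
    """
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    if p >= auto_threshold:
        return "auto_reply", label
    if p >= suggest_threshold:
        return "recommend", label
    return "route", label


if __name__ == "__main__":
    print(triage({"password_reset": 0.97, "billing": 0.03}))             # ('auto_reply', 'password_reset')
    print(triage({"billing": 0.70, "bug_report": 0.30}))                 # ('recommend', 'billing')
    print(triage({"bug_report": 0.40, "billing": 0.35, "other": 0.25}))  # ('route', 'bug_report')
```

Raising `auto_threshold` shifts work from the machine to agents, and lowering it does the reverse. The setting is therefore a business decision as much as a modeling one.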
Fault Tolerant ML
augmentation vs. full automation
Random forest prediction of body segment in Xbox Kinect
gmail
https://www.reddit.com/r/funny/comments/3e7gy4/yes_netflix_because_my_6_year_old_will_enjoy_the/
“Yes Netflix, because my 6 year old will enjoy the animated fun of Sons of Anarchy”
“[So] what should be the machine learning engineering process?”
“Machine learning disrupts software engineering.”
- Leon Bottou (Facebook)
ỉπ vs.
(or “Data Science is a Team Sport”)
deep domain skill/knowledge/training
deep methodological knowledge/skill
deep domain or methodological skill/knowledge/training
strong methodological or domain knowledge/skill
Goal: empower teams of gammas to excel
ML Systems: It Takes a Village
‣ Novel testing can strengthen abstractions within components,
and contracts between
‣ Machine Learning Systems require optimizations across
components - so we’d better understand the true loss function
‣ (End user) fault tolerance is a must
Parting Thoughts
‣ Build ML into Systems because you have to…
Area Man
Bites off more
than he can chew
PyData 2014
Thanks!
@profjsb
A"Systems"View"of"Machine"Learning
in#science#&#industry

More Related Content

What's hot

Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14Sri Ambati
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future TensePaco Nathan
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoSri Ambati
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneDhiana Deva
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Scalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2OScalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2Oodsc
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013MLconf
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
 
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaH2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaSri Ambati
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabImpetus Technologies
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFMLconf
 
Patterns of Streaming Applications
Patterns of Streaming ApplicationsPatterns of Streaming Applications
Patterns of Streaming ApplicationsC4Media
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Turi, Inc.
 

What's hot (20)

Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future Tense
 
李育杰/The Growth of a Data Scientist
李育杰/The Growth of a Data Scientist李育杰/The Growth of a Data Scientist
李育杰/The Growth of a Data Scientist
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for Everyone
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Scalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2OScalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2O
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaH2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFTed Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
 
AI Development with H2O.ai
AI Development with H2O.aiAI Development with H2O.ai
AI Development with H2O.ai
 
Patterns of Streaming Applications
Patterns of Streaming ApplicationsPatterns of Streaming Applications
Patterns of Streaming Applications
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
 

Similar to PyData 2015 Keynote: "A Systems View of Machine Learning"

Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
No more Three Tier - A path to a better code for Cloud and Azure
No more Three Tier - A path to a better code for Cloud and AzureNo more Three Tier - A path to a better code for Cloud and Azure
No more Three Tier - A path to a better code for Cloud and AzureMarco Parenzan
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Onyxfish
 
muCon 2017 - Build Confidence in your System with Chaos Engineering
muCon 2017 - Build Confidence in your System with Chaos EngineeringmuCon 2017 - Build Confidence in your System with Chaos Engineering
muCon 2017 - Build Confidence in your System with Chaos EngineeringSylvain Hellegouarch
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsGábor Szárnyas
 
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfOSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfAltinity Ltd
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
 
The journy to real time analytics
The journy to real time analyticsThe journy to real time analytics
The journy to real time analyticsNoSQL TLV
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseAtScale
 
Ben ford intro
Ben ford introBen ford intro
Ben ford introPuppet
 
Telemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordTelemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordPuppet
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)guestebde
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLPaco Nathan
 

Similar to PyData 2015 Keynote: "A Systems View of Machine Learning" (20)

Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
No more Three Tier - A path to a better code for Cloud and Azure
No more Three Tier - A path to a better code for Cloud and AzureNo more Three Tier - A path to a better code for Cloud and Azure
No more Three Tier - A path to a better code for Cloud and Azure
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.
 
muCon 2017 - Build Confidence in your System with Chaos Engineering
muCon 2017 - Build Confidence in your System with Chaos EngineeringmuCon 2017 - Build Confidence in your System with Chaos Engineering
muCon 2017 - Build Confidence in your System with Chaos Engineering
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
 
10 Ways To Improve Your Code
10 Ways To Improve Your Code10 Ways To Improve Your Code
10 Ways To Improve Your Code
 
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfOSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
 
The journy to real time analytics
The journy to real time analyticsThe journy to real time analytics
The journy to real time analytics
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure Synapse
 
Ben ford intro
Ben ford introBen ford intro
Ben ford intro
 
Telemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordTelemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben Ford
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 

More from Joshua Bloom

Autoencoding RNN for inference on unevenly sampled time-series data
Autoencoding RNN for inference on unevenly sampled time-series dataAutoencoding RNN for inference on unevenly sampled time-series data
Autoencoding RNN for inference on unevenly sampled time-series dataJoshua Bloom
 
Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)Joshua Bloom
 
Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)Joshua Bloom
 
Data Science Education: Needs & Opportunities in Astronomy
Data Science Education: Needs & Opportunities in AstronomyData Science Education: Needs & Opportunities in Astronomy
Data Science Education: Needs & Opportunities in AstronomyJoshua Bloom
 
Computational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data LiteracyComputational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data LiteracyJoshua Bloom
 
Large-Scale Inference in Time Domain Astrophysics
Large-Scale Inference in Time Domain AstrophysicsLarge-Scale Inference in Time Domain Astrophysics
Large-Scale Inference in Time Domain AstrophysicsJoshua Bloom
 
Data Science at Berkeley
Data Science at BerkeleyData Science at Berkeley
Data Science at BerkeleyJoshua Bloom
 
Computational Training and Data Literacy for Domain Scientists
Computational Training and Data Literacy for Domain ScientistsComputational Training and Data Literacy for Domain Scientists
Computational Training and Data Literacy for Domain ScientistsJoshua Bloom
 
Joshua Bloom: Machine Learning and Classification in the Synoptic Survey Era
Joshua Bloom: Machine Learning and Classification in the Synoptic Survey EraJoshua Bloom: Machine Learning and Classification in the Synoptic Survey Era
Joshua Bloom: Machine Learning and Classification in the Synoptic Survey EraJoshua Bloom
 

More from Joshua Bloom (9)

Autoencoding RNN for inference on unevenly sampled time-series data
Autoencoding RNN for inference on unevenly sampled time-series dataAutoencoding RNN for inference on unevenly sampled time-series data
Autoencoding RNN for inference on unevenly sampled time-series data
 
Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)
 
Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)
 
Data Science Education: Needs & Opportunities in Astronomy
Data Science Education: Needs & Opportunities in AstronomyData Science Education: Needs & Opportunities in Astronomy
Data Science Education: Needs & Opportunities in Astronomy
 
Computational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data LiteracyComputational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data Literacy
 
Large-Scale Inference in Time Domain Astrophysics
Large-Scale Inference in Time Domain AstrophysicsLarge-Scale Inference in Time Domain Astrophysics
Large-Scale Inference in Time Domain Astrophysics
 
Data Science at Berkeley
Data Science at BerkeleyData Science at Berkeley
Data Science at Berkeley
 
Computational Training and Data Literacy for Domain Scientists
Computational Training and Data Literacy for Domain ScientistsComputational Training and Data Literacy for Domain Scientists
Computational Training and Data Literacy for Domain Scientists
 
Joshua Bloom: Machine Learning and Classification in the Synoptic Survey Era
Joshua Bloom: Machine Learning and Classification in the Synoptic Survey EraJoshua Bloom: Machine Learning and Classification in the Synoptic Survey Era
Joshua Bloom: Machine Learning and Classification in the Synoptic Survey Era
 

PyData 2015 Keynote: "A Systems View of Machine Learning"