H2O.ai
Machine Intelligence
Fast, Scalable In-Memory Machine and Deep Learning
For Smarter Applications
Python & Sparkling Water
with H2O
Cliff Click
Michal Malohlava
Who Am I?
Cliff Click
CTO, Co-Founder H2O.ai
cliff@h2o.ai
40 yrs coding
35 yrs building compilers
30 yrs distributed computation
20 yrs OS, device drivers, HPC, HotSpot
10 yrs Low-latency GC, custom java hardware
NonBlockingHashMap
20 patents, dozens of papers
100s of public talks
PhD Computer Science
1995 Rice University
HotSpot JVM Server Compiler
“showed the world JITing is possible”
H2O Open Source In-Memory
Machine Learning for Big Data
Distributed In-Memory Math Platform
GLM, GBM, RF, K-Means, PCA, Deep Learning
Easy to use SDK & API
Java, R (CRAN), Scala, Spark, Python, JSON, Browser GUI
Use ALL your data
Modeling without sampling
HDFS, S3, NFS, NoSql
Big Data & Better Algorithms
Better Predictions!
TBD: Customer Support
TBD: Head of Sales
Distributed Systems Engineers: Making ML Scale!
Practical Machine Learning
Value                     Requirements
Fast & Interactive        In-Memory
Big Data (No Sampling)    Distributed
Ownership                 Open Source
Extensibility             API/SDK
Portability               Java, REST/JSON
Infrastructure            Cloud or On-Premise Hadoop or Private Cluster
H2O Architecture
Prediction Engine
R & Exec Engine
Web Interface
Spark Scala REPL
Nano-Fast
Scoring Engine
Distributed
In-Memory K/V Store
Column Compress Data
Map/Reduce
Memory Manager
Algorithms!
GBM, Random Forest,
GLM, PCA, K-Means,
Deep Learning
HDFS S3 NFS
RealTime
DataFlow
Python & Sparkling Water
●  CitiBike of NYC
●  Predict bikes-per-hour-per-station
–  From per-trip logs
●  10M rows of data
●  Group-By, date/time feature-munging
Demo!
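The group-by at the heart of this demo can be sketched in plain Scala, without H2O or Spark. This is only an illustration: the `Trip` record, its field names, and the timestamp pattern `yyyy-MM-dd HH:mm:ss` are assumptions, not taken from the slides.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object BikesPerHour {
  // Hypothetical, simplified trip record: only the fields this sketch needs.
  final case class Trip(startTime: String, startStationId: Int)

  // Assumed timestamp layout of the per-trip log.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Date/time feature munging: truncate a timestamp to its hour bucket.
  def hourBucket(ts: String): String =
    LocalDateTime.parse(ts, fmt).withMinute(0).withSecond(0).format(fmt)

  // Group-by: count trips per (station, hour bucket).
  def bikesPerHour(trips: Seq[Trip]): Map[(Int, String), Int] =
    trips
      .groupBy(t => (t.startStationId, hourBucket(t.startTime)))
      .map { case (key, group) => key -> group.size }

  def main(args: Array[String]): Unit = {
    val trips = Seq(
      Trip("2013-07-01 00:10:34", 164),
      Trip("2013-07-01 00:45:02", 164),
      Trip("2013-07-01 01:05:00", 164),
      Trip("2013-07-01 00:30:00", 388)
    )
    bikesPerHour(trips).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The resulting counts per station and hour are the kind of target a model would then be trained to predict.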
H2O: A Platform for Big Math
●  Most Any Java on Big 2-D Tables
–  Write like it's single-thread POJO code
–  Runs distributed & parallel by default
●  Fast: billion row logistic regression takes 4 sec
●  World's first parallel & distributed GBM
–  Plus Deep Learn / Neural Nets, RF, PCA, K-means...
●  R integration: use terabyte datasets from R
●  Sparkling Water: Direct Spark integration
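The "write single-threaded code, run it distributed" idea rests on a map/reduce shape: a per-chunk computation plus an associative way to combine partial results. The sketch below shows that shape in plain Scala for a column mean. It does not use H2O's own task API; `ChunkedMean`, `Partial`, and `mapChunk` are hypothetical names, and the chunks are ordinary in-memory arrays.

```scala
object ChunkedMean {
  // Partial result computed independently for each chunk of a column.
  final case class Partial(sum: Double, count: Long) {
    // Reduce step: merging is associative and commutative,
    // so partials can be combined in any order.
    def merge(that: Partial): Partial = Partial(sum + that.sum, count + that.count)
    def mean: Double = if (count == 0) Double.NaN else sum / count
  }

  // Map step: reads like ordinary single-threaded code over one chunk.
  def mapChunk(chunk: Array[Double]): Partial = Partial(chunk.sum, chunk.length.toLong)

  // Map every chunk, then fold the partial results together.
  def mean(chunks: Seq[Array[Double]]): Double =
    chunks.map(mapChunk).foldLeft(Partial(0.0, 0L))(_ merge _).mean

  def main(args: Array[String]): Unit = {
    val chunks = Seq(Array(1.0, 2.0), Array(3.0), Array(4.0, 5.0, 6.0))
    println(mean(chunks))
  }
}
```

Because `merge` does not care about order, a runtime is free to evaluate the chunks on different cores or machines and combine the partials as they arrive.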
H2O: A Platform for Big Math
●  Easy launch: “java -jar h2o.jar”
–  No GC tuning: -Xmx as big as you like
●  Production ready:
–  Private on-premise cluster OR in the cloud
–  Hadoop, Yarn, EC2, or standalone cluster
–  HDFS, S3, NFS, URI & other datasources
–  Open Source, Apache v2
Can I call H2O’s algorithms from my Spark workflow?
YES, you can!
Sparkling Water
Sparkling Water
Provides
●  Transparent integration into the Spark ecosystem
●  Pure H2ORDD encapsulating H2O DataFrame
●  Transparent use of H2O data structures and algorithms with the Spark API
Excels in Spark workflows requiring advanced Machine Learning algorithms
Sparkling Water Design
[Diagram. Labels: spark-submit; Sparkling App; Spark Master JVM; three Spark Worker JVMs; Sparkling Water Cluster of three Spark Executor JVMs, each containing H2O.]
Data Distribution
[Diagram. Labels: Sparkling Water Cluster of three Spark Executor JVMs, each containing H2O; Data Source (e.g. HDFS); H2O RDD; Spark RDD.]
RDDs and DataFrames share the same memory space
Demo time!
SPARKLING WATER DEMO
H2O.AI
Created by H2O.ai / @h2oai
LAUNCH SPARKLING SHELL
> export SPARK_HOME="/path/to/spark/installation"
> bin/sparkling-shell
PREPARE AN ENVIRONMENT
val DIR_PREFIX = "/Users/michal/Devel/projects/h2o/repos/h2o2/bigdata/laptop/
// Common imports
import org.apache.spark.h2o._
import org.apache.spark.examples.h2o._
import org.apache.spark.examples.h2o.DemoUtils._
import org.apache.spark.sql.SQLContext
import water.fvec._
import hex.tree.gbm.GBM
import hex.tree.gbm.GBMModel.GBMParameters
// Initialize Spark SQLContext
implicit val sqlContext = new SQLContext(sc)
import sqlContext._
LAUNCH H2O SERVICES
implicit val h2oContext = new H2OContext(sc).start()
import h2oContext._
LOAD CITIBIKE DATA
USING H2O API
val dataFiles = Array[String](
  "2013-07.csv", "2013-08.csv", "2013-09.csv", "2013-10.csv",
  "2013-11.csv", "2013-12.csv").map(f => new java.io.File(DIR_PREFIX, f))
// Load and parse data
val bikesDF = new DataFrame(dataFiles:_*)
// Rename columns and remove all spaces in header
val colNames = bikesDF.names().map( n => n.replace(' ', '_'))
bikesDF._names = colNames
bikesDF.update(null)
USER-DEFINED COLUMN TRANSFORMATION
// Select column 'starttime'
val startTimeF = bikesDF('starttime)
// Invoke column transformation and append the created column
bikesDF.add(new TimeSplit().doIt(startTimeF))
// Do not forget to update frame in K/V store
bikesDF.update(null)
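The slides do not show what `TimeSplit` computes. Since the later SQL groups by a `Days` column, one plausible reading is that it turns a start time into a day number. The sketch below assumes the start time is stored as milliseconds since the Unix epoch; `DaysSinceEpoch` and `days` are hypothetical names, not the demo's helper.

```scala
object DaysSinceEpoch {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Assumption: the input is milliseconds since the Unix epoch (UTC).
  // Missing values stay missing instead of becoming day 0.
  def days(startTimeMillis: Option[Long]): Option[Long] =
    startTimeMillis.map(ms => java.lang.Math.floorDiv(ms, MillisPerDay))

  def main(args: Array[String]): Unit = {
    // 2013-07-01 01:00:00 UTC expressed in epoch milliseconds.
    println(days(Some(1372636800000L + 3600000L)))
    println(days(None))
  }
}
```

Collapsing timestamps to a day number gives every trip on the same date the same key, which is what makes a per-day group-by possible.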
OPEN H2O FLOW UI
openFlow
AND EXPLORE DATA...
> getFrames
...
FROM H2O'S DATAFRAME TO RDD
val bikesRdd = asSchemaRDD(bikesDF)
USE SPARK SQL
// Register the RDD as a SQL table
sqlContext.registerRDDAsTable(bikesRdd, "bikesRdd")
// Perform SQL group operation
val bikesPerDayRdd = sql(
"""SELECT Days, start_station_id, count(*) bikes
|FROM bikesRdd
|GROUP BY Days, start_station_id """.stripMargin)
FROM RDD TO H2O'S DATAFRAME
val bikesPerDayDF:DataFrame = bikesPerDayRdd
AND PERFORM ADDITIONAL COLUMN TRANSFORMATION
// Select "Days" column
val daysVec = bikesPerDayDF('Days)
// Refine column into "Month" and "DayOfWeek"
val finalBikeDF = bikesPerDayDF.add(new TimeTransform().doIt(daysVec))
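A refinement of `Days` into `Month` and `DayOfWeek` might look like the following in plain Scala. The body of `TimeTransform` is not shown in the slides, so this is a sketch under two assumptions: `Days` counts days since the Unix epoch, and the features use ISO numbering (month 1 to 12, Monday = 1). `DayFeatures` is a hypothetical name.

```scala
import java.time.LocalDate

object DayFeatures {
  // Returns (month 1..12, ISO day of week with 1 = Monday .. 7 = Sunday).
  def monthAndDayOfWeek(daysSinceEpoch: Long): (Int, Int) = {
    val date = LocalDate.ofEpochDay(daysSinceEpoch)
    (date.getMonthValue, date.getDayOfWeek.getValue)
  }

  def main(args: Array[String]): Unit = {
    // Day 15887 since the epoch is 2013-07-01.
    println(monthAndDayOfWeek(15887L))
  }
}
```

Features like these let a tree model separate weekday commuting patterns from weekend use, and summer from winter demand.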
TIME TO BUILD A MODEL!
GBM MODEL BUILDER
def buildModel(df: DataFrame, trees: Int = 200, depth: Int = 6): R2 = {
  // Split into train, test, and hold-out parts
  val frs = splitFrame(df, Seq("train.hex", "test.hex", "hold.hex"), Seq(0.6, 0.3, 0.1))
  val (train, test, hold) = (frs(0), frs(1), frs(2))
  // Configure GBM parameters
  val gbmParams = new GBMParameters()
  gbmParams._train = train
  gbmParams._valid = test
  gbmParams._response_column = 'bikes
  gbmParams._ntrees = trees
  gbmParams._max_depth = depth
  // Build a model
  val gbmModel = new GBM(gbmParams).trainModel.get
  // Score datasets
  Seq(train, test, hold).foreach(gbmModel.score(_).delete)
  // Collect R2 metrics
  val result = R2("Model #1", r2(gbmModel, train), r2(gbmModel, test), r2(gbmModel, hold))
  // Perform clean-up
  Seq(train, test, hold).foreach(_.delete())
  result
}
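The 0.6 / 0.3 / 0.1 split used above can be pictured with a small plain-Scala sketch. It cuts the rows into consecutive parts, which is an assumption made for readability; the demo's `splitFrame` helper may assign rows differently. `RatioSplit` is a hypothetical name.

```scala
object RatioSplit {
  // Split rows into consecutive parts whose sizes follow the given ratios.
  // The last part ends at the final row, so every row lands in exactly one part.
  def split[A](rows: Vector[A], ratios: Seq[Double]): Seq[Vector[A]] = {
    require(ratios.nonEmpty && math.abs(ratios.sum - 1.0) < 1e-9, "ratios must sum to 1")
    // Cumulative cut points, e.g. ratios 0.6, 0.3, 0.1 over 100 rows give 60, 90, 100.
    val cuts = ratios.scanLeft(0.0)(_ + _).tail.map(r => math.round(r * rows.length).toInt)
    val bounds = (0 +: cuts.init) :+ rows.length
    bounds.zip(bounds.tail).map { case (from, until) => rows.slice(from, until) }
  }

  def main(args: Array[String]): Unit = {
    val parts = split((1 to 100).toVector, Seq(0.6, 0.3, 0.1))
    println(parts.map(_.size))
  }
}
```

The first part plays the role of the training set, the second of the validation set, and the third of a hold-out set that the model never sees during training.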
BUILD A GBM MODEL
val result1 = buildModel(finalBikeDF)
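The result carries one R² value per data part. For reference, the textbook definition of R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that definition on plain sequences; whether the demo's `r2` helper computes it exactly this way is an assumption, and `RSquared` is a hypothetical name.

```scala
object RSquared {
  // R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of the actual values.
  def r2(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length, "need equal, non-empty inputs")
    val mean = actual.sum / actual.length
    val ssRes = actual.zip(predicted).map { case (y, p) => (y - p) * (y - p) }.sum
    val ssTot = actual.map(y => (y - mean) * (y - mean)).sum
    1.0 - ssRes / ssTot
  }

  def main(args: Array[String]): Unit = {
    println(r2(Seq(1.0, 2.0, 3.0), Seq(1.0, 2.0, 3.0)))
    println(r2(Seq(1.0, 2.0, 3.0), Seq(2.0, 2.0, 2.0)))
  }
}
```

A value near 1 means the predictions track the actual counts closely, while a value near 0 means the model does no better than always predicting the mean. Comparing the train and hold-out values gives a quick check for overfitting.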
CAN WE IMPROVE THE MODEL
BY USING INFORMATION
ABOUT WEATHER?
LOAD WEATHER DATA
USING SPARK API
// Load 2013 hourly weather data for New York City
val weatherData = sc.textFile(DIR_PREFIX + "31081_New_York_City__Hourly_2013.csv")
// Parse rows, drop malformed ones, and keep only the noon readings
val weatherRdd = weatherData.map(_.split(",")).
  map(row => NYWeatherParse(row)).
  filter(!_.isWrongRow()).
  filter(_.HourLocal == Some(12)).setName("weather").cache()
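The parse-then-filter chain above can be tried without Spark on an ordinary collection. This sketch uses a much-reduced, hypothetical `Weather` record and an assumed two-column layout (`hour,temperature`); the real `NYWeatherParse` class and the real CSV columns are not shown in the slides.

```scala
import scala.util.Try

object WeatherFilter {
  // Hypothetical, reduced weather record with optional fields.
  final case class Weather(hourLocal: Option[Int], temperature: Option[Double]) {
    // A row counts as wrong when any field failed to parse.
    def isWrongRow: Boolean = hourLocal.isEmpty || temperature.isEmpty
  }

  // Assumed column layout for this sketch: hour,temperature
  def parse(line: String): Weather = {
    val cols = line.split(",", -1).map(_.trim)
    def col(i: Int): Option[String] =
      if (i < cols.length && cols(i).nonEmpty) Some(cols(i)) else None
    Weather(
      col(0).flatMap(s => Try(s.toInt).toOption),
      col(1).flatMap(s => Try(s.toDouble).toOption)
    )
  }

  // Same shape as the Spark chain: parse, drop wrong rows, keep the noon reading.
  def noonReadings(lines: Seq[String]): Seq[Weather] =
    lines.map(parse).filter(!_.isWrongRow).filter(_.hourLocal.contains(12))

  def main(args: Array[String]): Unit =
    noonReadings(Seq("12,25.5", "13,26.0", "12,", "x,1.0")).foreach(println)
}
```

Keeping a single reading per day means the weather table has at most one row per `Days` value, so a later join on that key does not multiply the bike rows.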
CREATE A JOINED TABLE
USING H2O'S DATAFRAME AND SPARK'S RDD
// Join with bike table
sqlContext.registerRDDAsTable(weatherRdd, "weatherRdd")
sqlContext.registerRDDAsTable(asSchemaRDD(finalBikeDF), "bikesRdd")
val bikesWeatherRdd = sql(
"""SELECT b.Days, b.start_station_id, b.bikes,
|b.Month, b.DayOfWeek,
|w.DewPoint, w.HumidityFraction, w.Prcp1Hour,
|w.Temperature, w.WeatherCode1
| FROM bikesRdd b
| JOIN weatherRdd w
| ON b.Days = w.Days
""".stripMargin)
BUILD A NEW MODEL
USING SPARK'S RDD IN H2O'S API
val result2 = buildModel(bikesWeatherRdd)
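The SQL join that produces `bikesWeatherRdd` is an inner join on `Days`. In plain Scala the same idea looks like the sketch below, with hypothetical record types that carry only a few of the columns named in the query.

```scala
object JoinOnDays {
  // Hypothetical, reduced row types for illustration.
  final case class BikeRow(days: Long, stationId: Int, bikes: Int)
  final case class WeatherRow(days: Long, temperature: Double)
  final case class Joined(days: Long, stationId: Int, bikes: Int, temperature: Double)

  // Inner join: index the weather rows by day, then look up each bike row.
  // Bike rows without matching weather are dropped.
  def join(bikes: Seq[BikeRow], weather: Seq[WeatherRow]): Seq[Joined] = {
    val byDay: Map[Long, Seq[WeatherRow]] = weather.groupBy(_.days)
    for {
      b <- bikes
      w <- byDay.getOrElse(b.days, Seq.empty)
    } yield Joined(b.days, b.stationId, b.bikes, w.temperature)
  }

  def main(args: Array[String]): Unit = {
    val bikes = Seq(BikeRow(15887L, 164, 42), BikeRow(15888L, 164, 37))
    val weather = Seq(WeatherRow(15887L, 25.5))
    join(bikes, weather).foreach(println)
  }
}
```

The joined rows carry both the bike counts and the weather measurements, so the second model can use weather as additional predictors.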
Check out H2O.ai Training Books
http://learn.h2o.ai/

Check out H2O.ai Blog
http://h2o.ai/blog/

Check out H2O.ai YouTube Channel
https://www.youtube.com/user/0xdata

Check out GitHub
https://github.com/h2oai
More info
Learn more about H2O at h2o.ai
Thank you!
Follow us at
@h2oai

More Related Content

What's hot

Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng ShiDatabricks
 
Data Science with Spark & Zeppelin
Data Science with Spark & ZeppelinData Science with Spark & Zeppelin
Data Science with Spark & ZeppelinVinay Shukla
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 
Sparkling Water Workshop
Sparkling Water WorkshopSparkling Water Workshop
Sparkling Water WorkshopSri Ambati
 
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and HueHadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Huegethue
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopJosh Patterson
 
Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
Deploying Apache Spark Jobs on Kubernetes with Helm and Spark OperatorDeploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
Deploying Apache Spark Jobs on Kubernetes with Helm and Spark OperatorDatabricks
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelinfelixcss
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Databricks
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuDatabricks
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionSri Ambati
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzDatabricks
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesDatabricks
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaSpark Summit
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache SparkMammoth Data
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionDataWorks Summit
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development KitJen Aman
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 

What's hot (20)

Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
Open source data ingestion
Open source data ingestionOpen source data ingestion
Open source data ingestion
 
Data Science with Spark & Zeppelin
Data Science with Spark & ZeppelinData Science with Spark & Zeppelin
Data Science with Spark & Zeppelin
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Sparkling Water Workshop
Sparkling Water WorkshopSparkling Water Workshop
Sparkling Water Workshop
 
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and HueHadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
 
Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
Deploying Apache Spark Jobs on Kubernetes with Helm and Spark OperatorDeploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelin
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 

Viewers also liked

Driving the Future of Smart Cities - How to Beat the Traffic
Driving the Future of Smart Cities - How to Beat the TrafficDriving the Future of Smart Cities - How to Beat the Traffic
Driving the Future of Smart Cities - How to Beat the TrafficVMware Tanzu
 
Open Standards in the Walled Garden
Open Standards in the Walled GardenOpen Standards in the Walled Garden
Open Standards in the Walled Gardendigitalbindery
 
Strata San Jose 2016: Deep Learning is eating your lunch -- and mine
Strata San Jose 2016: Deep Learning is eating your lunch -- and mineStrata San Jose 2016: Deep Learning is eating your lunch -- and mine
Strata San Jose 2016: Deep Learning is eating your lunch -- and mineSri Ambati
 
Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)
Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)
Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)Amitt Mahajan
 
Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)
Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)
Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)MTamblyn
 
Mobilising the world's Natural History - Open Data + Citizen Science
Mobilising the world's Natural History - Open Data + Citizen ScienceMobilising the world's Natural History - Open Data + Citizen Science
Mobilising the world's Natural History - Open Data + Citizen ScienceMargaret Gold
 
(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks
(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks
(Short version) Building a Mobile, Social, Location-Based Game in 5 WeeksJennie Lees
 
Smaller, Flatter, Smarter
Smaller, Flatter, SmarterSmaller, Flatter, Smarter
Smaller, Flatter, SmarterWeb 2.0 Expo
 
Web 2.0 Expo Speech: Open Leadership
Web 2.0 Expo Speech: Open LeadershipWeb 2.0 Expo Speech: Open Leadership
Web 2.0 Expo Speech: Open LeadershipCharlene Li
 
Data Science and Smart Systems: Creating the Digital Brain
Data Science and Smart Systems: Creating the Digital Brain Data Science and Smart Systems: Creating the Digital Brain
Data Science and Smart Systems: Creating the Digital Brain VMware Tanzu
 
Hadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr AwadallahHadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr AwadallahCloudera, Inc.
 
Locked Out in London (and tweeting about it) - version with my notes
Locked Out in London (and tweeting about it) - version with my notesLocked Out in London (and tweeting about it) - version with my notes
Locked Out in London (and tweeting about it) - version with my notesSylvain Carle
 
Did Social Media Hijack My Communications Strategy
Did Social Media Hijack My Communications StrategyDid Social Media Hijack My Communications Strategy
Did Social Media Hijack My Communications StrategyMike Smith
 
Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...
Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...
Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...Kobo
 
The Laws of User Experience: Making it or Breaking It with the UX Factor
The Laws of User Experience: Making it or Breaking It with the UX FactorThe Laws of User Experience: Making it or Breaking It with the UX Factor
The Laws of User Experience: Making it or Breaking It with the UX FactorEffective
 
Securing Application Deployments in CI/CD Environments (Updated slides: http:...
Securing Application Deployments in CI/CD Environments (Updated slides: http:...Securing Application Deployments in CI/CD Environments (Updated slides: http:...
Securing Application Deployments in CI/CD Environments (Updated slides: http:...Binu Ramakrishnan
 
Forking Successfully - or is a branch better?
Forking Successfully - or is a branch better?Forking Successfully - or is a branch better?
Forking Successfully - or is a branch better?Colin Charles
 
Advanced Caching Concepts @ Velocity NY 2015
Advanced Caching Concepts @ Velocity NY 2015Advanced Caching Concepts @ Velocity NY 2015
Advanced Caching Concepts @ Velocity NY 2015Rakesh Chaudhary
 

Viewers also liked (20)

Driving the Future of Smart Cities - How to Beat the Traffic
Driving the Future of Smart Cities - How to Beat the TrafficDriving the Future of Smart Cities - How to Beat the Traffic
Driving the Future of Smart Cities - How to Beat the Traffic
 
Open Standards in the Walled Garden
Open Standards in the Walled GardenOpen Standards in the Walled Garden
Open Standards in the Walled Garden
 
Kevin Kelly
Kevin KellyKevin Kelly
Kevin Kelly
 
Demand Media
Demand MediaDemand Media
Demand Media
 
Strata San Jose 2016: Deep Learning is eating your lunch -- and mine
Strata San Jose 2016: Deep Learning is eating your lunch -- and mineStrata San Jose 2016: Deep Learning is eating your lunch -- and mine
Strata San Jose 2016: Deep Learning is eating your lunch -- and mine
 
Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)
Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)
Social Gold: The Design of FarmVille and Other Social Games (Web2Expo 2010)
 
Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)
Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)
Kobo: What Do eBook Customers Really, Really Want? (Tools of Change 2011)
 
Mobilising the world's Natural History - Open Data + Citizen Science
Mobilising the world's Natural History - Open Data + Citizen ScienceMobilising the world's Natural History - Open Data + Citizen Science
Mobilising the world's Natural History - Open Data + Citizen Science
 
(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks
(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks
(Short version) Building a Mobile, Social, Location-Based Game in 5 Weeks
 
Smaller, Flatter, Smarter
Smaller, Flatter, SmarterSmaller, Flatter, Smarter
Smaller, Flatter, Smarter
 
Web 2.0 Expo Speech: Open Leadership
Web 2.0 Expo Speech: Open LeadershipWeb 2.0 Expo Speech: Open Leadership
Web 2.0 Expo Speech: Open Leadership
 
Data Science and Smart Systems: Creating the Digital Brain
Data Science and Smart Systems: Creating the Digital Brain Data Science and Smart Systems: Creating the Digital Brain
Data Science and Smart Systems: Creating the Digital Brain
 
Hadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr AwadallahHadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr Awadallah
 
Locked Out in London (and tweeting about it) - version with my notes
Locked Out in London (and tweeting about it) - version with my notesLocked Out in London (and tweeting about it) - version with my notes
Locked Out in London (and tweeting about it) - version with my notes
 
Did Social Media Hijack My Communications Strategy
Did Social Media Hijack My Communications StrategyDid Social Media Hijack My Communications Strategy
Did Social Media Hijack My Communications Strategy
 
Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...
Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...
Kobo: What Do eBook Customers Really, Really Want? (Michael Tamblyn at Tools ...
 
The Laws of User Experience: Making it or Breaking It with the UX Factor
The Laws of User Experience: Making it or Breaking It with the UX FactorThe Laws of User Experience: Making it or Breaking It with the UX Factor
The Laws of User Experience: Making it or Breaking It with the UX Factor
 
Securing Application Deployments in CI/CD Environments (Updated slides: http:...
Securing Application Deployments in CI/CD Environments (Updated slides: http:...Securing Application Deployments in CI/CD Environments (Updated slides: http:...
Securing Application Deployments in CI/CD Environments (Updated slides: http:...
 
Forking Successfully - or is a branch better?
Forking Successfully - or is a branch better?Forking Successfully - or is a branch better?
Forking Successfully - or is a branch better?
 
Advanced Caching Concepts @ Velocity NY 2015
Advanced Caching Concepts @ Velocity NY 2015Advanced Caching Concepts @ Velocity NY 2015
Advanced Caching Concepts @ Velocity NY 2015
 

Similar to Machine Learning with H2O, Spark, and Python at Strata 2015

PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowChetan Khatri
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...BigDataEverywhere
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...Inhacking
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Databricks
 
Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Sri Ambati
 
Austin Data Meetup 092014 - Spark
Austin Data Meetup 092014 - SparkAustin Data Meetup 092014 - Spark
Austin Data Meetup 092014 - SparkSteve Blackmon
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Databricks
 
Sparkling Water
Sparkling WaterSparkling Water
Sparkling Waterh2oworld
 
Data Microservices In The Cloud + 日本語コメント
Data Microservices In The Cloud + 日本語コメントData Microservices In The Cloud + 日本語コメント
Data Microservices In The Cloud + 日本語コメントTakuya Saeki
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionChetan Khatri
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
Sparkling Water Webinar October 29th, 2014
Sparkling Water Webinar October 29th, 2014Sparkling Water Webinar October 29th, 2014
Sparkling Water Webinar October 29th, 2014Sri Ambati
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Sparkjlacefie
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetupjlacefie
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study NotesRichard Kuo
 

Similar to Machine Learning with H2O, Spark, and Python at Strata 2015 (20)

PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
Machine Learning with H2O, Spark, and Python at Strata 2015

  • 1. H2O.ai Machine Intelligence Fast, Scalable In-Memory Machine and Deep Learning For Smarter Applications Python & Sparkling Water with H2O Cliff Click Michal Malohlava
  • 2. Who Am I? Cliff Click CTO, Co-Founder H2O.ai cliff@h2o.ai 40 yrs coding 35 yrs building compilers 30 yrs distributed computation 20 yrs OS, device drivers, HPC, HotSpot 10 yrs Low-latency GC, custom java hardware NonBlockingHashMap 20 patents, dozens of papers 100s of public talks PhD Computer Science 1995 Rice University HotSpot JVM Server Compiler “showed the world JITing is possible”
  • 3. H2O Open Source In-Memory Machine Learning for Big Data Distributed In-Memory Math Platform GLM, GBM, RF, K-Means, PCA, Deep Learning Easy to use SDK & API Java, R (CRAN), Scala, Spark, Python, JSON, Browser GUI Use ALL your data Modeling without sampling HDFS, S3, NFS, NoSql Big Data & Better Algorithms Better Predictions!
  • 5. Practical Machine Learning Value Requirements Fast & Interactive In-Memory Big Data (No Sampling) Distributed Ownership Open Source Extensibility API/SDK Portability Java, REST/JSON Infrastructure Cloud or On-Premise Hadoop or Private Cluster
  • 6. H2O Architecture Prediction Engine R & Exec Engine Web Interface Spark Scala REPL Nano-Fast Scoring Engine Distributed In-Memory K/V Store Column Compress Data Map/Reduce Memory Manager Algorithms! GBM, Random Forest, GLM, PCA, K-Means, Deep Learning HDFS S3 NFS RealTime DataFlow
  • 8. Python & Sparkling Water
      ● CitiBike of NYC
      ● Predict bikes-per-hour-per-station
          – From per-trip logs
      ● 10M rows of data
      ● Group-By, date/time feature-munging
      Demo!
  • 9. H2O: A Platform for Big Math
      ● Most Any Java on Big 2-D Tables
          – Write like it's single-thread POJO code
          – Runs distributed & parallel by default
      ● Fast: billion row logistic regression takes 4 sec
      ● World's first parallel & distributed GBM
          – Plus Deep Learn / Neural Nets, RF, PCA, K-means...
      ● R integration: use terabyte datasets from R
      ● Sparkling Water: Direct Spark integration
  • 10. H2O: A Platform for Big Math
      ● Easy launch: “java -jar h2o.jar”
          – No GC tuning: -Xmx as big as you like
      ● Production ready:
          – Private on-premise cluster OR In the Cloud
          – Hadoop, Yarn, EC2, or standalone cluster
          – HDFS, S3, NFS, URI & other datasources
          – Open Source, Apache v2
  • 11. Can I call H2O’s algorithms from my Spark workflow?
  • 14. Sparkling Water Provides
      ● Transparent integration into Spark ecosystem
      ● Pure H2ORDD encapsulating H2O DataFrame
      ● Transparent use of H2O data structures and algorithms with Spark API
      ● Excels in Spark workflows requiring advanced Machine Learning algorithms
  • 15. Sparkling Water Design [diagram labels: spark-submit; Spark Master JVM; three Spark Worker JVMs; Sparkling Water Cluster of three Spark Executor JVMs, each running H2O; "Sparkling App implements ?"]
  • 16. Data Distribution [diagram labels: Data Source (e.g. HDFS); Sparkling Water Cluster of three Spark Executor JVMs, each running H2O; Spark RDD; H2O RDD] RDDs and DataFrames share same memory space
  • 19. LAUNCH SPARKLING SHELL
        > export SPARK_HOME="/path/to/spark/installation"
        > bin/sparkling-shell
  • 20. PREPARE AN ENVIRONMENT
        val DIR_PREFIX = "/Users/michal/Devel/projects/h2o/repos/h2o2/bigdata/laptop/"
        // Common imports
        import org.apache.spark.h2o._
        import org.apache.spark.examples.h2o._
        import org.apache.spark.examples.h2o.DemoUtils._
        import org.apache.spark.sql.SQLContext
        import water.fvec._
        import hex.tree.gbm.GBM
        import hex.tree.gbm.GBMModel.GBMParameters
        // Initialize Spark SQLContext
        implicit val sqlContext = new SQLContext(sc)
        import sqlContext._
  • 21. LAUNCH H2O SERVICES
        implicit val h2oContext = new H2OContext(sc).start()
        import h2oContext._
  • 22. LOAD CITIBIKE DATA USING H2O API
        val dataFiles = Array[String](
          "2013-07.csv", "2013-08.csv", "2013-09.csv",
          "2013-10.csv", "2013-11.csv", "2013-12.csv").map(f => new java.io.File(DIR_PREFIX, f))
        // Load and parse data
        val bikesDF = new DataFrame(dataFiles:_*)
        // Rename columns, replacing every space in the header with an underscore
        val colNames = bikesDF.names().map(n => n.replace(' ', '_'))
        bikesDF._names = colNames
        bikesDF.update(null)
  • 23. USER-DEFINED COLUMN TRANSFORMATION
        // Select column 'starttime'
        val startTimeF = bikesDF('starttime)
        // Invoke column transformation and append the created column
        bikesDF.add(new TimeSplit().doIt(startTimeF))
        // Do not forget to update frame in K/V store
        bikesDF.update(null)
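The slides do not show what `TimeSplit` computes. Since the later SQL groups by a `Days` column, one plausible reading is that it turns each trip's start time into a whole-day number. The plain-Python sketch below illustrates that idea only; it assumes millisecond timestamps and days counted from the Unix epoch, and `days_since_epoch` and `add_days_column` are hypothetical names, not part of H2O or Sparkling Water.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000

def days_since_epoch(start_time_ms):
    """Whole days elapsed since 1970-01-01 UTC for a millisecond timestamp."""
    return start_time_ms // MS_PER_DAY

def add_days_column(rows):
    """Return copies of the rows with a derived 'Days' column appended."""
    return [dict(row, Days=days_since_epoch(row["starttime"])) for row in rows]

# Hypothetical trip that started on 2013-07-01 at 08:30 UTC.
trip_start = datetime(2013, 7, 1, 8, 30, tzinfo=timezone.utc)
trips = [{"starttime": int(trip_start.timestamp() * 1000), "start_station_id": 72}]
trips_with_days = add_days_column(trips)
```

Bucketing by day is what makes the later per-day, per-station count possible: every trip on the same calendar day gets the same `Days` value.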
  • 24. OPEN H2O FLOW UI
        openFlow
    AND EXPLORE DATA...
        > getFrames
        ...
  • 25. FROM H2O'S DATAFRAME TO RDD val bikesRdd = asSchemaRDD(bikesDF)
  • 26. USE SPARK SQL
        // Register table and SQL table
        sqlContext.registerRDDAsTable(bikesRdd, "bikesRdd")
        // Perform SQL group operation
        val bikesPerDayRdd = sql(
          """SELECT Days, start_station_id, count(*) bikes
            |FROM bikesRdd
            |GROUP BY Days, start_station_id """.stripMargin)
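For readers less familiar with SQL, the GROUP BY on this slide counts trips per (`Days`, `start_station_id`) pair. A minimal standard-library Python sketch of the same aggregation follows; the sample rows are made up for illustration, and `bikes_per_day` is a hypothetical helper, not a Spark or H2O call.

```python
from collections import Counter

def bikes_per_day(trips):
    """Count trips per (Days, start_station_id), like the SQL GROUP BY."""
    counts = Counter((t["Days"], t["start_station_id"]) for t in trips)
    return [
        {"Days": days, "start_station_id": station, "bikes": n}
        for (days, station), n in sorted(counts.items())
    ]

# Hypothetical per-trip rows; only the two grouping columns matter here.
trips = [
    {"Days": 15887, "start_station_id": 72},
    {"Days": 15887, "start_station_id": 72},
    {"Days": 15887, "start_station_id": 79},
    {"Days": 15888, "start_station_id": 72},
]
per_day = bikes_per_day(trips)
```

Each output row corresponds to one group, with `bikes` holding the number of trips that started at that station on that day.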
  • 27. FROM RDD TO H2O'S DATAFRAME
        val bikesPerDayDF:DataFrame = bikesPerDayRdd
    AND PERFORM ADDITIONAL COLUMN TRANSFORMATION
        // Select "Days" column
        val daysVec = bikesPerDayDF('Days)
        // Refine column into "Month" and "DayOfWeek"
        val finalBikeDF = bikesPerDayDF.add(new TimeTransform().doIt(daysVec))
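The body of `TimeTransform` is not shown on the slide either. Assuming `Days` holds whole days since the Unix epoch, the two derived features could be computed roughly as in this Python sketch. The numeric encoding of `DayOfWeek` (1 = Monday through 7 = Sunday) is an assumption made here for illustration, and `month_and_day_of_week` is a hypothetical name.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def month_and_day_of_week(days):
    """Map a days-since-epoch value to (month 1-12, ISO weekday 1-7)."""
    d = EPOCH + timedelta(days=days)
    return d.month, d.isoweekday()

# Day 15887 is 2013-07-01, which fell on a Monday.
month, day_of_week = month_and_day_of_week(15887)
```

Splitting the raw day number into month and weekday gives a tree model like GBM seasonal and weekly signals that a monotonically increasing day count does not expose directly.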
  • 28. TIME TO BUILD A MODEL!
  • 29. GBM MODEL BUILDER
        def buildModel(df: DataFrame, trees: Int = 200, depth: Int = 6):R2 = {
          // Split into train, test, and hold-out parts
          val frs = splitFrame(df, Seq("train.hex", "test.hex", "hold.hex"), Seq(0.6, 0.3, 0.1))
          val (train, test, hold) = (frs(0), frs(1), frs(2))
          // Configure GBM parameters
          val gbmParams = new GBMParameters()
          gbmParams._train = train
          gbmParams._valid = test
          gbmParams._response_column = 'bikes
          gbmParams._ntrees = trees
          gbmParams._max_depth = depth
          // Build a model
          val gbmModel = new GBM(gbmParams).trainModel.get
          // Score datasets
          Seq(train,test,hold).foreach(gbmModel.score(_).delete)
          // Collect R2 metrics
          val result = R2("Model #1", r2(gbmModel, train), r2(gbmModel, test), r2(gbmModel, hold))
          // Perform clean-up
          Seq(train, test, hold).foreach(_.delete())
          result
        }
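Two ideas in `buildModel` are worth spelling out: the 0.6 / 0.3 / 0.1 split into train, test, and hold-out parts, and the R2 score reported for each part. The Python sketch below shows both in plain standard-library code. It is not H2O's `splitFrame` or the demo's `r2` helper; it assumes a simple shuffled split and the standard coefficient of determination, and the function names are hypothetical.

```python
import random

def split_rows(rows, ratios=(0.6, 0.3, 0.1), seed=42):
    """Shuffle the rows and cut them into train, test, and hold-out parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    cut1 = round(n * ratios[0])
    cut2 = cut1 + round(n * ratios[1])
    return shuffled[:cut1], shuffled[cut1:cut2], shuffled[cut2:]

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

train, test, hold = split_rows(range(10))
perfect = r2([1, 2, 3], [1, 2, 3])    # predictions match exactly
mean_only = r2([1, 2, 3], [2, 2, 2])  # always predicting the mean
```

An R2 of 1 means the predictions explain all of the variance, and an R2 of 0 means the model does no better than predicting the mean. Comparing the score on train, test, and hold-out parts shows how well the model generalizes beyond the rows it was fitted on.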
  • 30. BUILD A GBM MODEL val result1 = buildModel(finalBikeDF)
  • 31. CAN WE IMPROVE MODEL BY USING INFORMATION ABOUT WEATHER?
  • 32. LOAD WEATHER DATA USING SPARK API
        // Load weather data in NY 2013
        val weatherData = sc.textFile(DIR_PREFIX + "31081_New_York_City__Hourly_2013.csv")
        // Parse data and filter them
        val weatherRdd = weatherData.map(_.split(",")).
          map(row => NYWeatherParse(row)).
          filter(!_.isWrongRow()).
          filter(_.HourLocal == Some(12)).setName("weather").cache()
  • 33. CREATE A JOINED TABLE USING H2O'S DATAFRAME AND SPARK'S RDD
        // Join with bike table
        sqlContext.registerRDDAsTable(weatherRdd, "weatherRdd")
        sqlContext.registerRDDAsTable(asSchemaRDD(finalBikeDF), "bikesRdd")
        val bikesWeatherRdd = sql(
          """SELECT b.Days, b.start_station_id, b.bikes,
            |b.Month, b.DayOfWeek,
            |w.DewPoint, w.HumidityFraction, w.Prcp1Hour,
            |w.Temperature, w.WeatherCode1
            | FROM bikesRdd b
            | JOIN weatherRdd w
            | ON b.Days = w.Days """.stripMargin)
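The join attaches one weather reading to every bike row that shares its `Days` value. Keeping only the noon observation, as the weather-loading slide does with `HourLocal == Some(12)`, presumably leaves at most one weather row per day, so the join does not multiply bike rows. A plain-Python sketch of that inner join follows; the sample values are invented, and `join_on_days` is a hypothetical helper rather than a Spark SQL call.

```python
def join_on_days(bike_rows, weather_rows):
    """Inner-join bike rows with weather rows on the shared 'Days' key."""
    weather_by_day = {w["Days"]: w for w in weather_rows}
    joined = []
    for bike in bike_rows:
        weather = weather_by_day.get(bike["Days"])
        if weather is not None:  # inner join: skip days without weather data
            extra = {k: v for k, v in weather.items() if k != "Days"}
            joined.append({**bike, **extra})
    return joined

# Hypothetical rows: weather is available for only one of the two days.
bikes = [
    {"Days": 15887, "start_station_id": 72, "bikes": 2},
    {"Days": 15888, "start_station_id": 72, "bikes": 1},
]
weather = [{"Days": 15887, "Temperature": 24.5, "Prcp1Hour": 0.0}]
bikes_weather = join_on_days(bikes, weather)
```

Bike rows for days with no matching weather reading are dropped, which mirrors the behaviour of an inner JOIN in the SQL on the slide.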
  • 34. BUILD A NEW MODEL USING SPARK'S RDD IN H2O'S API val result2 = buildModel(bikesWeatherRdd)
  • 35. More info
      ● Check out H2O.ai Training Books: http://learn.h2o.ai/
      ● Check out H2O.ai Blog: http://h2o.ai/blog/
      ● Check out H2O.ai YouTube Channel: https://www.youtube.com/user/0xdata
      ● Check out GitHub: https://github.com/h2oai
  • 36. Learn more about H2O at h2o.ai Thank you! Follow us at @h2oai