20160908 hivemall meetup
Takeshi Yamamuro
Slides for Hivemall Meetup #3
1.
Copyright©2016 NTT corp. All Rights Reserved.
Hivemall Meets XGBoost in DataFrame/Spark
2016/9/8
Takeshi Yamamuro (maropu) @ NTT
2.
Who am I?
3.
XGBoost is...
• Short for eXtreme Gradient Boosting
  • https://github.com/dmlc/xgboost
• It is...
  • a variant of the gradient boosting machine
  • a tree-based model
  • an open-source tool (Apache 2 license)
  • written in C++
  • provided with R/Python/Julia/Java/Scala interfaces
  • widely used in Kaggle competitions
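To make "a variant of the gradient boosting machine" and "tree-based model" concrete, here is a toy gradient boosting regressor in plain Scala. It is a minimal sketch, not XGBoost's algorithm: it uses one numeric feature, depth-1 trees (stumps), squared error, and a learning rate `eta`; `Stump` and `ToyBoosting` are hypothetical names. XGBoost additionally uses second-order (Hessian) information, regularization, and deeper trees.

```scala
// Toy gradient boosting for squared error on one numeric feature.
// Each round fits a depth-1 tree (a "stump") to the current residuals and adds
// it to the ensemble, scaled by the learning rate eta.
// Illustration only: no Hessian, no regularization, no XGBoost internals.
final case class Stump(threshold: Double, left: Double, right: Double) {
  def predict(x: Double): Double = if (x <= threshold) left else right
}

object ToyBoosting {
  private def mean(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0 else xs.sum / xs.size

  private def sse(xs: Seq[Double], targets: Seq[Double], s: Stump): Double =
    xs.zip(targets).map { case (x, t) => math.pow(t - s.predict(x), 2) }.sum

  // Try every observed value as a split threshold; keep the stump with the lowest SSE.
  def fitStump(xs: Seq[Double], targets: Seq[Double]): Stump = {
    val candidates = xs.distinct.sorted.map { t =>
      val (l, r) = xs.zip(targets).partition(_._1 <= t)
      Stump(t, mean(l.map(_._2)), mean(r.map(_._2)))
    }
    candidates.minBy(s => sse(xs, targets, s))
  }

  // Returns the trained ensemble as a function from feature value to prediction.
  def train(xs: Seq[Double], ys: Seq[Double], rounds: Int, eta: Double): Double => Double = {
    val base = mean(ys) // start from the mean target
    var preds = Seq.fill(ys.size)(base)
    var stumps = List.empty[Stump]
    for (_ <- 1 to rounds) {
      // For squared error, the negative gradient is simply the residual.
      val residuals = ys.zip(preds).map { case (y, p) => y - p }
      val s = fitStump(xs, residuals)
      stumps = s :: stumps
      preds = xs.zip(preds).map { case (x, p) => p + eta * s.predict(x) }
    }
    val model = stumps
    (x: Double) => base + eta * model.map(_.predict(x)).sum
  }
}
```

With `eta` below 1, each stump corrects only part of the remaining error, so more rounds are needed; this is the trade-off the `eta` and round-count parameters control in XGBoost.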
4.
Hivemall in DataFrame/Spark
• Most Hivemall functions are supported in Spark v1.6 and v2.0
  • The v2.0 support is not released yet
• XGBoost integration is under development
  • distributed/parallel predictions
  • native libraries bundled for major platforms
    • Mac/Linux on x86_64
• How to use: https://gist.github.com/maropu/33794b293ee937e99b8fb0788843fa3f
5.
Spark Quick Examples
• Fetch a Spark v2.0.0 binary
  • http://spark.apache.org/downloads.html

    $ <SPARK_HOME>/bin/spark-shell
    scala> :paste
    val textFile = sc.textFile("hoge.txt")
    // Split lines into words, pair each word with 1, and sum the counts per word
    val counts = textFile.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Transformations are lazy; an action such as take() runs the job
    counts.take(10).foreach(println)
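For readers new to the RDD API, the word count above has a direct single-machine analogue in plain Scala collections. This sketch (the object name `LocalWordCount` is hypothetical) shows what each step computes; the `groupBy`/`sum` step plays the role of `reduceByKey(_ + _)`, which Spark performs across partitions instead of in one process.

```scala
// Local equivalent of the RDD word count: flatMap splits lines into words,
// map pairs each word with 1, and groupBy + sum merges the counts per word.
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

For example, `LocalWordCount.count(Seq("a b a", "b c"))` yields a count of 2 for `a` and `b` and 1 for `c`.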
6.
Fetch training and test data
• E2006-tfidf regression dataset
  • http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf

    $ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.train.bz2
    $ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.test.bz2
7.
XGBoost in spark-shell
• The Scala interface is bundled in the Hivemall jar

    $ bunzip2 E2006.train.bz2
    $ <SPARK_HOME>/bin/spark-shell --conf spark.jars=hivemall-spark-XXX-with-dependencies.jar
    scala> import ml.dmlc.xgboost4j.scala._
    scala> :paste
    // Read training data (LIBSVM text format)
    val trainData = new DMatrix("E2006.train")
    // Define parameters; E2006 has real-valued regression targets, so use a
    // regression objective ("reg:logistic" expects labels in [0, 1])
    val paramMap = List(
      "eta" -> 0.1,
      "max_depth" -> 2,
      "objective" -> "reg:linear"
    ).toMap
    // Train the model for 2 boosting rounds
    val model = XGBoost.train(trainData, paramMap, 2)
    // Save the model to a file (the directory is assumed to exist)
    model.saveModel("xgboost_models_dir/xgb_0001.model")
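The `objective` parameter decides which loss each boosting round minimizes. Second-order boosting, as in XGBoost, fits each new tree from the per-example gradient and Hessian of the loss with respect to the raw score. The sketch below writes out the standard formulas for squared error and logistic loss in plain Scala; `Objectives` and `GradHess` are hypothetical names, not XGBoost's code. It also shows why a logistic objective is a poor fit for real-valued regression targets: the loss is only defined for labels in [0, 1].

```scala
// Per-example gradient and Hessian of two common losses with respect to the
// raw model score (margin). Illustrative formulas only, not XGBoost internals.
object Objectives {
  final case class GradHess(grad: Double, hess: Double)

  // Squared error 0.5 * (score - label)^2, suitable for real-valued labels.
  def squaredError(score: Double, label: Double): GradHess =
    GradHess(score - label, 1.0)

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Logistic loss: the score is mapped through a sigmoid to a probability,
  // so the label must lie in [0, 1].
  def logistic(score: Double, label: Double): GradHess = {
    require(label >= 0.0 && label <= 1.0, s"label $label is outside [0, 1]")
    val p = sigmoid(score)
    GradHess(p - label, p * (1.0 - p))
  }
}
```

Note how the squared-error Hessian is constant, while the logistic Hessian shrinks as the predicted probability approaches 0 or 1.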
8.
Load test data in parallel

    $ <SPARK_HOME>/bin/spark-shell --conf spark.jars=hivemall-spark-XXX-with-dependencies.jar
    // Create a DataFrame for the test data
    scala> val testDf = sqlContext.sparkSession.read.format("libsvm").load("E2006.test.bz2")
    scala> testDf.printSchema
    root
     |-- label: double (nullable = true)
     |-- features: vector (nullable = true)
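The `label: double` / `features: vector` schema above comes from the LIBSVM text format, where each line is `label index:value index:value ...`. The sketch below parses one such line in plain Scala to show the mapping; `LibSvmLine` and `LabeledSparse` are hypothetical names. Indices are 1-based in the file, and, like Spark's libsvm data source, the sketch shifts them to 0-based.

```scala
// Parses one LIBSVM-format line, "label index:value index:value ...", into a
// label and a sparse feature vector (parallel arrays of indices and values).
final case class LabeledSparse(label: Double, indices: Array[Int], values: Array[Double])

object LibSvmLine {
  def parse(line: String): LabeledSparse = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { tok =>
      val colon = tok.indexOf(':')
      require(colon > 0, s"malformed feature '$tok'")
      // Convert the 1-based file index to a 0-based vector index.
      (tok.substring(0, colon).toInt - 1, tok.substring(colon + 1).toDouble)
    }
    LabeledSparse(label, pairs.map(_._1), pairs.map(_._2))
  }
}
```

For instance, `LibSvmLine.parse("0.5 1:0.3 5:0.1")` gives label `0.5`, indices `[0, 4]`, and values `[0.3, 0.1]`.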
9.
Load test data in parallel

An excerpt of the test data (LIBSVM format):

    0.000357499151147113 6066:0.00079327062196048 6069:0.000311377727123504 6070:0.000306754934580457 6071:0.000276992485786437 6072:0.00039663531098024 6074:0.00039663531098024 6075:0.00032548335…

[Figure: the lines of the file are read into testDf partitions Partition1, Partition2, Partition3, …, PartitionN in parallel]

• Loaded in parallel because bzip2 is splittable
• The number of partitions depends on three parameters:
  • spark.default.parallelism: the total number of cores by default
  • spark.sql.files.maxPartitionBytes: 128MB by default
  • spark.sql.files.openCostInBytes: 4MB by default
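As far as I can tell from Spark 2.x's file-scan planning, the three parameters listed above combine into a maximum split size: each file is charged its length plus `openCostInBytes`, the total is divided by the default parallelism, and the result is clamped between `openCostInBytes` and `maxPartitionBytes`. The sketch below reproduces that rule in plain Scala and counts the splits for a single splittable file. It is a simplified approximation (it ignores how splits are packed into partitions), and `SplitSizing` is a hypothetical name.

```scala
// Simplified sketch of how Spark 2.x file sources size input splits.
object SplitSizing {
  val MB: Long = 1024L * 1024L

  def maxSplitBytes(
      fileSizes: Seq[Long],
      defaultParallelism: Int,
      maxPartitionBytes: Long = 128 * MB, // spark.sql.files.maxPartitionBytes
      openCostInBytes: Long = 4 * MB      // spark.sql.files.openCostInBytes
  ): Long = {
    // Every file is charged its size plus a fixed "open cost".
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / defaultParallelism
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }

  // Number of splits one splittable file of `fileSize` bytes is cut into.
  def numSplits(fileSize: Long, splitBytes: Long): Long =
    (fileSize + splitBytes - 1) / splitBytes
}
```

For a 1GB file, 8 cores give 128MB splits (8 of them), while 64 cores shrink the split size to about 16MB and yield 64 splits, so more cores translate directly into more parallel read tasks.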
10.
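Do predictions in parallel
• XGBoost in DataFrame
  • Load built models and do cross-joins for predictions

    scala> import org.apache.spark.sql.hive.HivemallOps._
    scala> :paste
    // Load built models from persistent storage
    // (`xgboost` names the model data source; assumed to be in scope via Hivemall)
    val modelsDf = sqlContext.sparkSession.read.format(xgboost)
      .load("xgboost_models_dir")
    // xgboost_predict needs a rowid column; add one to testDf
    import org.apache.spark.sql.functions.monotonically_increasing_id
    val testDfWithId = testDf.withColumn("rowid", monotonically_increasing_id())
    // Do predictions in parallel via cross-joins, then average the predictions per row
    val predict = modelsDf.join(testDfWithId)
      .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model")
      .groupBy("rowid")
      .avg()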
11.
Do predictions in parallel
• XGBoost in DataFrame
  • Load built models and do cross-joins for predictions
  • Broadcast cross-joins are expected
    • The size of `modelsDf` must be less than or equal to spark.sql.autoBroadcastJoinThreshold (10MB by default)

    testDf
    rowid  label   features
    1      0.392   1:0.3 5:0.1…
    2      0.929   3:0.2…
    3      0.132   2:0.9…
    4      0.3923  5:0.4…
    …

    modelsDf
    model_id        pred_model
    xgb_0001.model  <binary data>
    xgb_0002.model  <binary data>

[Figure: modelsDf is cross-joined with the partitions of testDf in parallel]
12.
Do predictions for streaming data
• Structured Streaming in Spark 2.0
  • A scalable and fault-tolerant stream processing engine built on the Spark SQL engine
  • An alpha component in v2.0

    scala> :paste
    // Initialize a streaming DataFrame
    val testStreamingDf = spark.readStream
      .format("libsvm") // Not supported in v2.0
      …
    // Do predictions for streaming data
    val predict = modelsDf.join(testStreamingDf)
      .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model")
      .groupBy("rowid")
      .avg()
13.
Build models in parallel
• One model per partition
• WIP: Build models with different parameters

    scala> :paste
    // Set options for XGBoost
    val xgbOptions = XGBoostOptions()
      .set("num_round", "10000")
      .set("max_depth", "32,48,64") // Randomly selected by workers
    // Set the number of models to output
    val numModels = 4
    // Build models and save them in persistent storage
    // (trainDf: the training data as a DataFrame, loaded like testDf)
    trainDf.repartition(numModels)
      .train_xgboost_regr($"features", $"label", s"${xgbOptions}")
      .write
      .format(xgboost)
      .save("xgboost_models_dir")
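The "one model per partition" scheme, with each worker drawing its own `max_depth` from the comma-separated candidate list, can be sketched locally. This is an assumption-laden illustration, not Hivemall's implementation: rows are assigned to partitions round-robin (standing in for `repartition(numModels)`), each partition seeds its own random source, and `trainOne` is a hypothetical placeholder for the per-partition XGBoost training call.

```scala
import scala.util.Random

// Local sketch of building one model per partition with a randomly chosen max_depth.
object ParallelTraining {
  final case class TrainedModel(partition: Int, maxDepth: Int, numRows: Int)

  // Hypothetical placeholder for training XGBoost on one partition's rows.
  def trainOne(partition: Int, rows: Seq[Double], maxDepth: Int): TrainedModel =
    TrainedModel(partition, maxDepth, rows.size)

  def buildModels(rows: Seq[Double], numModels: Int, maxDepthOption: String, seed: Long): Seq[TrainedModel] = {
    val candidates = maxDepthOption.split(",").map(_.trim.toInt).toIndexedSeq
    // Round-robin assignment of rows to partitions.
    val partitions = rows.zipWithIndex.groupBy(_._2 % numModels)
    (0 until numModels).map { p =>
      val rnd = new Random(seed + p) // each "worker" draws independently
      val depth = candidates(rnd.nextInt(candidates.size))
      val part = partitions.getOrElse(p, Seq.empty).map(_._1)
      trainOne(p, part, depth)
    }
  }
}
```

Training on disjoint partitions with varied hyperparameters yields diverse models, which the cross-join prediction step can then average.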
14.
Compile a binary on your platform
• If you get stuck with an UnsatisfiedLinkError, you need to compile the native binary yourself

    $ mvn validate && mvn package -Pcompile-xgboost -Pspark-2.0 -DskipTests
    $ ls target
    hivemall-core-0.4.2-rc.2-with-dependencies.jar
    hivemall-core-0.4.2-rc.2.jar
    hivemall-mixserv-0.4.2-rc.2-fat.jar
    hivemall-nlp-0.4.2-rc.2-with-dependencies.jar
    hivemall-nlp-0.4.2-rc.2.jar
    hivemall-spark-1.6.2_2.11.8-0.4.2-rc.2-with-dependencies.jar
    hivemall-spark-1.6.2_2.11.8-0.4.2-rc.2.jar
    hivemall-xgboost-0.4.2-rc.2.jar
    hivemall-xgboost_0.60-0.4.2-rc.2-with-dependencies.jar
    hivemall-xgboost_0.60-0.4.2-rc.2.jar
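An UnsatisfiedLinkError means the JVM could not find a native library built for the current platform. A small startup check can surface that early, together with the platform name, instead of failing deep inside a job. This is a minimal sketch using only `System.loadLibrary` and standard system properties; `NativeLibCheck` is a hypothetical name, and the library name you pass in is an assumption, not Hivemall's actual loader.

```scala
// Try to load a native library and report the platform if it is missing.
object NativeLibCheck {
  def platform: String =
    s"${System.getProperty("os.name")} / ${System.getProperty("os.arch")}"

  def tryLoad(libName: String): Either[String, Unit] =
    try {
      System.loadLibrary(libName)
      Right(())
    } catch {
      case e: UnsatisfiedLinkError =>
        Left(s"cannot load '$libName' on $platform (${e.getMessage}); " +
          "build the native binary for this platform")
    }
}
```

On a platform without a bundled binary (for example, anything other than Mac/Linux on x86_64), the `Left` message points directly at the rebuild step above.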
15.
Future Work
• Rabit integration for parallel learning
  • http://dmlc.cs.washington.edu/rabit.html
• Python support
• spark.ml interface support
• Bundle more binaries for portability
  • Windows and x86 platforms
  • Others?