Submit Search
Upload
#HSTokyo16 Apache Spark Crash Course
•
12 likes
•
2,078 views
DataWorks Summit/Hadoop Summit
Follow
#HSTokyo16 Apache Spark Crash Course
Read less
Read more
Technology
Report
Share
Report
Share
1 of 67
Download now
Download to read offline
Recommended
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
DataWorks Summit/Hadoop Summit
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJ
Daniel Madrigal
Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
DataWorks Summit/Hadoop Summit
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
DataWorks Summit/Hadoop Summit
Data Science with Apache Spark - Crash Course - HS16SJ
Data Science with Apache Spark - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
DataWorks Summit/Hadoop Summit
Recommended
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
DataWorks Summit/Hadoop Summit
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJ
Daniel Madrigal
Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
DataWorks Summit/Hadoop Summit
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
DataWorks Summit/Hadoop Summit
Data Science with Apache Spark - Crash Course - HS16SJ
Data Science with Apache Spark - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
DataWorks Summit/Hadoop Summit
Log Analytics Optimization
Log Analytics Optimization
Hortonworks
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
DataWorks Summit/Hadoop Summit
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
DataWorks Summit/Hadoop Summit
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
Hortonworks
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
DataWorks Summit/Hadoop Summit
Intro to Spark with Zeppelin
Intro to Spark with Zeppelin
Hortonworks
Apache Hadoop Crash Course
Apache Hadoop Crash Course
DataWorks Summit/Hadoop Summit
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
Hortonworks
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
DataWorks Summit/Hadoop Summit
Apache deep learning 101
Apache deep learning 101
DataWorks Summit
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
DataWorks Summit/Hadoop Summit
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
DataWorks Summit/Hadoop Summit
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
DataWorks Summit/Hadoop Summit
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
DataWorks Summit/Hadoop Summit
More Related Content
What's hot
Log Analytics Optimization
Log Analytics Optimization
Hortonworks
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
DataWorks Summit/Hadoop Summit
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
DataWorks Summit/Hadoop Summit
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
Hortonworks
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
DataWorks Summit/Hadoop Summit
Intro to Spark with Zeppelin
Intro to Spark with Zeppelin
Hortonworks
Apache Hadoop Crash Course
Apache Hadoop Crash Course
DataWorks Summit/Hadoop Summit
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
Hortonworks
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
DataWorks Summit/Hadoop Summit
Apache deep learning 101
Apache deep learning 101
DataWorks Summit
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
DataWorks Summit/Hadoop Summit
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
What's hot
(19)
Log Analytics Optimization
Log Analytics Optimization
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Intro to Spark with Zeppelin
Intro to Spark with Zeppelin
Apache Hadoop Crash Course
Apache Hadoop Crash Course
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
Apache deep learning 101
Apache deep learning 101
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Viewers also liked
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
DataWorks Summit/Hadoop Summit
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
DataWorks Summit/Hadoop Summit
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
DataWorks Summit/Hadoop Summit
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
DataWorks Summit/Hadoop Summit
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
DataWorks Summit/Hadoop Summit
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
DataWorks Summit/Hadoop Summit
The real world use of Big Data to change business
The real world use of Big Data to change business
DataWorks Summit/Hadoop Summit
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
DataWorks Summit/Hadoop Summit
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
DataWorks Summit/Hadoop Summit
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
DataWorks Summit/Hadoop Summit
SEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile games
DataWorks Summit/Hadoop Summit
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
DataWorks Summit/Hadoop Summit
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
DataWorks Summit/Hadoop Summit
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
DataWorks Summit/Hadoop Summit
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
DataWorks Summit/Hadoop Summit
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
DataWorks Summit/Hadoop Summit
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
DataWorks Summit/Hadoop Summit
Viewers also liked
(20)
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
The real world use of Big Data to change business
The real world use of Big Data to change business
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
SEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile games
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
Similar to #HSTokyo16 Apache Spark Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
Kirk Haslbeck
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
Abdelkrim Hadjidj
Log Analytics Optimization
Log Analytics Optimization
Isheeta Sanghi
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
Hortonworks
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
Hortonworks
Unlocking insights in streaming data
Unlocking insights in streaming data
Carolyn Duby
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
Sean Roberts
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
Milind Pandit
Apache Metron in the Real World
Apache Metron in the Real World
DataWorks Summit
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Mats Johansson
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Daniel Madrigal
Time-series data analysis and persistence with Druid
Time-series data analysis and persistence with Druid
Raúl Marín
Internet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
DataWorks Summit
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
Data Con LA
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
Hortonworks
Similar to #HSTokyo16 Apache Spark Crash Course
(20)
Apache Spark Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
Log Analytics Optimization
Log Analytics Optimization
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
Unlocking insights in streaming data
Unlocking insights in streaming data
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
Apache Metron in the Real World
Apache Metron in the Real World
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Time-series data analysis and persistence with Druid
Time-series data analysis and persistence with Druid
Internet of things Crash Course Workshop
Internet of things Crash Course Workshop
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
More from DataWorks Summit/Hadoop Summit
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
Hadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
Data Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
Dataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
More from DataWorks Summit/Hadoop Summit
(20)
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
Hadoop Crash Course
Hadoop Crash Course
Data Science Crash Course
Data Science Crash Course
Dataflow with Apache NiFi
Dataflow with Apache NiFi
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
HBase in Practice
HBase in Practice
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Recently uploaded
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Fwdays
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
Lonnie McRorey
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Alex Barbosa Coqueiro
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
BookNet Canada
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
LoriGlavin3
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
Florian Wilhelm
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Enterprise Knowledge
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
2toLead Limited
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Dubai Multi Commodity Centre
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
RankYa
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
hariprasad279825
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
BookNet Canada
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
Lars Bell
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
charlottematthew16
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Rizwan Syed
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
Kalema Edgar
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
DianaGray10
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Miki Katsuragi
Recently uploaded
(20)
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
#HSTokyo16 Apache Spark Crash Course
1.
Robert Hryniewicz Data Advocate Twitter: @RobH8z Email: rhryniewicz@hortonworks.com Apache Spark Crash Course Hadoop Summit Tokyo 2016
2.
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda •
Background • Spark Overview • Zeppelin Overview • Components of HDP • Lab ~ 45min
3.
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Data Sources Ã
Internet of Anything (IoAT) – Wind Turbines, Oil Rigs, Cars – Weather Stations, Smart Grids – RFID Tags, Beacons, Wearables à User Generated Content (Web & Mobile) – Twitter, Facebook, Snapchat, YouTube – Clickstream, Ads, User Engagement – Payments: Paypal, Venmo 44ZB in 2020
4.
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The “Big Data” Problem Ã
A single machine cannot process or even store all the data! Problem Solution à Distribute data over large clusters Difficulty à How to split work across machines? à Moving data over network is expensive à Must consider data & network locality à How to deal with failures? à How to deal with slow nodes?
5.
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Background
6.
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved History of Hadoop
& Spark
7.
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Access Rates At least an order of magnitude difference between memory and hard drive / network speed FAST
slower slowest
8.
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What Is Apache Spark? Ã
Apache open source project originally developed at AMPLab (University of California Berkeley) Ã Unified data processing engine that operates across varied data workloads and platforms
9.
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why Apache Spark? Ã
Elegant Developer APIs – Single environment for data munging, data wrangling, and Machine Learning (ML) Ã In-memory computation model – Fast! – Effective for iterative computations and ML Ã Machine Learning – Implementation of distributed ML algorithms – Pipeline API (Spark ML)
10.
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Ecosystem Spark Core Spark SQL
Spark Streaming Spark MLlib GraphX
11.
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Spark Basics
12.
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Context Ã
Main entry point for Spark functionality à Represents a connection to a Spark cluster à Represented as sc in your code (in Zeppelin) What is it?
13.
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark SQL
14.
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark SQL Overview Ã
Spark module for structured data processing (e.g. DB tables, JSON files, CSV) Ã Three ways to manipulate data: – DataFrames API – SQL queries – Datasets API
15.
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved DataFrames Ã
Distributed collection of data organized into named columns à Conceptually equivalent to a table in relational DB or a data frame in R/Python à API available in Scala, Java, Python, and R Col1 Col2 … … ColN DataFrame Column Row Data is described as a DataFrame with rows, columns, and a schema
16.
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved DataFrames CSVAvro HIVE Spark SQL Text Col1
Col2 … … ColN DataFrame Column Row Created from Various Sources à DataFrames from HIVE: – Reading and writing HIVE tables à DataFrames from files: – Built-in: JSON, JDBC, ORC, Parquet, HDFS – External plug-in: CSV, HBASE, Avro JSON
17.
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SQL Context Ã
Entry point into all functionality in Spark SQL à All you need is SparkContext val sqlContext = SQLContext(sc) SQLContext à Superset of functionality provided by basic SQLContext – Read data from Hive tables – Access to Hive Functions à UDFs HiveContext val hc = HiveContext(sc) Use when your data resides in Hive
18.
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark SQL Examples
19.
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Setting up DataFrame
API val flightsDF = … ç Create from CSV, JSON, Hive etc. Example: val path = "examples/flights.json" val flightsDF = sqlContext.read.json(path) Create a DataFrame
20.
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Setting up SQL API Register a Temporary Table flightsDF.registerTempTable("flights")
21.
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Two API Examples: DataFrame
and SQL APIs flightsDF.select("Origin", "Dest", "DepDelay”) .filter($"DepDelay" > 15).show(5) Results +------+----+--------+ |Origin|Dest|DepDelay| +------+----+--------+ | IAD| TPA| 19| | IND| BWI| 34| | IND| JAX| 25| | IND| LAS| 67| | IND| MCO| 94| +------+----+--------+ SELECT Origin, Dest, DepDelay FROM flights WHERE DepDelay > 15 LIMIT 5 SQL API DataFrame API
22.
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Streaming
23.
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What is Stream Processing? Batch Processing •
Ability to process and analyze data at-rest (stored data) • Request-based, bulk evaluation and short-lived processing • Enabler for Retrospective, Reactive and On-demand Analytics Stream Processing • Ability to ingest, process and analyze data in-motion in real- or near-real-time • Event or micro-batch driven, continuous evaluation and long-lived processing • Enabler for real-time Prospective, Proactive and Predictive Analytics for Next Best Action Stream Processing + Batch Processing = All Data Analytics real-time (now) historical (past)
24.
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Next
Generation Analytics Iterative & Exploratory Data is the structure Traditional Analytics Structured & Repeatable Structure built to store data 24 Modern Data Applications approach to Insights Start with hypothesis Test against selected data Data leads the way Explore all data, identify correlations Analyze after landing… Analyze in motion…
25.
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Streaming Ã
Extension of Spark Core API Ã Stream processing of live data streams – Scalable – High-throughput – Fault-tolerant Overview ZeroMQ MQTT
26.
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Streaming
27.
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Streaming Discretized Streams (DStreams) Ã
High-level abstraction representing continuous stream of data à Internally represented as a sequence of RDDs à Operation applied on a DStream translates to operations on the underlying RDDs
28.
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Streaming Example: flatMap
operation
29.
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark Streaming Ã
Apply transformations over a sliding window of data, e.g. rolling average Window Operations
30.
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark MLlib
31.
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Where
Can We Use Machine Learning (Data Science) Healthcare • Predict diagnosis • Prioritize screenings • Reduce re-admittance rates Financial services • Fraud Detection/prevention • Predict underwriting risk • New account risk screens Public Sector • Analyze public sentiment • Optimize resource allocation • Law enforcement & security Retail • Product recommendation • Inventory management • Price optimization Telco/mobile • Predict customer churn • Predict equipment failure • Customer behavior analysis Oil & Gas • Predictive maintenance • Seismic data management • Predict well production levels
32.
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Scatter
2D Data Visualized scatterData ç DataFrame +-----+--------+ |label|features| +-----+--------+ |-12.0| [-4.9]| | -6.0| [-4.5]| | -7.2| [-4.1]| | -5.0| [-3.2]| | -2.0| [-3.0]| | -3.1| [-2.1]| | -4.0| [-1.5]| | -2.2| [-1.2]| | -2.0| [-0.7]| | 1.0| [-0.5]| | -0.7| [-0.2]| ... ... ...
33.
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Linear
Regression Model Training (one feature) Coefficients: 2.81 Intercept: 3.05 y = 2.81x + 3.05 Training Result
34.
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Linear
Regression (two features) Coefficients: [0.464, 0.464] Intercept: 0.0563
35.
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark
API for building ML pipelines Feature transform 1 Feature transform 2 Combine features Linear Regression Input DataFrame Input DataFrame Output DataFrame Pipeline Pipeline Model Train Predict Export Model
36.
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark GraphX
37.
37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved GraphX Ã
Page Rank à Topic Modeling (LDA) à Community Detection Source: ampcamp.berkeley.edu
38.
38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Zeppelin & HDP Sandbox
39.
39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What’s
Apache Zeppelin? Web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more
40.
40 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What
is a Note/Notebook? • A web based GUI for small code snippets • Write code snippets in browser • Zeppelin sends code to backend for execution • Zeppelin gets data back from backend • Zeppelin visualizes data • Zeppelin Note = Set of (Paragraphs/Cells) • Other Features - Sharing/Collaboration/Reports/Import/Export
41.
41 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Big Data Lifecycle Collect ETL / Process Analysis Report Data Product Business user Customer Data ScientistData Engineer All in one place in Zeppelin!
42.
42 © Hortonworks Inc. 2011 – 2016. All Rights Reserved How does Zeppelin work? Notebook Author Collaborators/ Report viewers Zeppelin Cluster Spark | Hive | HBase Any of 30+ back ends
43.
43 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDP Sandbox What’s included in the HDP Sandbox? Ã
Zeppelin à Spark à YARN à Resource Management à HDFS à Distributed Storage Layer à And many more components: Hive, Solr etc. YARN Scala Java Python R APIs Spark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N HDFS
44.
44 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Access
patterns enabled by YARN YARN: Data Operating System 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° °N HDFS Hadoop Distributed File System Interactive Real-TimeBatch Applications Batch Needs to happen but, no timeframe limitations Interactive Needs to happen at Human time Real-Time Needs to happen at Machine Execution time.
45.
45 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why Apache Spark on YARN? Ã
Resource management – Share Spark workloads with other workloads (HIVE, Solr, etc.) à Utilizes existing HDP cluster infrastructure à Scheduling and queues Spark Driver Client Spark Application Master YARN container Spark Executor YARN container Task Task Spark Executor YARN container Task Task Spark Executor YARN container Task Task
46.
46 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why
HDFS? Fault Tolerant Distributed Storage • Divide files into big blocks and distribute 3 copies randomly across the cluster • Processing Data Locality • Not Just storage but computation 10110100101 00100111001 11111001010 01110100101 00101100100 10101001100 01010010111 01011101011 11011011010 10110100101 01001010101 01011100100 11010111010 0 Logical File 1 2 3 4 Blocks 1 Cluster 1 1 2 2 2 3 3 34 4 4
47.
47 © Hortonworks Inc. 2011 – 2016. All Rights Reserved There’s
more to HDP YARN : Data Operating System DATA ACCESS SECURITY GOVERNANCE & INTEGRATION OPERATIONS 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N Data Lifecycle & Governance Falcon Atlas Administration Authentication Authorization Auditing Data Protection Ranger Knox Atlas HDFS EncryptionData Workflow Sqoop Flume Kafka NFS WebHDFS Provisioning, Managing, & Monitoring Ambari Cloudbreak Zookeeper Scheduling Oozie Batch MapReduce Script Pig Search Solr SQL Hive NoSQL HBase Accumulo Phoenix Stream Storm In-memory Others ISV Engines Tez Tez Slider Slider DATA MANAGEMENT Hortonworks Data Platform 2.4.x Deployment ChoiceLinux Windows On-Premise Cloud HDFS Hadoop Distributed File System
48.
48 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hortonworks Data Cloud
49.
49 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
50.
50 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
51.
51 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Bringing Multitenancy to Apache Zeppelin
52.
52 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Introducing Livy Ã
Livy is the open source REST interface for interacting with Apache Spark from anywhere à Installed as Spark Ambari Service Livy Client HTTP HTTP (RPC) Spark Interactive Session SparkContext Spark Batch Session SparkContext Livy Server
53.
53 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Security Across Zeppelin-Livy-Spark Shiro Ispark Group Interpreter SPNego: Kerberos
Kerberos Livy APIs Spark on YARN Zeppelin Driver LDAP Livy Server
54.
54 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reasons to Integrate with Livy Ã
Bring Sessions to Apache Zeppelin – Isolation – Session sharing à Enable efficient cluster resource utilization – Default Spark interpreter keeps YARN/Spark job running forever – Livy interpreter recycled after 60 minutes of inactivity (controlled by livy.server.session.timeout ) à To Identity Propagation – Send user identity from Zeppelin > Livy > Spark on YARN
55.
55 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Livy
Server SparkContext Sharing Session-2 Session-1 SparkSession-1 SparkContext SparkSession-2 SparkContext Client 1 Client 2 Client 3 Session-1 Session-1 Session-2
56.
56 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Sample Architecture
57.
57 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Managed Dataflow SOURCES REGIONAL INFRASTRUCTURE CORE INFRASTRUCTURE
58.
58 © Hortonworks Inc. 2011 – 2016. All Rights Reserved High-Level Overview IoT
Edge (single node) IoT Edge (single node) IoT Devices IoT Devices NiFi Hub Data Broker Column DB Data Store Live Dashboard Data Center (on prem/cloud) HDFS/S3 HBase/Cassandra
59.
59 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What’s new in Spark 2.0
60.
60 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Spark 2.0 Ã
API Improvements – SparkSession (spark) – new entry point (Replaces SQLContext and HiveContext) – Unified DataFrame & DataSet API (DataFrame à alias for DataSet[Row]) – Structured Streaming/Continuous Application (Concept of an infinite DataFrame) – Temporary Table à Temporary View à Performance Improvements – Tungsten Phase 2 - Multi stage code gen – ORC & Parquet file improvements à Machine Learning – ML pipeline the new API, MLlib deprecated – Distributed R algorithms (GLM, Naïve Bayes, K-Means, Survival Regression) à SparkSQL – More SQL support (new ANSI SQL parser, subquery support)
61.
61 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What’s the latest at Hortonworks? Ã
HDP 2.5 – Batch Processing à HDF 2.0 – Streaming Apps DATA AT REST DATA IN MOTION ACTIONABLE INTELLIGENCE Modern Data Applications
62.
62 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lab Preview
63.
63 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lab Setup Instructions http://tinyurl.com/hwx-spark-intro Lab Options -
Local Sandbox (8GB RAM memory required): - VirtualBox or Vmware - Amazon AWS Cloud: - Hortonworks Data Cloud è Setup info: http://hortonworks.github.io/hdp-aws/index.html http://hortonworks.github.io/hdp-aws/index.html http://hortonworks.github.io/hdp-aws/index.html
64.
64 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hortonworks Community Connection
65.
65 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Community Engagement Participate
now at: community.hortonworks.com© Hortonworks Inc. 2011 – 2015. All Rights Reserved 9,500+ Registered Users 21,000+ Answers 32,500+ Technical Assets One Website!
66.
66 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hortonworks Community Connection Read
access for everyone, join to participate and be recognized • Full Q&A Platform (like StackOverflow) • Knowledge Base Articles • Code Samples and Repositories
67.
Robert Hryniewicz E: rhryniewicz@hortonworks.com T: @RobH8z Thanks!
Download now