Emerging trends in data analytics
Wei-Chiu Chuang
Keynote presentation at DataCon.TW 2019.
Emerging trends in data analytics
1.
EMERGING TRENDS IN DATA ANALYTICS
Wei-Chiu Chuang, Ph.D.
HDFS Lead Engineer, Cloudera | Apache Hadoop PMC/Committer
2.
© 2019 Cloudera,
Inc. All rights reserved. 2 Hadoop HDFS Lead Engineer Committer/PMC Founding member
3.
“BIG DATA” IS PASSÉ
https://trends.google.com/trends/explore?date=today%205-y&q=Big%20Data,Data%20Analytics
4.
LANDSCAPE OF COMMERCIAL OPEN SOURCE DATA ANALYTICS SOFTWARE
5.
A YEAR OF TECTONIC SHIFT
[Figure: company logos; labels: Merged, Acquired by, Acquired by]
6.
OPEN SOURCE DATA ANALYTICS SOFTWARE UNICORNS
7.
SUB-BILLION, UNICORNS-TO-BE
[Chart: valuation in units of $100 M for Lucidworks, neo4j, and H2O.AI; values shown: 3.7, 5, 4]
8.
DATA ENGINEER COMMUNITY
9.
IS HADOOP DEAD?
10.
MOST ACTIVE VISITS AND DOWNLOADS
• Hadoop web pages are the most popular among all Apache projects.
Source: Apache Software Foundation Annual Report 2019
11.
MAPREDUCE IS DEAD; LONG LIVE HDFS AND YARN
• Stack Overflow Trends
• HDFS and YARN are mature
12.
• Hadoop is the on-prem platform for Big Data Analytics.
• Like Linux. Boring, but it’s the foundation.
13.
COMPUTE ENGINES IN HADOOP ECOSYSTEM
• Stack Overflow Trends
• Spark most popular
• Hive stable
• MapReduce, Pig and Storm are dead.
https://insights.stackoverflow.com/trends?tags=apache-spark%2Chadoop%2Chive%2Cmapreduce%2Capache-pig%2Capache-storm&fbclid=IwAR1InUJDJPPoUDDfhYtiiaxo21RVqKO2y-SwDQ2_AT4kpIvBn1NjuLYhdGc
14.
BIG DATA: YOUR NAME IS SQL
• Stack Overflow Trends
• Hive was the most popular until 2018.
• SparkSQL grew fastest until 2018.
• Cloud: BigQuery is more popular than Redshift
15.
BATCH VS REAL-TIME
16.
KAFKA
• Message broker >> stream processing
17.
STREAM PROCESSING
• Stack Overflow Trends
⎯ (excluding Kafka Streams)
• Flink grows fastest; Beam too
• Spark Streaming declining
• Storm is dead
https://insights.stackoverflow.com/trends?tags=spark-streaming%2Capache-flink%2Capache-storm%2Capache-beam&fbclid=IwAR1InUJDJPPoUDDfhYtiiaxo21RVqKO2y-SwDQ2_AT4kpIvBn1NjuLYhdGc
18.
SPARK
• Stack Overflow Trends
• Spark is no longer the cool kid
• You will write in PySpark or SparkSQL.
• Spark Streaming is declining
• Very few people develop ML with Spark.
⎯ Wait for Spark 3.0?
• batch >> streaming
https://insights.stackoverflow.com/trends?tags=apache-spark%2Cpyspark%2Cspark-streaming%2Cspark-dataframe%2Cpyspark-sql%2Capache-spark-sql%2Capache-spark-mllib&fbclid=IwAR1InUJDJPPoUDDfhYtiiaxo21RVqKO2y-SwDQ2_AT4kpIvBn1NjuLYhdGc
19.
LANGUAGE
• Python > Java > C++ > Go
https://insights.stackoverflow.com/trends?tags=python%2Cgo%2Cjava%2Cc%2B%2B
20.
TRENDY TECHNOLOGIES
• Stack Overflow Trends
https://insights.stackoverflow.com/trends?tags=tensorflow%2Ckubernetes%2Capache-spark%2Capache-kafka%2Cdocker&fbclid=IwAR1InUJDJPPoUDDfhYtiiaxo21RVqKO2y-SwDQ2_AT4kpIvBn1NjuLYhdGc
21.
SUMMARY
• Deep learning (TensorFlow)
• Microservices (Kubernetes, Docker, Kafka)
• Batch > Streaming, but Streaming is gaining traction
• Python
22.
DEVELOPER COMMUNITY
23.
APACHE SOFTWARE FOUNDATION
20th anniversary
300+ projects
48 incubating projects
FY2019: 187 k commits, 3215 committers
24.
CNCF
• 6 graduated projects
• 16 incubating projects
• 18 sandbox projects
• Last year
⎯ 141 k commits
⎯ 7647 committers
25.
APACHE VS CNCF
Apache
• Big Data, Database, Cloud
• Contributors are individuals
CNCF
• DevOps tools
• Contributors are associated with companies
26.
IMPACT OF CLDR-HWX MERGER
• 63% of Hadoop commits in 2019 were made by Cloudera employees.
• Community development
• Bad news: Apache Ambari, Apache Sentry
• Good news: Hive support for Kudu, Ranger support for Impala
27.
DEVELOPER COMMUNITY MOVING TO ASIA
• HBase
⎯ HBaseCon Asia
• Hadoop
⎯ 1st ever Hadoop Meetup in China
• China is the third largest contributor to CNCF projects.
⎯ 3 projects were born in China
[Chart: Apache GitHub visits]
28.
MACHINE LEARNING
● Apache Hadoop Submarine
● Submarine Project.
● Distributed machine learning platform
● Algorithm development, model batch training, model incremental training, model online services and model management
⎯ Available since: 3.2.0 (as part of YARN)
⎯ Became a top-level subproject: 0.2.0 (separate release)
⎯ Lots of new stuff in 0.3.0.
https://hadoop.apache.org/submarine/
29.
MEASURING THE HEALTH OF DEVELOPER COMMUNITY
30.
CREATED/RESOLVED JIRAS
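A created-versus-resolved comparison like the one charted on this slide can be approximated with a simple monthly tally. The following is a minimal sketch, not the method used for the slide: the `issues` records and the `monthly_created_resolved` helper are hypothetical, and real data would have to be exported from an issue tracker first.

```python
from collections import Counter
from datetime import date

# Hypothetical issue records: (created_on, resolved_on or None if still open).
issues = [
    (date(2019, 1, 5), date(2019, 1, 20)),
    (date(2019, 1, 18), date(2019, 2, 2)),
    (date(2019, 2, 10), None),
    (date(2019, 2, 25), date(2019, 2, 27)),
]


def monthly_created_resolved(records):
    """Count issues created and resolved per (year, month)."""
    created = Counter()
    resolved = Counter()
    for created_on, resolved_on in records:
        created[(created_on.year, created_on.month)] += 1
        if resolved_on is not None:
            resolved[(resolved_on.year, resolved_on.month)] += 1
    # Report every month that appears in either tally, in calendar order.
    months = sorted(set(created) | set(resolved))
    return [(month, created[month], resolved[month]) for month in months]


report = monthly_created_resolved(issues)
for (year, month), n_created, n_resolved in report:
    print(f"{year}-{month:02d}: created={n_created} resolved={n_resolved}")
```

A month in which created issues consistently outnumber resolved ones suggests a growing backlog; the reverse suggests the community is keeping up.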
31.
NUMBER OF CONTRIBUTORS
Apache Spark: https://www.openhub.net/p/apache-spark
MongoDB: https://www.openhub.net/p/mongodb
32.
DIVERSITY (AFFILIATION) OF KUBERNETES DEVELOPERS
https://k8s.devstats.cncf.io/d/8/company-statistics-by-repository-group?orgId=1&var-period=m&var-metric=prs&var-repogroup_name=All&var-companies=All
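Affiliation diversity can be summarized as each company's share of pull requests. The sketch below is illustrative only: the `prs` records, the company names, and the `company_share` helper are all hypothetical and are not taken from the devstats dashboard.

```python
from collections import Counter

# Hypothetical pull-request records: (author, company affiliation).
prs = [
    ("alice", "CompanyA"),
    ("bob", "CompanyB"),
    ("carol", "CompanyA"),
    ("dave", "Independent"),
]


def company_share(records):
    """Return each company's fraction of all pull requests."""
    counts = Counter(company for _, company in records)
    total = sum(counts.values())
    return {company: n / total for company, n in counts.items()}


shares = company_share(prs)
for company, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{company}: {share:.0%}")
```

A project where one company holds most of the share is more exposed if that company changes direction.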
33.
APACHE APEX (2016 - 2018)
The story of an abandoned project
• “Enterprise-grade unified stream and batch processing engine.”
• Founded in April 2016.
• Backed by DataTorrent, which collapsed in May 2018 after raising $23.9 million.
34.
APACHE APEX
• https://reporter.apache.org/wizard/statistics?apex
• Last new PMC member was added on 2018-05-15.
• Last new committer was added on 2017-10-19.
• Community Health Score (Chi): -4.28 (Action required!)
35.
WILL CLOUD KILL OPEN SOURCE?
36.
OPEN SOURCE VS. PROPRIETARY
• The popularity of open source data systems is about to overtake that of proprietary systems for the first time.
• Why open source?
⎯ Free
⎯ Innovation
⎯ Industry standard
Source: https://db-engines.com/en/ranking_osvsc
37.
AMAZON’ED: CLOUD PROVIDERS THREATENING OSS VENDORS
• Redis Labs: AGPL → Redis Source Available License
• MongoDB: AGPL → Server Side Public License
• Confluent: Apache 2.0 → Confluent Community License
• Cockroach Labs: Apache 2.0 → Business Source License
38.
ALL IS NOT LOST
Cloudera will be 100% open source
• Hadoop, Spark, Kafka, ...
⎯ Apache Software License 2.0
• Cloudera Manager
• Cloudera Data Science Workbench
• Cloud service
⎯ Proprietary → AGPL
39.
CLOUD VENDORS’ OSS STRATEGY
• Amazon AWS
⎯ EMR, Open Distro for Elasticsearch, DocumentDB, Amazon MSK
• Microsoft Azure
⎯ Partnership: HDInsight, Azure Databricks, Azure Red Hat OpenShift
• Google GCP
⎯ Partnership: Confluent, DataStax, Elastic, InfluxData, MongoDB, Neo4j, Redis Labs
40.
CLOUD NATIVE, CLOUD FIRST
41.
KUBERNETES IS THE NEW OPERATING SYSTEM
[Chart: KubeCon attendance]
42.
YuniKorn
43.
FUTURE: HYBRID CLOUD
44.
HYBRID CLOUD IS THE NEW NORM
• Public cloud deployments will capture most of the growth.
• On-prem deployments will still exist for niche use cases.
⎯ Regulation (FinServ, Healthcare)
⎯ High density (>100 TB per node)
⎯ Specialized hardware (100 Gbps NIC, GPU, FPGA, NVMe, Vector Engine)
45.
TAKEAWAY
46.
TAKEAWAY
• Big Data → Data Analytics
• Commercial open source software market is booming
• Don’t bet on a single piece of open source software
• Cloud vendors will find a balance with OSS vendors
• Hybrid cloud
47.
WE ARE HIRING!
https://www.cloudera.com/careers/teams/engineering.html (remote positions available)
48.
THANK YOU