Apache Zeppelin, Helium And Beyond
http://zeppelin.apache.org
Moon
Creator of Apache Zeppelin
Co-founder NFLabs
Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
2013. 10 Open-sourced the interactive analytics feature as ‘Zeppelin’
2014. 12 ASF incubation
Incubation Status http://incubator.apache.org/projects/zeppelin.html
2016. 08 145 contributors worldwide
1864 stars on the GitHub repo
5 releases
One of the most popular projects in ASF
Life cycle of big data
Stages: Collect, ETL / Process, Analysis, Report, Data Product
Roles: Data Engineer, Data Scientist, Business user, Customer
Zeppelin
A web-based notebook that enables interactive data analytics. You can
make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more.
Zeppelin
Interpreter : pluggable layer for language / processing backend integration
20+ interpreters are supported officially (for example JDBC, Markdown, Shell)
2016. 03. Interpreters in Zeppelin source tree. Does not include 3rd party interpreters
Zeppelin
Interpreter : Easy to extend
import java.util.List;
// Signatures as listed on the slide, marked abstract so the outline is valid Java
public abstract class Interpreter {
  // Must have
  public abstract void open();
  public abstract void close();
  public abstract InterpreterResult interpret(String st, InterpreterContext context);
  // Good to have
  public abstract void cancel(InterpreterContext context);
  public abstract int getProgress(InterpreterContext context);
  public abstract List<String> completion(String buf, int cursor);
  // Advanced
  public abstract FormType getFormType();
  public abstract Scheduler getScheduler();
}
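To make the "must have" part concrete, here is a minimal sketch of an interpreter that implements only open, close and interpret. SimpleInterpreter and UpperCaseInterpreter are hypothetical stand-ins written for this example, not Zeppelin classes; the real interpret method also takes an InterpreterContext and returns an InterpreterResult.

```java
import java.util.Locale;

// Hypothetical stand-in for the "must have" part of the Interpreter class.
// The real class also receives an InterpreterContext and returns an InterpreterResult.
abstract class SimpleInterpreter {
    abstract void open();
    abstract void close();
    abstract String interpret(String st);
}

// Toy interpreter: "runs" a paragraph by upper-casing its text.
class UpperCaseInterpreter extends SimpleInterpreter {
    private boolean opened = false;

    @Override
    void open() {
        opened = true; // a real interpreter would start or connect to its backend here
    }

    @Override
    void close() {
        opened = false; // and release it here
    }

    @Override
    String interpret(String st) {
        if (!opened) {
            throw new IllegalStateException("interpreter is not open");
        }
        return st.toUpperCase(Locale.ROOT);
    }
}

class InterpreterSketchDemo {
    public static void main(String[] args) {
        UpperCaseInterpreter interpreter = new UpperCaseInterpreter();
        interpreter.open();
        System.out.println(interpreter.interpret("select 1")); // SELECT 1
        interpreter.close();
    }
}
```

The cancel, getProgress and completion methods are left out on purpose: the sketch only covers the smallest set a backend needs before it can run a paragraph.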
Zeppelin
Shared mode : Single interpreter instance serves all notes, all users
[Diagram: one Interpreter Process containing one Interpreter Group with its Interpreters]
Zeppelin
Scoped mode : Individual interpreter instance per each note
[Diagram: one Interpreter Process containing a separate Interpreter Group per note]
Zeppelin
Isolated mode : Individual interpreter process per each note
[Diagram: a separate Interpreter Process per note, each containing its own Interpreter Group]
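The three modes differ in how a note is mapped to an interpreter instance and to an interpreter process. A rough sketch of that mapping follows; the BindingMode enum and InterpreterLocator class are hypothetical names invented for this example (not Zeppelin API), and plain string keys stand in for real instances and processes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical names for the three modes described above.
enum BindingMode { SHARED, SCOPED, ISOLATED }

// Computes which process and which interpreter instance serve a given note.
class InterpreterLocator {
    private final BindingMode mode;

    InterpreterLocator(BindingMode mode) {
        this.mode = mode;
    }

    // Isolated mode: one process per note. Shared and scoped: a single process.
    String processKey(String noteId) {
        return mode == BindingMode.ISOLATED ? "process:" + noteId : "process:shared";
    }

    // Shared mode: one instance for all notes. Scoped and isolated: one per note.
    String instanceKey(String noteId) {
        return mode == BindingMode.SHARED ? "instance:shared" : "instance:" + noteId;
    }
}

class BindingModeDemo {
    // Returns {distinct processes, distinct instances} needed to serve the given notes.
    static int[] count(BindingMode mode, String... noteIds) {
        InterpreterLocator locator = new InterpreterLocator(mode);
        Set<String> processes = new LinkedHashSet<>();
        Set<String> instances = new LinkedHashSet<>();
        for (String noteId : noteIds) {
            processes.add(locator.processKey(noteId));
            instances.add(locator.instanceKey(noteId));
        }
        return new int[] { processes.size(), instances.size() };
    }

    public static void main(String[] args) {
        for (BindingMode mode : BindingMode.values()) {
            int[] c = count(mode, "noteA", "noteB", "noteC");
            System.out.println(mode + ": processes=" + c[0] + ", instances=" + c[1]);
        }
    }
}
```

For three notes this gives one process and one instance in shared mode, one process and three instances in scoped mode, and three of each in isolated mode, which matches the three diagrams.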
Zeppelin
Spark Interpreter : Shared mode
[Diagram: one Interpreter Process with one SparkContext; a single Interpreter Group (Spark, SparkSQL, Pyspark, SparkR) with one Scala REPL]
Zeppelin
Spark Interpreter : Scoped mode
[Diagram: one Interpreter Process with one SparkContext; one Interpreter Group (Spark, SparkSQL, Pyspark, SparkR) per note, each with its own Scala REPL]
Zeppelin
Spark Interpreter : Isolated mode
[Diagram: one Interpreter Process per note, each with its own SparkContext, Interpreter Group (Spark, SparkSQL, Pyspark, SparkR) and Scala REPL]
Zeppelin
Notebook Repo : pluggable layer for notebook persistence
5+ Notebook repos are supported officially
2016. 03. Notebook repos in Zeppelin source tree. Does not include 3rd party notebook repos
ZeppelinHub
Zeppelin
Notebook Repo : Easy to extend
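import java.io.IOException;
import java.util.List;
public interface NotebookRepo {
  // List the notes available in this storage
  public List<NoteInfo> list() throws IOException;
  // Load a single note by id
  public Note get(String noteId) throws IOException;
  // Persist a note
  public void save(Note note) throws IOException;
  // Delete a note by id
  public void remove(String noteId) throws IOException;
  // Record a named checkpoint of a note
  public void checkpoint(String noteId, String checkPointName) throws IOException;
  public void close();
}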
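As an illustration of how small a storage backend can be, here is a sketch of an in-memory repo. SimpleNote, SimpleNotebookRepo and InMemoryNotebookRepo are hypothetical stand-ins made up for this example: the real Note and NoteInfo classes carry far more than an id and a text, and checkpoint is left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical, simplified note used only in this sketch.
class SimpleNote {
    final String id;
    final String text;

    SimpleNote(String id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Cut-down version of the interface above, using the stand-in note type.
interface SimpleNotebookRepo {
    List<String> list() throws IOException;
    SimpleNote get(String noteId) throws IOException;
    void save(SimpleNote note) throws IOException;
    void remove(String noteId) throws IOException;
    void close();
}

// Keeps notes in memory; a real repo would write to a file system, S3, Git, ...
class InMemoryNotebookRepo implements SimpleNotebookRepo {
    private final Map<String, SimpleNote> notes = new TreeMap<>();

    @Override
    public List<String> list() {
        return new ArrayList<>(notes.keySet()); // note ids in sorted order
    }

    @Override
    public SimpleNote get(String noteId) {
        SimpleNote note = notes.get(noteId);
        if (note == null) {
            throw new NoSuchElementException("no such note: " + noteId);
        }
        return note;
    }

    @Override
    public void save(SimpleNote note) {
        notes.put(note.id, note); // overwrites an existing note with the same id
    }

    @Override
    public void remove(String noteId) {
        notes.remove(noteId);
    }

    @Override
    public void close() {
        notes.clear();
    }
}
```

The in-memory methods never perform I/O, so they drop the `throws IOException` clause, which Java allows when implementing an interface method.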
Zeppelin
Visualizations : 6 built-in visualizations come with pivot
Table, Bar, Pie, Area, Line, Scatter
Free to draw any customized visualizations inside of notebook
…
Helium
Makes Zeppelin fly!
Platform for data analytics applications that makes visualization pluggable and more.
Umbrella issue http://issues.apache.org/jira/browse/ZEPPELIN-533
Proposal https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
Helium
ZeppelinServer : RESTful API, Websocket
Interpreter : Spark, Flink, Geode, JDBC, …
Notebook Storage : FileSystem, AmazonS3, Git, …
Interpreters and Notebook storage are pluggable
Helium
ZeppelinServer
Interpreter : Spark, Flink, Geode, JDBC, …
Notebook Storage : FileSystem, AmazonS3, Git, …
Visualizations : Map, WordCloud, …
We want visualization to be pluggable
Helium
Interpreter : Spark, Flink, Geode, JDBC, …
Notebook Storage : FileSystem, AmazonS3, Git, …
Application : Visualizations (Map, WordCloud, …), Analytics (…), …
Resource Pool : SparkContext, Flink Environment, JDBC connection, User object, …
Extend pluggable visualization to pluggable analytics application
Helium
Helium application is interaction between view, algorithm and resources
Application = View + Algorithm (using Zeppelin provided Resources)
Helium
Helium application runs where resource exists
[Diagram: the View is in the Web browser and the Algorithm is in the Interpreter Process; the Zeppelin Server and the Interpreter Process each have a Resource pool, and the Resource pools are connected]
Zeppelin
[Diagram, built up over four slides: an Interpreter Process contains an Interpreter Group (Interpreters), a Resource Pool (SparkContext, JDBC Conn, User Object), an AngularObjectRegistry (Key / Value pairs) and the Interpreter Output. The later builds add the Front-end and then Helium.]
Helium Application: Easy to extend
public abstract class Application {
  private final ApplicationContext context;
  public Application(ApplicationContext context) {
    this.context = context;
  }
  public abstract void run(ResourceSet args);
  public abstract void unload();
}
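A small sketch of what a subclass might look like. MiniApplication and EpochClockApp are hypothetical stand-ins for this example only: resources arrive as a plain name-to-object map instead of a ResourceSet, and output is collected in a StringBuilder instead of being rendered in a notebook.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Application class shown above.
abstract class MiniApplication {
    protected final StringBuilder output;

    MiniApplication(StringBuilder output) {
        this.output = output;
    }

    abstract void run(Map<String, Object> resources);
    abstract void unload();
}

// Toy application: shows the epoch milliseconds of the first Date it is given.
class EpochClockApp extends MiniApplication {
    EpochClockApp(StringBuilder output) {
        super(output);
    }

    @Override
    void run(Map<String, Object> resources) {
        for (Object resource : resources.values()) {
            if (resource instanceof Date) {
                output.append("epoch millis: ").append(((Date) resource).getTime());
                return;
            }
        }
        output.append("no java.util.Date resource found");
    }

    @Override
    void unload() {
        output.setLength(0); // clear whatever the application displayed
    }
}

class ApplicationSketchDemo {
    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        MiniApplication app = new EpochClockApp(out);
        Map<String, Object> resources = new LinkedHashMap<>();
        resources.put("date", new Date(0L));
        app.run(resources);
        System.out.println(out); // epoch millis: 0
        app.unload();
    }
}
```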
Helium
Launcher: Suggests applications according to the data types in the ResourcePool
Helium
Description: Helium Application description file in JSON format that the Launcher reads
Helium
{
  "type" : "APPLICATION",
  "name" : "zeppelin.clock",
  "description" : "Clock (example)",
  "artifact" : "org.apache.zeppelin:zeppelin-example-clock:0.7.0",
  "className" : "org.apache.zeppelin.example.app.clock.Clock",
  "resources" : [[":java.util.Date"]],
  "icon" : "<i class=\"fa fa-clock-o\"></i>"
}
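One way a launcher could use the "resources" field is to check it against what is currently in the resource pool. The sketch below rests on an assumed reading of the format: the outer list holds alternatives, the inner list holds requirements that must all be met, and an entry starting with ":" names a class while any other entry names a resource. ResourceMatcher is a hypothetical class and the pool is a plain map, not Zeppelin's ResourcePool.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a launcher-style check: does a resource pool satisfy a "resources" requirement?
// Assumed format: outer list = alternatives (any one may match),
// inner list = requirements that must all match, ":x" = class name x, otherwise resource name.
class ResourceMatcher {
    static boolean requirementMet(String requirement, Map<String, Object> pool) {
        if (requirement.startsWith(":")) {
            String className = requirement.substring(1);
            for (Object value : pool.values()) {
                if (value.getClass().getName().equals(className)) {
                    return true;
                }
            }
            return false;
        }
        return pool.containsKey(requirement);
    }

    static boolean canSuggest(List<List<String>> resources, Map<String, Object> pool) {
        for (List<String> alternative : resources) {
            boolean allMet = true;
            for (String requirement : alternative) {
                if (!requirementMet(requirement, pool)) {
                    allMet = false;
                    break;
                }
            }
            if (allMet) {
                return true;
            }
        }
        return false;
    }
}

class LauncherSketchDemo {
    public static void main(String[] args) {
        List<List<String>> clockNeeds = Arrays.asList(Arrays.asList(":java.util.Date"));

        Map<String, Object> pool = new LinkedHashMap<>();
        pool.put("greeting", "hello");
        System.out.println(ResourceMatcher.canSuggest(clockNeeds, pool)); // false

        pool.put("now", new Date());
        System.out.println(ResourceMatcher.canSuggest(clockNeeds, pool)); // true
    }
}
```

With this reading, the clock example is suggested as soon as any java.util.Date object appears in the pool.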
Demo
Helium
Interpreter : Spark, Flink, Geode, JDBC, …
Notebook Storage : FileSystem, AmazonS3, Git, …
Application : Map, WordCloud, …
Resource Pool : SparkContext, Flink Environment, JDBC connection, User object, …
Maven : Download and load on the fly
Online repository for pluggable modules
Helium
Registry for pluggable modules
[UI mockup: Helium registry with tabs "zeppelin-packages", "my company" and "+ Add"; each package below has an Install button]
Visualization: Wordcloud. Make your table output to word cloud
R Interpreter. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS
ZeppelinHub Notebook Storage. Save your notebook in ZeppelinHub. You can control access and share your notebook online
Helium
Future work
High level API for Visualization
Online helium registry
Zeppelin
Roadmap
Enterprise Ready
Multi-tenancy, ZEPPELIN-1337
Fault tolerance
Job management
Job dependency
Event hook and notification
Table data processing engine
Table data abstraction
Light weight table data processing layer
Improved Python, R support
better matplotlib integration
auto completion
UI / UX improvement
Add yours!
Zeppelin and Friends
Z-Manager
ZeppelinHub
Collaboration/Sharing
Enterprise Package
Zeppelin + Full stack on a cloud
We want to grow the community around Zeppelin even more!
Zeppelin
Homepage
http://zeppelin.apache.org/

Mailing list
users@zeppelin.apache.org
dev@zeppelin.apache.org
Issue tracker
https://issues.apache.org/jira/browse/ZEPPELIN
Github repository
http://github.com/apache/zeppelin
Join the community
Thank you
Moon soo Lee
moon@nflabs.com
moon@apache.org
https://twitter.com/issuefreaks
[Diagram: the User sends Code to the Zeppelin Server, whose Interpreter passes it to the target data processing engine; the Result comes back to the User]
[Diagram: the same flow with a Table data processing engine (Pivot, etc) added]
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond

More Related Content

What's hot

Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Sparkfelixcss
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Edureka!
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit
 
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisDataWorks Summit/Hadoop Summit
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Anya Bida
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierDatabricks
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...Databricks
 
Spark Summit EU talk by William Benton
Spark Summit EU talk by William BentonSpark Summit EU talk by William Benton
Spark Summit EU talk by William BentonSpark Summit
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRDatabricks
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Databricks
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedinYukti Kaura
 
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesSimplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesDatabricks
 
Project Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python UsersProject Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python UsersDatabricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 

What's hot (20)

Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub Hava
 
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis Gkoufas
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
 
Spark Summit EU talk by William Benton
Spark Summit EU talk by William BentonSpark Summit EU talk by William Benton
Spark Summit EU talk by William Benton
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkR
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
 
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesSimplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
 
Project Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python UsersProject Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python Users
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 

Similar to Apache Zeppelin, Helium and Beyond

Apache Zeppelin and Helium @ApacheCon 2017 may, FL
Apache Zeppelin and Helium  @ApacheCon 2017 may, FLApache Zeppelin and Helium  @ApacheCon 2017 may, FL
Apache Zeppelin and Helium @ApacheCon 2017 may, FLAhyoung Ryu
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Apache Spark 2.3 boosts advanced analytics and deep learning with Python
Apache Spark 2.3 boosts advanced analytics and deep learning with PythonApache Spark 2.3 boosts advanced analytics and deep learning with Python
Apache Spark 2.3 boosts advanced analytics and deep learning with PythonDataWorks Summit
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your EyesDemi Ben-Ari
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedOmid Vahdaty
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Simplilearn
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_onSri Ambati
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellDatabricks
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Simplilearn
 
Not Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsNot Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsKonrad Malawski
 
Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016Konrad Malawski
 
End to End Akka Streams / Reactive Streams - from Business to Socket
End to End Akka Streams / Reactive Streams - from Business to SocketEnd to End Akka Streams / Reactive Streams - from Business to Socket
End to End Akka Streams / Reactive Streams - from Business to SocketKonrad Malawski
 
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitterApache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitterApache Zeppelin
 

Similar to Apache Zeppelin, Helium and Beyond (20)

Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 
Apache Zeppelin and Helium @ApacheCon 2017 may, FL
Apache Zeppelin and Helium  @ApacheCon 2017 may, FLApache Zeppelin and Helium  @ApacheCon 2017 may, FL
Apache Zeppelin and Helium @ApacheCon 2017 may, FL
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark 2.3 boosts advanced analytics and deep learning with Python
Apache Spark 2.3 boosts advanced analytics and deep learning with PythonApache Spark 2.3 boosts advanced analytics and deep learning with Python
Apache Spark 2.3 boosts advanced analytics and deep learning with Python
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your Eyes
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystified
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
Not Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsNot Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabs
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
LOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack ArchitectureLOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack Architecture
 
Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016
 
End to End Akka Streams / Reactive Streams - from Business to Socket
End to End Akka Streams / Reactive Streams - from Business to SocketEnd to End Akka Streams / Reactive Streams - from Business to Socket
End to End Akka Streams / Reactive Streams - from Business to Socket
 
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitterApache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Apache Zeppelin, Helium and Beyond

  • 1. Apache Zeppelin, Helium And Beyond http://zeppelin.apache.org
  • 2. Moon Creator of Apache Zeppelin Co-founder NFLabs
  • 3. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark
  • 4. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark 2013. 10 Opensource interactive analytics feature as ‘Zeppelin’ 2013. 10 2014. 08
  • 5. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark 2013. 10 Opensource interactive analytics feature as ‘Zeppelin’ 2014. 12 ASF incubation Incubation Status http://incubator.apache.org/projects/zeppelin.html
  • 6. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark 2013. 10 Opensource interactive analytics feature as ‘Zeppelin’ 2014. 12 ASF incubation 2016. 08 145 contributors worldwide, 1864 stars on the GitHub repo, 5 releases. One of the most popular projects in ASF
  • 7. Collect ETL / Process Analysis Report Data Product Life cycle of big data Data Engineer Data Scientist Business user Customer
  • 8. Zeppelin A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
  • 9. Zeppelin JDBC Markdown > _ Shell Interpreter : pluggable layer for language / processing backend integration 20+ interpreters are supported officially 2016. 03. Interpreters in Zeppelin source tree. Does not include 3rd party interpreters
  • 10. Zeppelin Interpreter : pluggable layer for language / processing backend integration
  • 11. Zeppelin Interpreter : Easy to extend public abstract class Interpreter { public void open(); public void close(); public InterpreterResult interpret(String st, InterpreterContext context); public void cancel(InterpreterContext context); public int getProgress(InterpreterContext context); public List<String> completion(String buf, int cursor); public FormType getFormType(); public Scheduler getScheduler(); } (Slide annotations group the methods as: Must have, Good to have, Advanced)
  • 12. Zeppelin Shared mode : Single interpreter instance serves all notes, all users Interpreter Process Interpreter Group Interpreter Interpreter …
  • 13. Zeppelin Scoped mode : Individual interpreter instance per each note Interpreter Process Interpreter Group Interpreter Interpreter … Interpreter Group Interpreter Interpreter … Interpreter Group Interpreter Interpreter …
  • 14. Zeppelin Isolated mode : Individual interpreter process per each note Interpreter Process Interpreter Group Interpreter Interpreter … Interpreter Group Interpreter Interpreter … Interpreter Group Interpreter Interpreter … Interpreter Process Interpreter Process
  • 15. Zeppelin Spark Interpreter : Shared mode Interpreter Group Spark SparkSQL Pyspark Interpreter Process SparkR SparkContext Scala REPL
  • 16. Zeppelin Spark Interpreter : Scoped mode Interpreter Group Spark SparkSQL Pyspark Interpreter Process SparkR SparkContext Scala REPL Interpreter Group Spark SparkSQL Pyspark SparkR Scala REPL Interpreter Group Spark SparkSQL Pyspark SparkR Scala REPL
  • 17. Zeppelin Spark Interpreter : Isolated mode Interpreter Group Spark SparkSQL Pyspark Interpreter Process SparkR SparkContext Scala REPL Interpreter Group Spark SparkSQL Pyspark Interpreter Process SparkR SparkContext Scala REPL Interpreter Group Spark SparkSQL Pyspark Interpreter Process SparkR SparkContext Scala REPL
  • 18. Zeppelin Notebook Repo : pluggable layer for notebook persistence 5+ Notebook repos are supported officially 2016. 03. Notebook repos in Zeppelin source tree. Does not include 3rd party interpreters ZeppelinHub
  • 19. Zeppelin Notebook Repo : Easy to extend public interface NotebookRepo { public List<NoteInfo> list() throws IOException; public Note get(String noteId) throws IOException; public void save(Note note) throws IOException; public void remove(String noteId) throws IOException; public void checkpoint(String noteId, String checkPointName) throws IOException; public void close(); }
  • 20. Zeppelin Visualizations : 6 built-in visualizations come with pivot: Table, Bar, Pie, Area, Line, Scatter. Free to draw any customized visualization inside the notebook …
  • 21. Helium: Platform for data analytics applications that makes visualization pluggable and more. Umbrella issue: http://issues.apache.org/jira/browse/ZEPPELIN-533 Proposal: https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal Makes Zeppelin fly!
  • 22. Helium: RESTful API Websocket Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … ZeppelinServer Interpreters and Notebook storage are pluggable
  • 23. Helium: Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … ZeppelinServer Visualizations Map WordCloud … We want visualization to be pluggable
  • 24. Helium: Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … Application Visualizations Map WordCloud … Resource Pool SparkContext Flink Environment JDBC connection … Analytics … … User object Extend pluggable visualization to pluggable analytics application
  • 25. Helium: A Helium application is an interaction between view, algorithm and resources. Application = (View, Algorithm) + Zeppelin-provided Resources
  • 26. Helium: Zeppelin Server Web browser View Interpreter Process Algorithm Resource pool Resource pool Resource pools are connected. A Helium application runs where the resource exists
  • 28. Zeppelin Interpreter Process Interpreter Group Interpreter Interpreter … Resource Pool Key Value Key Value Key Value AngularObjectRegistry Key Value Key Value Key Value Interpreter Output
  • 29. Zeppelin Interpreter Process Interpreter Group Interpreter Interpreter … Resource Pool SparkContext JDBC Conn User Object AngularObjectRegistry Key Value Key Value Key Value Front-end Interpreter Output
  • 30. Zeppelin Interpreter Process Interpreter Group Interpreter Interpreter … Resource Pool SparkContext JDBC Conn User Object AngularObjectRegistry Key Value Key Value Key Value Helium Front-end Interpreter Output
  • 31. Helium Application: Easy to extend public abstract class Application { public Application(ApplicationContext context); public abstract void run(ResourceSet args); public abstract void unload(); }
  • 32. Helium Launcher: Suggests applications according to the data types in ResourcePool
  • 33. Helium Description: Application description file in JSON format that the Launcher reads { "type" : "APPLICATION", "name" : "zeppelin.clock", "description" : "Clock (example)", "artifact" : "org.apache.zeppelin:zeppelin-example-clock:0.7.0", "className" : "org.apache.zeppelin.example.app.clock.Clock", "resources" : [[":java.util.Date"]], "icon" : "<i class=\"fa fa-clock-o\"></i>" }
  • 34. Demo
  • 35. Helium: Interpreter Notebook Storage Application Resource Pool SparkContext Flink Environment JDBC connection … User object Spark Flink Geode JDBC … FileSystem AmazonS3 Git … Map WordCloud … Maven Download and load on the fly Online repository for pluggable modules
  • 36. Helium: Helium Registry zeppelin-packages my company + Add XX Visualization Wordcloud Make your table output to word cloud Install R Interpreter R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS Install ZeppelinHub Notebook Storage Save your notebook in ZeppelinHub. You can access control and share your notebook online Install Registry for pluggable modules
  • 37. Helium: Future work High level API for Visualization Online helium registry
  • 38. Zeppelin Roadmap. Enterprise ready: multi-tenancy (ZEPPELIN-1337), fault tolerance. Job management: job dependency, event hook and notification. Table data processing engine: table data abstraction, lightweight table data processing layer. Improved Python, R support: better matplotlib integration, auto completion. UI / UX improvement. Add yours!
  • 39. z Zeppelin and Friends Z-Manager ZeppelinHub Collaboration/Sharing Enterprise Package Zeppelin + Full stack on a cloud We want to grow community around Zeppelin, more!
  • 41. Thank you Moon soo Lee moon@nflabs.com moon@apache.org https://twitter.com/issuefreaks
  • 43. Zeppelin Server Interpreter User Target Data processing engine Code Result Zeppelin Server Interpreter User Target Data processing engine Code Result Table data processing engine Pivot, etc
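The three interpreter binding modes on slides 12 to 14 (shared, scoped, isolated) can be illustrated with a small sketch. This is not Zeppelin's implementation: `InterpreterModes`, `Binder`, `InterpreterProcess` and `InterpreterInstance` are hypothetical names invented here, and a "process" is only modelled as a plain object. The sketch shows nothing more than which notes end up sharing an interpreter instance or an interpreter process under each mode.

```java
import java.util.HashMap;
import java.util.Map;

public class InterpreterModes {
    enum Mode { SHARED, SCOPED, ISOLATED }

    // Hypothetical stand-in for an interpreter process (a separate JVM in practice).
    static final class InterpreterProcess {
        final int id;
        InterpreterProcess(int id) { this.id = id; }
    }

    // Hypothetical stand-in for an interpreter instance living inside a process.
    static final class InterpreterInstance {
        final int id;
        final InterpreterProcess process;
        InterpreterInstance(int id, InterpreterProcess process) {
            this.id = id;
            this.process = process;
        }
    }

    // Hands out interpreter instances to notes according to the chosen mode.
    static final class Binder {
        private final Mode mode;
        private final Map<String, InterpreterInstance> byKey = new HashMap<>();
        private InterpreterProcess commonProcess;
        private int nextProcessId = 0;
        private int nextInstanceId = 0;

        Binder(Mode mode) { this.mode = mode; }

        InterpreterInstance forNote(String noteId) {
            // Shared mode uses one key for every note; the other modes key by note.
            String key = (mode == Mode.SHARED) ? "shared" : noteId;
            return byKey.computeIfAbsent(key, k -> {
                InterpreterProcess process;
                if (mode == Mode.ISOLATED) {
                    // Isolated: a fresh process for each note.
                    process = new InterpreterProcess(nextProcessId++);
                } else {
                    // Shared and scoped: all instances live in one process.
                    if (commonProcess == null) {
                        commonProcess = new InterpreterProcess(nextProcessId++);
                    }
                    process = commonProcess;
                }
                return new InterpreterInstance(nextInstanceId++, process);
            });
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            Binder binder = new Binder(mode);
            InterpreterInstance a = binder.forNote("noteA");
            InterpreterInstance b = binder.forNote("noteB");
            System.out.println(mode
                + ": same instance = " + (a == b)
                + ", same process = " + (a.process == b.process));
        }
    }
}
```

Read against the Spark slides (15 to 17), the `process` object roughly corresponds to whatever is shared per process there, such as the SparkContext, while the instance corresponds to per-note state such as the Scala REPL.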