4. Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
2013. 10 Opensource interactive analytics feature as ‘Zeppelin’
2014. 12 ASF incubation
Incubation Status http://incubator.apache.org/projects/zeppelin.html
2016. 08 145 contributors worldwide
1864 stars on GitHub repo
5 releases
One of the most popular projects in the ASF
8. Zeppelin
A web-based notebook that enables interactive data analytics. You can
make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more.
9. Zeppelin
JDBC, Markdown, Shell
Interpreter : pluggable layer for language / processing backend integration
20+ interpreters are supported officially
As of 2016. 03: interpreters in the Zeppelin source tree, not including 3rd-party interpreters
11. Zeppelin
Interpreter : Easy to extend
public abstract class Interpreter {
  // Must have
  public abstract void open();
  public abstract void close();
  public abstract InterpreterResult interpret(String st, InterpreterContext context);

  // Good to have
  public abstract void cancel(InterpreterContext context);
  public abstract int getProgress(InterpreterContext context);
  public abstract List<String> completion(String buf, int cursor);

  // Advanced
  public abstract FormType getFormType();
  public abstract Scheduler getScheduler();
}
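To make the "easy to extend" claim concrete, here is a minimal sketch of a custom interpreter covering the "must have" and "good to have" methods. InterpreterContext and InterpreterResult are simplified stand-ins written for this example, not Zeppelin's real classes, and UpperCaseInterpreter is a hypothetical interpreter; form type and scheduler are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins (assumptions) so the sketch compiles on its own;
// Zeppelin's real classes carry much more state.
class InterpreterContext {}

class InterpreterResult {
    final boolean success;
    final String message;

    InterpreterResult(boolean success, String message) {
        this.success = success;
        this.message = message;
    }
}

// Reduced version of the abstract class on the slide
// (getFormType and getScheduler omitted).
abstract class Interpreter {
    public abstract void open();
    public abstract void close();
    public abstract InterpreterResult interpret(String st, InterpreterContext context);
    public abstract void cancel(InterpreterContext context);
    public abstract int getProgress(InterpreterContext context);
    public abstract List<String> completion(String buf, int cursor);
}

// Hypothetical interpreter: upper-cases whatever text the paragraph contains.
class UpperCaseInterpreter extends Interpreter {
    private boolean opened = false;

    @Override public void open() { opened = true; }
    @Override public void close() { opened = false; }

    @Override
    public InterpreterResult interpret(String st, InterpreterContext context) {
        if (!opened) {
            return new InterpreterResult(false, "interpreter is not open");
        }
        return new InterpreterResult(true, st.toUpperCase(Locale.ROOT));
    }

    @Override public void cancel(InterpreterContext context) { /* nothing to cancel */ }
    @Override public int getProgress(InterpreterContext context) { return 0; }

    @Override
    public List<String> completion(String buf, int cursor) {
        return new ArrayList<>(); // no completion support in this sketch
    }
}

class UpperCaseInterpreterDemo {
    public static void main(String[] args) {
        Interpreter interpreter = new UpperCaseInterpreter();
        interpreter.open();
        InterpreterResult result =
            interpreter.interpret("hello zeppelin", new InterpreterContext());
        System.out.println(result.message); // prints "HELLO ZEPPELIN"
        interpreter.close();
    }
}
```

Only open, close and interpret do real work here; the remaining methods return neutral defaults, which mirrors the must-have / good-to-have split.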
12. Zeppelin
Shared mode : Single interpreter instance serves all notes, all users
Interpreter Process
  Interpreter Group
    Interpreter
    Interpreter
    …
13. Zeppelin
Scoped mode : Individual interpreter instance per each note
Interpreter Process
  Interpreter Group
    Interpreter
    Interpreter
    …
  Interpreter Group
    Interpreter
    Interpreter
    …
  Interpreter Group
    Interpreter
    Interpreter
    …
14. Zeppelin
Isolated mode : Individual interpreter process per each note
Interpreter Process
  Interpreter Group
    Interpreter
    Interpreter
    …
Interpreter Process
  Interpreter Group
    Interpreter
    Interpreter
    …
Interpreter Process
  Interpreter Group
    Interpreter
    Interpreter
    …
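The three modes differ only in what is shared between notes. A rough sketch of that idea, assuming the mode simply decides which key is used to look up the process and the instance: BindingMode, ProcessHandle, InstanceHandle and InterpreterBinder are hypothetical names for this illustration, not Zeppelin classes.

```java
import java.util.HashMap;
import java.util.Map;

enum BindingMode { SHARED, SCOPED, ISOLATED }

// Hypothetical stand-in for an interpreter process.
class ProcessHandle {
    final String id;
    ProcessHandle(String id) { this.id = id; }
}

// Hypothetical stand-in for an interpreter instance living in a process.
class InstanceHandle {
    final ProcessHandle process;
    InstanceHandle(ProcessHandle process) { this.process = process; }
}

class InterpreterBinder {
    private static final String SHARED_KEY = "shared";

    private final BindingMode mode;
    private final Map<String, ProcessHandle> processes = new HashMap<>();
    private final Map<String, InstanceHandle> instances = new HashMap<>();

    InterpreterBinder(BindingMode mode) { this.mode = mode; }

    InstanceHandle forNote(String noteId) {
        String perNoteKey = "note:" + noteId; // prefix avoids clashing with SHARED_KEY
        // Isolated mode gets one process per note; the other modes share one process.
        String processKey = (mode == BindingMode.ISOLATED) ? perNoteKey : SHARED_KEY;
        // Shared mode gets one instance overall; scoped and isolated get one per note.
        String instanceKey = (mode == BindingMode.SHARED) ? SHARED_KEY : perNoteKey;

        ProcessHandle process = processes.computeIfAbsent(processKey, ProcessHandle::new);
        return instances.computeIfAbsent(instanceKey, k -> new InstanceHandle(process));
    }
}

class BindingModeDemo {
    public static void main(String[] args) {
        for (BindingMode mode : BindingMode.values()) {
            InterpreterBinder binder = new InterpreterBinder(mode);
            InstanceHandle a = binder.forNote("noteA");
            InstanceHandle b = binder.forNote("noteB");
            System.out.println(mode
                + ": same instance=" + (a == b)
                + ", same process=" + (a.process == b.process));
        }
    }
}
```

In this sketch, shared mode reports the same instance and process for both notes, scoped mode reports different instances in the same process, and isolated mode reports different instances in different processes.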
15. Zeppelin
Spark Interpreter : Shared mode
Interpreter Process
  SparkContext
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
16. Zeppelin
Spark Interpreter : Scoped mode
Interpreter Process
  SparkContext
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
17. Zeppelin
Spark Interpreter : Isolated mode
Interpreter Process
  SparkContext
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
Interpreter Process
  SparkContext
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
Interpreter Process
  SparkContext
  Interpreter Group
    Spark, SparkSQL, Pyspark, SparkR
    Scala REPL
18. Zeppelin
Notebook Repo : pluggable layer for notebook persistence
5+ Notebook repos are supported officially
As of 2016. 03: notebook repos in the Zeppelin source tree, not including 3rd-party notebook repos
ZeppelinHub
19. Zeppelin
Notebook Repo : Easy to extend
public interface NotebookRepo {
public List<NoteInfo> list() throws IOException;
public Note get(String noteId) throws IOException;
public void save(Note note) throws IOException;
public void remove(String noteId) throws IOException;
public void checkpoint(String noteId, String checkPointName) throws IOException;
public void close();
}
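As an illustration of how small a notebook repo can be, here is a sketch of an in-memory implementation of the interface above. Note and NoteInfo are simplified stand-ins written for this example, and InMemoryNotebookRepo with its getCheckpoint helper is hypothetical, not part of Zeppelin.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, immutable stand-ins (assumptions) for Zeppelin's Note and NoteInfo.
class Note {
    final String id;
    final String text;
    Note(String id, String text) { this.id = id; this.text = text; }
}

class NoteInfo {
    final String id;
    NoteInfo(String id) { this.id = id; }
}

// Same method set as the interface on the slide.
interface NotebookRepo {
    List<NoteInfo> list() throws IOException;
    Note get(String noteId) throws IOException;
    void save(Note note) throws IOException;
    void remove(String noteId) throws IOException;
    void checkpoint(String noteId, String checkPointName) throws IOException;
    void close();
}

// Hypothetical repo that keeps everything in maps; useful only for tests.
class InMemoryNotebookRepo implements NotebookRepo {
    private final Map<String, Note> notes = new LinkedHashMap<>();
    private final Map<String, Note> checkpoints = new LinkedHashMap<>();

    @Override
    public List<NoteInfo> list() {
        List<NoteInfo> infos = new ArrayList<>();
        for (String id : notes.keySet()) {
            infos.add(new NoteInfo(id));
        }
        return infos;
    }

    @Override
    public Note get(String noteId) throws IOException {
        Note note = notes.get(noteId);
        if (note == null) {
            throw new IOException("note not found: " + noteId);
        }
        return note;
    }

    @Override
    public void save(Note note) { notes.put(note.id, note); }

    @Override
    public void remove(String noteId) { notes.remove(noteId); }

    @Override
    public void checkpoint(String noteId, String checkPointName) throws IOException {
        // Notes are immutable here, so storing the reference is a valid snapshot.
        checkpoints.put(noteId + "@" + checkPointName, get(noteId));
    }

    // Helper added for this sketch only; not part of the slide's interface.
    Note getCheckpoint(String noteId, String checkPointName) {
        return checkpoints.get(noteId + "@" + checkPointName);
    }

    @Override
    public void close() {
        notes.clear();
        checkpoints.clear();
    }
}
```

A file system, S3 or Git backed repo would follow the same shape, replacing the maps with reads and writes against the storage backend.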
20. Zeppelin
Visualizations : 6 built-in visualizations come with pivot
Table, Bar, Pie, Area, Line, Scatter
Free to draw any customized visualization inside the notebook
…
21. Helium
Platform for data analytics applications that
makes visualization pluggable, and more.
Umbrella issue: http://issues.apache.org/jira/browse/ZEPPELIN-533
Proposal: https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
Makes Zeppelin fly!
22. Helium
ZeppelinServer: RESTful API, Websocket
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Interpreters and Notebook storage are pluggable
23. Helium
ZeppelinServer
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Visualizations: Map, WordCloud, …
We want visualizations to be pluggable
25. Helium
A Helium application is the interaction between view, algorithm and resources
Application = View + Algorithm
Zeppelin-provided Resources
26. Helium
Web browser: View
Zeppelin Server: Resource pool
Interpreter Process: Algorithm, Resource pool
Resource pools are connected
A Helium application runs where the resource exists
31. Helium Application: Easy to extend
public abstract class Application {
  private final ApplicationContext context;
  public Application(ApplicationContext context) { this.context = context; }
  public abstract void run(ResourceSet args);
  public abstract void unload();
}
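A sketch of what a Helium application could look like against this class. ApplicationContext and ResourceSet are simplified stand-ins invented for this example (the real ones are richer), and WordCountApplication is hypothetical: it reads string resources, counts words, and writes the counts to the context's output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Stand-in (assumption): a context that only collects text output for the view.
class ApplicationContext {
    private final StringBuilder out = new StringBuilder();
    void print(String s) { out.append(s); }
    String output() { return out.toString(); }
    void clear() { out.setLength(0); }
}

// Stand-in (assumption): a resource set is just a list of objects from the pool.
class ResourceSet {
    final List<Object> resources;
    ResourceSet(List<Object> resources) { this.resources = resources; }
}

// Same shape as the abstract class on the slide, plus an accessor for subclasses.
abstract class Application {
    private final ApplicationContext context;

    public Application(ApplicationContext context) { this.context = context; }

    protected ApplicationContext context() { return context; }

    public abstract void run(ResourceSet args);
    public abstract void unload();
}

// Hypothetical application: counts words in every String resource it is given.
class WordCountApplication extends Application {
    public WordCountApplication(ApplicationContext context) { super(context); }

    @Override
    public void run(ResourceSet args) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (Object resource : args.resources) {
            if (!(resource instanceof String)) {
                continue; // ignore resources this application cannot handle
            }
            for (String word : ((String) resource).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            context().print(e.getKey() + "=" + e.getValue() + "\n");
        }
    }

    @Override
    public void unload() { context().clear(); }
}

class WordCountApplicationDemo {
    public static void main(String[] args) {
        ApplicationContext ctx = new ApplicationContext();
        Application app = new WordCountApplication(ctx);
        app.run(new ResourceSet(Arrays.<Object>asList("spark flink", "spark")));
        System.out.print(ctx.output()); // prints "flink=1" then "spark=2"
        app.unload();
    }
}
```

The algorithm (counting) lives in run, the view is whatever renders the context's output, and the resources arrive through the ResourceSet, matching the view / algorithm / resources split described earlier.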
35. Helium
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Application: Map, WordCloud, …
Resource Pool: SparkContext, Flink Environment, JDBC connection, User object, …
Maven: download and load on the fly
Online repository for pluggable modules
36. Helium
Helium
Registry: zeppelin-packages, my company, + Add
Wordcloud (Visualization): Make your table output to word cloud. [Install]
R Interpreter: R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. [Install]
ZeppelinHub Notebook Storage: Save your notebook in ZeppelinHub. You can control access and share your notebook online. [Install]
Registry for pluggable modules
41. Thank you
Moon soo Lee
moon@nflabs.com
moon@apache.org
https://twitter.com/issuefreaks
43. Zeppelin Server
User, Zeppelin Server, Interpreter, Target data processing engine: Code, Result
Table data processing engine: Pivot, etc