
Livy: A REST Web Service For Apache Spark


Spark Summit 2016 talk by Anand Iyer (Cloudera) and
Pravin Mittal (Microsoft Corporation)

Published in: Data & Analytics


  1. Livy: A REST Web Service for Spark. Pravin Mittal, Microsoft; Anand Iyer, Cloudera
  2. Reduce friction to use Spark while maintaining all its power and flexibility
  3. What is Livy? A service that manages long-running Spark contexts in your cluster
     • Open source, Apache licensed
     • REST-based interface
     • Lets you manage multiple Spark contexts
     • Fine-grained job submission
     • Retrieve job results over REST, asynchronously or synchronously
     • Client APIs in Java and Scala, and soon in Python
  4. What is Livy? [Diagram: clients connect over HTTP to the Livy Server, which manages Context 1 and Context 2 in the cluster (managed by YARN, Mesos, etc.); each context has a driver and executors.]
  5. Spark on Azure HDInsight
     Fully managed service
     • 100% open source Apache Spark and Hadoop bits
     • Latest releases of Spark
     • Fully supported by Microsoft and Hortonworks
     • 99.9% Azure Cloud SLA; 24/7 managed service
     • Certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC
     Optimized for experimentation and development
     • Jupyter Notebooks (Scala, Python, automatic data visualizations)
     • IntelliJ plugin (job submission, remote debugging)
     • ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc.
  6. Make Spark Simple - Integrated with Azure Ecosystem
     • Microsoft R Server: multi-threaded math libraries and transparent parallelization in R Server mean handling up to 1000x more data at up to 50x faster speeds than open source R. It is based on open source R and does not require any change to R scripts
     • Azure Data Lake Store: HDFS for the cloud, optimized for massive throughput, ultra-high capacity, low latency, secure ACL support
     • Azure Data Factory orchestrates Spark ETL pipelines
     • Power BI connector for Spark for rich visualization. New in Power BI is a streaming connector that lets you publish real-time events from Spark Streaming directly to Power BI
     • Event Hubs connector as a data source for Spark Streaming
     • Azure SQL Data Warehouse and HBase connectors for fast and scalable storage
  7. Jupyter-Spark Integration via Livy
     • Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program
     • Thousands of Spark clusters in production providing feedback to further improve the experience
     https://github.com/jupyter-incubator/sparkmagic
  8. Architectural Advantages of Jupyter integration via Livy
     • Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server
     • Multi-language support; the Python, Scala and R kernels are equally feature-rich
     • Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters
     • Easy integration with any Python library for data science or visualization, like Pandas or Plotly
  9. [Diagram: Gateway in front of an HDI cluster. Head nodes HN0 and HN1 with the YARN RM and the Livy Server (metadata: ID, State, ...); worker nodes WN0 (driver), WN1 (executor), WN2 (executor); job status.]
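  10. [Diagram: Gateway in front of an HDI cluster. Head nodes HN0 and HN1 with the YARN RM and the Livy Server (metadata: ID, State, Unique Tag, Application ID); worker nodes WN0 (driver), WN1 (executor), WN2 (executor); job status; ZooKeeper nodes ZK0, ZK1, ZK2 holding Livy metadata (ID, Unique Tag, App ID).]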
  11. DEMO
  12. Resources https://github.com/aggFTW/sparksummit2016
  13. Livy Architecture Highlights
  14. Manage multiple independent Spark contexts [Diagram: Clients A to D connect over HTTP to the Livy Server. Client A uses Context 1, Clients B and C use Context 2, and Client D uses Context 3; each context has its own driver and executors in the cluster (managed by YARN, Mesos, etc.).]
  15. User Impersonation [Diagram: clients connect over HTTP to the Livy Server, which manages contexts in the cluster (managed by YARN, Mesos, etc.).]
     • Client authenticates as user A: Context 1 runs as user A and only has access to files and resources that user A has access to.
     • Client authenticates as user B: Context 2 runs as user B and only has access to files and resources that user B has access to.
  16. Livy Client API
  17. Client API: Job Interface
  18. Client API: Submitting a job
  19. Client API Architecture [Diagram: the client application sends a serialized closure to the Livy Server, which runs it in Context 1 (driver and executors) in the cluster (managed by YARN, Mesos, etc.) and returns serialized result data.]
  20. Community http://livy.io
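
The REST workflow from slides 3 and 15 (start a long-running Spark context, submit a fine-grained piece of code, retrieve the result asynchronously, optionally run as another user) can be sketched in Python roughly as follows. This is a minimal sketch, not code from the talk. The endpoints `/sessions` and `/sessions/{id}/statements` and the `kind`, `code` and `proxyUser` fields are taken from Livy's REST API; the base URL, the helper names (`livy_post`, `create_session`, `run_statement`, `wait_for_statement`) and the injected `fetch` callable are assumptions made for illustration. The request builders only construct requests; sending them (for example with `urllib.request.urlopen`) is left to the caller.

```python
import json
import time
from urllib.request import Request

# Assumption: a Livy server reachable locally on its default port.
LIVY_URL = "http://localhost:8998"


def livy_post(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind="pyspark", proxy_user=None):
    """POST /sessions asks Livy to start a long-running Spark context.

    proxy_user maps to Livy's proxyUser field (user impersonation).
    """
    payload = {"kind": kind}
    if proxy_user is not None:
        payload["proxyUser"] = proxy_user
    return livy_post("/sessions", payload)


def run_statement(session_id, code):
    """POST /sessions/{id}/statements submits a snippet to an existing context."""
    return livy_post(f"/sessions/{session_id}/statements", {"code": code})


def wait_for_statement(fetch, session_id, statement_id,
                       poll_seconds=1.0, max_polls=60):
    """Poll a statement until it reaches a final state.

    fetch is a hypothetical callable that takes a URL path, performs a GET
    against the Livy server, and returns the decoded JSON as a dict.
    """
    path = f"/sessions/{session_id}/statements/{statement_id}"
    for _ in range(max_polls):
        statement = fetch(path)
        if statement["state"] in ("available", "error", "cancelled"):
            return statement
        time.sleep(poll_seconds)
    raise TimeoutError(f"statement {statement_id} did not finish in time")
```

A caller would send `create_session(proxy_user="alice")`, read the session `id` from the response, send `run_statement(session_id, "1 + 1")`, and then call `wait_for_statement` with its own GET helper. Injecting `fetch` keeps the polling loop independent of any particular HTTP or authentication setup, such as the gateway shown in the HDInsight diagrams.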
