Spark Summit EU talk by Oscar Castaneda

Spark Summit

Spark Cluster with Elasticsearch Inside

Data & Analytics

About
• Researcher at Universidad del Valle de Guatemala.
• Research Interests:
• Program Transformation,
• Programming Education Research,
• Online Learning to Rank.

Spark cluster with Elasticsearch Inside!
http://bit.ly/2em6RUK
http://bit.ly/2ebM9HO

Agenda
• Problem Statement and Motivation.
• Read/Write (internal) ES Server.
• Create ES Server inside Spark Cluster.
• Snapshot/Restore ES indices using S3.
• Demo: IndexTweetsLive on Spark with Elastic inside.
• Q&A

Problem Statement
• During development with ES-Hadoop, it is cumbersome to have Elasticsearch running outside a Spark cluster.

Architecture
[Diagram labels: Read CSV files; Take ES snapshot; Restore ES snapshot (shown twice); Dev Ops]
http://bit.ly/2e5H1jL

Motivation
• Control Elasticsearch instance during development.
• Reduce dependencies between teams during development.
• Use ES snapshots as interface between teams.
• Increase QA efficiency.

Native Integration: write data to Elasticsearch
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
...
val conf = ...
val sc = new SparkContext(conf)
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-write
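
The slide leaves `val conf = ...` elided. As a rough sketch only, and not the configuration used in the talk, the write example might be wired up as below. The application name, the `local[*]` master, and the `localhost:9200` address are assumptions; `es.nodes`, `es.port` and `es.index.auto.create` are standard ES-Hadoop settings.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object WriteSketch {
  // Assumed connection settings for a development Elasticsearch node.
  val esSettings: Map[String, String] = Map(
    "es.nodes" -> "localhost",          // assumption: ES reachable on this host
    "es.port" -> "9200",                // assumption: default HTTP port
    "es.index.auto.create" -> "true"    // let the connector create the index
  )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-write-sketch")    // hypothetical application name
      .setMaster("local[*]")            // assumption: local run for development
      .setAll(esSettings)
    val sc = new SparkContext(conf)
    try {
      val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
      val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
      // Each Map becomes one document in the spark/docs index/type.
      sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
    } finally {
      sc.stop()
    }
  }
}
```

Running this requires the Spark and elasticsearch-spark artifacts on the classpath and a reachable Elasticsearch node.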

Why not run Elasticsearch inside Spark Cluster? *
* At least for development purposes.

How do you run Elasticsearch inside Spark Cluster?
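
The deck itself shows only `server.start()` for this step, so the following is a guess at what an in-process node could look like, not the speaker's code. It assumes the Elasticsearch 2.x Java API (`NodeBuilder` was removed in later major versions); the `LocalEs` object, the settings values, and the choice of where to run it (for example on the driver) are all hypothetical.

```scala
import org.elasticsearch.common.settings.Settings
import org.elasticsearch.node.{Node, NodeBuilder}

object LocalEs {
  // Hypothetical development settings; path.home is mandatory in Elasticsearch 2.x.
  def devSettings(home: String, cluster: String): Map[String, String] = Map(
    "path.home" -> home,
    "cluster.name" -> cluster,
    "network.host" -> "0.0.0.0" // assumption: let other machines in the cluster connect
  )

  // Builds and starts an in-process node (Elasticsearch 2.x API).
  def start(home: String, cluster: String): Node = {
    val builder = Settings.settingsBuilder()
    devSettings(home, cluster).foreach { case (k, v) => builder.put(k, v) }
    NodeBuilder.nodeBuilder().settings(builder.build()).node()
  }
}
```

A caller would keep the returned node, for example `val server = LocalEs.start("/tmp/es-home", "spark-dev")`, point `es.nodes` at that host, and call `server.close()` when done.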

Imports
http://bit.ly/2efaib4
http://bit.ly/2di0cFq
http://bit.ly/2ebM9HO

Write to Local ES
saveToEs("tweets/hashtags")

Check results on local ES
GET
getUrlAsString("http://10.104.239.70:9200/_cat/indices?v")
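
The slide does not show how `getUrlAsString` is defined. A minimal stand-in, assuming it simply performs an HTTP GET and returns the response body, could look like this; the `EsCheck` object and the `catIndicesUrl` helper are hypothetical names introduced for illustration.

```scala
import scala.io.Source

object EsCheck {
  // Hypothetical stand-in for the getUrlAsString helper named on the slide:
  // fetches the URL with an HTTP GET and returns the body as a string.
  def getUrlAsString(url: String): String = {
    val src = Source.fromURL(url, "UTF-8")
    try src.mkString finally src.close()
  }

  // Builds the _cat/indices URL for a given host; ?v adds column headers.
  def catIndicesUrl(host: String, port: Int = 9200): String =
    s"http://$host:$port/_cat/indices?v"
}
```

With a node running, `println(EsCheck.getUrlAsString(EsCheck.catIndicesUrl("10.104.239.70")))` would print one row per index, which is a quick way to confirm that `tweets` was created.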

What have we seen?
• How to Read/Write (internal) ES Server.
• How to create ES Server inside Spark Cluster.
• How to Snapshot/Restore ES indices using S3.
• Demo: IndexTweetsLive on Spark with Elastic inside.
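
The snapshot and restore steps are not spelled out in the slide text. As a hedged sketch, they can be driven through Elasticsearch's snapshot REST API: register a repository of type `s3`, take a snapshot, then restore it on the receiving side. The repository name, snapshot name, bucket, and `localhost:9200` address below are assumptions, and the `s3` repository type needs the matching plugin installed (cloud-aws in Elasticsearch 2.x, repository-s3 from 5.0).

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object SnapshotSketch {
  // JSON body that registers an S3-backed snapshot repository.
  def s3RepositoryBody(bucket: String): String =
    s"""{"type":"s3","settings":{"bucket":"$bucket"}}"""

  // URL that creates a snapshot and blocks until it has finished.
  def snapshotUrl(es: String, repo: String, name: String): String =
    s"$es/_snapshot/$repo/$name?wait_for_completion=true"

  // URL that restores a previously taken snapshot.
  def restoreUrl(es: String, repo: String, name: String): String =
    s"$es/_snapshot/$repo/$name/_restore"

  // Sends one HTTP request and returns the response body.
  // getInputStream throws an IOException on 4xx/5xx responses.
  def send(method: String, url: String, body: Option[String] = None): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod(method)
    body.foreach { b =>
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      val out = conn.getOutputStream
      try out.write(b.getBytes(StandardCharsets.UTF_8)) finally out.close()
    }
    val src = Source.fromInputStream(conn.getInputStream, "UTF-8")
    try src.mkString finally src.close()
  }

  def main(args: Array[String]): Unit = {
    val es = "http://localhost:9200" // assumption: address of the ES node
    // Hypothetical repository, bucket, and snapshot names.
    send("PUT", s"$es/_snapshot/dev_repo", Some(s3RepositoryBody("my-dev-bucket")))
    send("PUT", snapshotUrl(es, "dev_repo", "snapshot_1"))
    // Typically run against the cluster that receives the data; restoring
    // over an index that is open on the same cluster will be rejected.
    send("POST", restoreUrl(es, "dev_repo", "snapshot_1"))
  }
}
```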

Next Steps
• Spark 2.0
• Continuous Applications
• Elasticsearch 5.0

THANK YOU.
Email: ofcastaneda@uvg.edu.gt
Twitter: @oscar_castaneda

1. Spark Cluster with Elasticsearch Inside Oscar Castañeda-Villagrán Universidad del Valle de Guatemala

2. About • Researcher at Universidad del Valle de Guatemala. • Research Interests: • Program Transformation, • Programming Education Research, • Online Learning to Rank.

3. Spark cluster … http://bit.ly/2em6RUK

4. Spark cluster with … http://bit.ly/2em6RUK

5. Spark cluster with Elasticsearch http://bit.ly/2em6RUK http://bit.ly/2ebM9HO

6. Spark cluster with Elasticsearch http://bit.ly/2em6RUK

7. Inside! Spark cluster with Elasticsearch

8. Agenda • Problem Statement and Motivation. • Read/Write (internal) ES Server. • Create ES Server inside Spark Cluster. • Snapshot/Restore ES indices using S3. • Demo: IndexTweetsLive on Spark with Elastic inside. • Q&A

9. Problem Statement • During development with ES-Hadoop it is cumbersome to have Elasticsearch running outside a Spark cluster.

10. Architecture Restore ES snapshot Read CSV files Take ES snapshot Restore ES snapshot http://bit.ly/2e5H1jL

11. Architecture Restore ES snapshot Read CSV files Take ES snapshot Restore ES snapshot Dev Ops http://bit.ly/2e5H1jL

12. Motivation • Control Elasticsearch instance during development. • Reduce dependencies between teams during development. • Use ES snapshots as interface between teams. • Increase QA efficiency.

13. Native Integration: write data to Elasticsearch
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
...
val conf = ...
val sc = new SparkContext(conf)
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-write

14. Native Integration: read data from Elasticsearch
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
...
val conf = ...
val sc = new SparkContext(conf)
val RDD = sc.esRDD("radio/artists")
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-read

15. But where do you run Elasticsearch?

16. Why not run Elasticsearch inside Spark Cluster? * * At least for development purposes.

17. How do you run Elasticsearch inside Spark Cluster?

18. Imports http://bit.ly/2efaib4 http://bit.ly/2di0cFq http://bit.ly/2ebM9HO

19. Setup Local ES server.start()

20. Write to Local ES saveToEs("tweets/hashtags")

21. Check results on local ES GET getUrlAsString("http://10.104.239.70:9200/_cat/indices?v")

22. Snapshot to S3

23. Restore from S3

24. Demo!

25. What have we seen? • How to Read/Write (internal) ES Server. • How to create ES Server inside Spark Cluster. • How to Snapshot/Restore ES indices using S3. • Demo: IndexTweetsLive on Spark with Elastic inside.

26. Next Steps • Spark 2.0 • Continuous Applications • Elasticsearch 5.0

27. Q&A

28. THANK YOU. Email: ofcastaneda@uvg.edu.gt Twitter: @oscar_castaneda
