Spark Summit EU talk by Oscar Castaneda

Spark Cluster with Elasticsearch Inside

Published in: Data & Analytics

  1. 1. Spark Cluster with Elasticsearch Inside Oscar Castañeda-Villagrán Universidad del Valle de Guatemala
  2. 2. About • Researcher at Universidad del Valle de Guatemala. • Research Interests: • Program Transformation, • Programming Education Research, • Online Learning to Rank.
  3. 3. Spark cluster … http://bit.ly/2em6RUK
  4. 4. Spark cluster with … http://bit.ly/2em6RUK
  5. 5. Spark cluster with Elasticsearch http://bit.ly/2em6RUK http://bit.ly/2ebM9HO
  6. 6. Spark cluster with Elasticsearch http://bit.ly/2em6RUK
  7. 7. Inside! Spark cluster with Elasticsearch
  8. 8. Agenda • Problem Statement and Motivation. • Read/Write (internal) ES Server. • Create ES Server inside Spark Cluster. • Snapshot/Restore ES indices using S3. • Demo: IndexTweetsLive on Spark with Elastic inside. • Q&A
  9. 9. Problem Statement • During development with ES-Hadoop it is cumbersome to have Elasticsearch running outside a Spark cluster.
  10. 10. Architecture Restore ES snapshot Read CSV files Take ES snapshot Restore ES snapshot http://bit.ly/2e5H1jL
  11. 11. Architecture Restore ES snapshot Read CSV files Take ES snapshot Restore ES snapshot Dev Ops http://bit.ly/2e5H1jL
  12. 12. Motivation • Control Elasticsearch instance during development. • Reduce dependencies between teams during development. • Use ES snapshots as interface between teams. • Increase QA efficiency.
  13. 13. Native Integration: write data to Elasticsearch with saveToEs("spark/docs")
        import org.apache.spark.SparkContext
        import org.apache.spark.SparkContext._
        import org.elasticsearch.spark._
        ...
        val conf = ...
        val sc = new SparkContext(conf)
        val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
        val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
        sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
      https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-write
  14. 14. Native Integration: read data from Elasticsearch with sc.esRDD("radio/artists")
        import org.apache.spark.SparkContext
        import org.apache.spark.SparkContext._
        import org.elasticsearch.spark._
        ...
        val conf = ...
        val sc = new SparkContext(conf)
        val RDD = sc.esRDD("radio/artists")
      https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-read
  15. 15. But where do you run Elasticsearch?
  16. 16. Why not run Elasticsearch inside Spark Cluster? * * At least for development purposes.
  17. 17. How do you run Elasticsearch inside Spark Cluster?
  18. 18. Imports http://bit.ly/2efaib4 http://bit.ly/2di0cFq http://bit.ly/2ebM9HO
  19. 19. Setup Local ES server.start()
  20. 20. Write to Local ES saveToEs("tweets/hashtags")
  21. 21. Check results on local ES GET getUrlAsString("http://10.104.239.70:9200/_cat/indices?v")
  22. 22. Snapshot to S3
  23. 23. Restore from S3
  24. 24. Demo!
  25. 25. What have we seen? • How to Read/Write (internal) ES Server. • How to create ES Server inside Spark Cluster. • How to Snapshot/Restore ES indices using S3. • Demo: IndexTweetsLive on Spark with Elastic inside.
  26. 26. Next Steps • Spark 2.0 • Continuous Applications • Elasticsearch 5.0
  27. 27. Q&A
  28. 28. THANK YOU. Email: ofcastaneda@uvg.edu.gt Twitter: @oscar_castaneda
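
The "Snapshot to S3" and "Restore from S3" slides carry no transcribed code, so the following is only a rough sketch of the requests such a step might send, not the speaker's implementation. It uses the standard Elasticsearch snapshot REST endpoints (`PUT /_snapshot/<repo>`, `PUT /_snapshot/<repo>/<snapshot>`, `POST /_snapshot/<repo>/<snapshot>/_restore`) and assumes an S3 repository plugin is installed on the node. The names `EsRequest` and `SnapshotRequests`, and any repository, bucket, or snapshot names passed in, are hypothetical. The requests are built as plain values so they can be inspected without a running cluster.

```scala
// Sketch under stated assumptions: describes snapshot/restore calls as data.
// EsRequest and SnapshotRequests are illustrative names, not part of any library.
final case class EsRequest(method: String, url: String, body: Option[String])

object SnapshotRequests {

  // Register an S3-backed snapshot repository: PUT /_snapshot/<repo>
  // Assumes an S3 repository plugin is available on the Elasticsearch node.
  def registerS3Repository(esUrl: String, repo: String, bucket: String): EsRequest = {
    val body = s"""{"type":"s3","settings":{"bucket":"$bucket"}}"""
    EsRequest("PUT", s"$esUrl/_snapshot/$repo", Some(body))
  }

  // Snapshot the given indices: PUT /_snapshot/<repo>/<snapshot>
  // wait_for_completion=true makes the call return only once the snapshot is done.
  def takeSnapshot(esUrl: String, repo: String, snapshot: String,
                   indices: Seq[String]): EsRequest = {
    val names = indices.mkString(",")
    val body = s"""{"indices":"$names"}"""
    EsRequest("PUT", s"$esUrl/_snapshot/$repo/$snapshot?wait_for_completion=true", Some(body))
  }

  // Restore a snapshot: POST /_snapshot/<repo>/<snapshot>/_restore
  // With no body, all indices contained in the snapshot are restored.
  def restoreSnapshot(esUrl: String, repo: String, snapshot: String): EsRequest =
    EsRequest("POST", s"$esUrl/_snapshot/$repo/$snapshot/_restore", None)
}
```

For example, `SnapshotRequests.takeSnapshot("http://10.104.239.70:9200", "dev_snapshots", "tweets_1", Seq("tweets"))` describes a snapshot of the `tweets` index written in slide 20; the repository and snapshot names here are placeholders. Sending the requests is left to whatever HTTP client the notebook or job already uses.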
