Goodbye Hadoop
Hello Spark!
@dhiguero
dhiguero@stratio.com
Daniel Higuero
Agenda
•  Introduction
•  Spark
§  Basic concepts
§  Ecosystem
VIEWER DISCRETION IS ADVISED
All elephants are innocent until proven guilty in a court of development.
Opinions expressed are solely my own and do not express the views or opinions of my employer.
Introduction
Timeline
#t3chfest2015
[Timeline figure, 2002 to 2015. Milestones shown: Google MapReduce paper; Google GFS paper; Hive; HBase; Hadoop: 1 TB, 910 nodes, < 4 min; alpha-0.1; Spark 0.7; Hadoop: 103 TB, 2100 nodes, 72 min; Spark: 100 TB, 206 nodes, 23 min; Spark 1.2+]
Introduction
o  What is Spark?
o  A parallel processing framework
o  History
https://spark.apache.org/
Apache Software Foundation
Map-reduce
o  A functional programming concept
o  Popularized by Google
(map 'list (lambda (x) (+ x 10)) '(1 2 3 4))  => (11 12 13 14)
(reduce #'+ '(1 2 3 4))  => 10
Jeff Dean and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." OSDI (2004)
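For comparison with Spark's Scala API, the two Lisp expressions translate directly to the standard Scala collection methods `map` and `reduce`. This is only a local illustration on an in-memory list; the object name `FunctionalMapReduce` is made up for this sketch.

```scala
// Scala translation of the Lisp map/reduce example (illustrative object name)
object FunctionalMapReduce {
  val numbers: List[Int] = List(1, 2, 3, 4)

  // map applies a function to every element, like (lambda (x) (+ x 10))
  val mapped: List[Int] = numbers.map(x => x + 10)

  // reduce combines all elements with a binary operator, like #'+
  val reduced: Int = numbers.reduce(_ + _)

  def main(args: Array[String]): Unit = {
    println(mapped)  // prints List(11, 12, 13, 14)
    println(reduced) // prints 10
  }
}
```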
Map-Reduce
[Diagram: Input data → Map, Map, Map, Map → Reduce, Reduce, Reduce → result]
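The dataflow in the diagram can be simulated in a single process to make the phases concrete: each input split is handled by its own map step, the emitted pairs are routed to reducers by key, and each reducer combines the values for the keys it owns. This is a toy sketch under assumptions, not Hadoop's or Spark's code: `MapReduceSim.run` and its parameters are hypothetical, and the key-to-reducer rule (`hashCode` modulo the reducer count) only mirrors the general idea of hash partitioning.

```scala
// Toy single-process simulation of the Map-Reduce dataflow (not Hadoop's or Spark's code)
object MapReduceSim {
  def run[I, K, V](
      splits: Seq[Seq[I]],      // one input split per map task
      mapper: I => Seq[(K, V)], // emits key/value pairs for one input record
      reducer: (V, V) => V,     // combines two values that share a key
      numReducers: Int
  ): Map[K, V] = {
    require(numReducers > 0, "need at least one reducer")
    // Map phase: every split is processed independently
    val mapOutput: Seq[(K, V)] = splits.flatMap(split => split.flatMap(mapper))
    // Shuffle: send each pair to the reducer chosen by hashing its key
    val byReducer: Map[Int, Seq[(K, V)]] =
      mapOutput.groupBy { case (k, _) => Math.floorMod(k.hashCode(), numReducers) }
    // Reduce phase: each reducer merges the values of the keys it received
    byReducer.values.flatMap { pairs =>
      pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(reducer) }
    }.toMap
  }
}
```

Because every occurrence of a key ends up at the same reducer, each reducer can finish its keys without consulting the others; guaranteeing that is the job of the shuffle step.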
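Map-Reduce
// In spark-shell the SparkContext is predefined as sc; the input path is hypothetical
val textFile = sc.textFile("input.txt")
val wordCounts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)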
  
Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms
  
[Figure: intermediate results of the pipeline]
flatMap → Array[String]: Apache, Spark, is, an, open-source, cluster, …
map → Array[(String, Int)]: (Apache, 1), (Spark, 1), (is, 1), …, (Spark, 1), (is, 1), …
reduceByKey → Array[(String, Int)]: (Apache, 1), (Spark, 2), (is, 2), …, (to, 4), (the, 1), …
Source: Wikipedia
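The same three calls can be traced without a cluster on an ordinary Scala collection. In this sketch `groupBy` followed by a sum stands in for `reduceByKey`, which on an RDD also shuffles pairs between nodes; the object name `LocalWordCount` is hypothetical. Applied to the Wikipedia text above, split into lines at sentence boundaries, it yields the counts shown in the figure, such as (Spark, 2) and (to, 4).

```scala
// Local sketch of the word-count pipeline on a plain Scala collection (no Spark needed)
object LocalWordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toSeq) // flatMap: each line becomes its words
      .map(word => (word, 1))                 // map: each word becomes a (word, 1) pair
      .groupBy(_._1)                          // group pairs by word (stand-in for the shuffle)
      .map { case (word, pairs) => (word, pairs.map(_._2).reduce(_ + _)) } // add up the 1s
}
```

Note that splitting on a single space keeps punctuation attached, so "Spark's" and "Spark" are counted as different words, exactly as in the Spark version.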
  
Spark's advantages
o  Greater flexibility in defining transformations
o  Less reliance on disk storage
o  Better use of memory
o  Fault tolerance
o  Community traction
Basic concepts
RDD
o  The basic abstraction in Spark
o  Holds the transformations to be applied to a dataset
•  Immutable
•  Lazy evaluation
•  State can be recovered after a failure
•  Control over persistence and partitioning
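The RDD properties listed above can be illustrated with a small model class. This is a toy, assumed design rather than Spark's implementation, and `ToyDataset`, `fromSource` and `lineage` are hypothetical names (its `collect` only loosely mirrors the RDD action of the same name): each transformation returns a new object (immutability), nothing runs until `collect` is called (lazy evaluation), and because every dataset keeps the recipe for rebuilding itself from the source, a lost result can simply be recomputed (recovery after a failure).

```scala
// Toy model of three RDD properties (not Spark's real implementation):
// immutability, lazy evaluation, and rebuilding results from lineage.
final class ToyDataset[A] private (
    val lineage: List[String], // record of the steps applied so far
    compute: () => Seq[A]      // recipe for rebuilding the data from the source
) {
  // Transformations return a NEW dataset and run nothing yet (lazy)
  def map[B](label: String)(f: A => B): ToyDataset[B] =
    new ToyDataset[B](lineage :+ s"map($label)", () => compute().map(f))

  def filter(label: String)(p: A => Boolean): ToyDataset[A] =
    new ToyDataset[A](lineage :+ s"filter($label)", () => compute().filter(p))

  // Action: only now is the whole chain executed. Calling it again replays
  // the lineage from the source, which is how lost data can be recomputed.
  def collect(): Seq[A] = compute()
}

object ToyDataset {
  def fromSource[A](label: String)(load: () => Seq[A]): ToyDataset[A] =
    new ToyDataset[A](List(s"source($label)"), load)
}
```

In this model, calling `collect()` twice loads the source twice; in Spark, `persist` or `cache` is how that recomputation is avoided, which is the persistence control mentioned in the last bullet.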
Ecosystem
Spark ecosystem
[Figure: diagram of the Spark ecosystem, © databricks]
Spark core engine
o  Provides the basic abstractions and handles scheduling
[Diagram components: RDD, DAG scheduling, cluster manager, threads, block manager, task scheduling, worker]
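Spark's DAG scheduler groups operations that need no data exchange between partitions (narrow dependencies such as `map` or `filter`) into one stage, and starts a new stage at operations that require a shuffle (wide dependencies such as `reduceByKey`). The toy function below applies that rule to a linear chain of operations. `Op`, `Narrow`, `Wide` and `StageSplitter` are hypothetical names; real jobs form general DAGs, and Spark splits `reduceByKey` itself across the shuffle (combining on the map side before, merging after), which this sketch simplifies by placing the whole operation in the new stage.

```scala
// Toy sketch (hypothetical names): cut a linear chain of operations into stages.
sealed trait Op { def name: String }
final case class Narrow(name: String) extends Op // no shuffle needed, e.g. map, filter, flatMap
final case class Wide(name: String) extends Op   // needs a shuffle, e.g. reduceByKey, groupByKey

object StageSplitter {
  // acc holds the stages in reverse order; its head is the stage being built.
  // Narrow operations are pipelined into the current stage;
  // each wide operation closes it and opens a new one.
  def stages(ops: List[Op]): List[List[String]] =
    ops
      .foldLeft(List(List.empty[String])) {
        case (acc, Narrow(n)) => (acc.head :+ n) :: acc.tail
        case (acc, Wide(n))   => List(n) :: acc
      }
      .reverse
      .filter(_.nonEmpty)
}
```

For the word-count job from the earlier slide, `textFile`, `flatMap` and `map` land in one stage and `reduceByKey` opens a second one.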
Spark Streaming
o  Turns a streaming source into a series of mini-batches
•  Window definition
§  Time-based
Spark Streaming
[Diagram: mini-batches batch0 to batch7 laid out along a time axis, with a window of size 5 (Window = 5)]
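The windowing idea can be sketched with plain Scala collections by treating each mini-batch as a list of events and grouping consecutive batches with `sliding`. This is an illustration under assumptions: windows here are measured in numbers of batches, whereas Spark Streaming's `window(windowDuration, slideDuration)` on a DStream takes durations, and `WindowSketch` is a hypothetical name.

```scala
// Toy sketch (plain Scala, no Spark): each mini-batch is the list of events
// that arrived in one batch interval; a window groups consecutive batches.
object WindowSketch {
  // Number of events in each window of `windowSize` batches, advancing `slide` batches at a time.
  // Note: Scala's sliding emits a shorter final window when the batches run out (slide > 1).
  def windowedCounts(batches: Seq[Seq[String]], windowSize: Int, slide: Int): List[Int] = {
    require(windowSize > 0 && slide > 0, "window size and slide must be positive")
    batches.sliding(windowSize, slide).map(window => window.map(_.size).sum).toList
  }
}
```

With eight batches (batch0 to batch7), a window of 5 and a slide of 1 yields four windows, from batch0 to batch4 up to batch3 to batch7.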
MLlib
o  Machine learning library
o  Useful abstractions for computation
o  Vectors, sparse matrices
o  Implementations of well-known algorithms
o  Classification, regression, collaborative filtering and clustering
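MLlib builds sparse vectors with `Vectors.sparse(size, indices, values)` (in `org.apache.spark.mllib.linalg`), storing only the non-zero entries. The plain-Scala class below is a hypothetical stand-in, not MLlib's class, that shows why this layout helps: a dot product only touches the stored entries.

```scala
// Toy sparse vector (hypothetical; not MLlib's implementation):
// only the non-zero entries are stored, as parallel index/value arrays.
final class SparseVec(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")
  require(indices.forall(i => i >= 0 && i < size), "every index must be within [0, size)")

  // Expand to a dense array, filling the missing positions with 0.0
  def toDense: Array[Double] = {
    val out = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => out(i) = v }
    out
  }

  // Dot product with a dense vector: work grows with the stored entries, not with size
  def dot(dense: Array[Double]): Double = {
    require(dense.length == size, "dimension mismatch")
    indices.zip(values).map { case (i, v) => v * dense(i) }.sum
  }
}
```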
SparkSQL
o  SQL access layer for running operations on RDDs
o  DataFrame (formerly SchemaRDD)
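// Filter on a Column (people("age")), not on the string "age";
// groupBy takes either all Columns or all column names, not a mix
val people = sqlContext.parquetFile("...")
val department = sqlContext.parquetFile("...")
people.filter(people("age") > 30)
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), people("gender"))
      // the grouped result still needs an aggregation (not shown on the slide) to become a DataFrame
© databricks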
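To see what the DataFrame query computes, here is the same filter, join and grouping written over in-memory Scala collections. This is a hypothetical sketch: the `Person` and `Department` case classes and their fields stand in for the Parquet schemas, the join assumes department ids are unique, and a per-group count is used as the aggregation because the slide does not show one.

```scala
// Plain-Scala sketch of the DataFrame query (hypothetical schemas, in-memory data)
final case class Person(name: String, age: Int, gender: String, deptId: Int)
final case class Department(id: Int, name: String)

object QuerySketch {
  // Counts people older than 30 per (department name, gender).
  // Assumes department ids are unique, so the join can use a Map lookup.
  def countByDeptAndGender(people: Seq[Person], departments: Seq[Department]): Map[(String, String), Int] = {
    val deptById: Map[Int, Department] = departments.map(d => d.id -> d).toMap
    people
      .filter(_.age > 30)                                    // filter(people("age") > 30)
      .flatMap(p => deptById.get(p.deptId).map(d => (p, d))) // inner join on deptId === id
      .groupBy { case (p, d) => (d.name, p.gender) }         // groupBy(department name, gender)
      .map { case (key, rows) => key -> rows.size }          // count rows in each group
  }
}
```

As with an inner join in SQL, people whose `deptId` has no matching department are dropped.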
Getting started
$ wget http://www.apache.org/.../spark-1.2.0-bin-hadoop2.4.tgz
$ tar xvzf spark-1.2.0-bin-hadoop2.4.tgz
$ cd spark-1.2.0-bin-hadoop2.4
$ cp conf/spark-env.sh.template conf/spark-env.sh
$ ./bin/spark-shell
  
$ ./bin/spark-shell
…
15/02/09 15:47:50 INFO HttpServer: Starting HTTP Server
15/02/09 15:47:50 INFO Utils: Successfully started service 'HTTP class server' on port 60416.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.

scala>

Web UI: http://localhost:4040
WE ARE HIRING!
[Word cloud: Java, Scala, Ping pong, Nerf, Big Data, Spark, Hadoop, Cassandra, MongoDB, NoSQL, Passion]
BIG DATA CHILD'S PLAY
Acknowledgements: This work has been partially funded by
the Spanish Ministry of Economy and Competitiveness under
grant PTQ-13-05997

More Related Content

What's hot

What's hot (20)

Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkCassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark tutorial
Spark tutorialSpark tutorial
Spark tutorial
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache Spark
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 

Viewers also liked

Guia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgosGuia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgos
MM CO
 

Viewers also liked (20)

¿Por que cambiar de Apache Hadoop a Apache Spark?
¿Por que cambiar de Apache Hadoop a Apache Spark?¿Por que cambiar de Apache Hadoop a Apache Spark?
¿Por que cambiar de Apache Hadoop a Apache Spark?
 
Tutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeTutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtime
 
Introduccion a Apache Spark
Introduccion a Apache SparkIntroduccion a Apache Spark
Introduccion a Apache Spark
 
Spark Hands-on
Spark Hands-onSpark Hands-on
Spark Hands-on
 
Introducción a Apache Spark a través de un caso de uso cotidiano
Introducción a Apache Spark a través de un caso de uso cotidianoIntroducción a Apache Spark a través de un caso de uso cotidiano
Introducción a Apache Spark a través de un caso de uso cotidiano
 
Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014
 
Hypertable Nosql
Hypertable NosqlHypertable Nosql
Hypertable Nosql
 
7 Disparadores de Engagement para o mercado de consumo massivo
7 Disparadores de Engagement para o mercado de consumo massivo7 Disparadores de Engagement para o mercado de consumo massivo
7 Disparadores de Engagement para o mercado de consumo massivo
 
Bases de Datos NoSQL - Riak
Bases de Datos NoSQL - Riak Bases de Datos NoSQL - Riak
Bases de Datos NoSQL - Riak
 
Cloud or not to Cloud? That’s the question Businesses need an answer for!
Cloud or not to Cloud? That’s the question Businesses need an answer for!Cloud or not to Cloud? That’s the question Businesses need an answer for!
Cloud or not to Cloud? That’s the question Businesses need an answer for!
 
Primeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid MeetupPrimeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid Meetup
 
Guia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgosGuia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgos
 
24 HOP edición Español - Machine learning - Cesar Oviedo
24 HOP edición Español - Machine learning - Cesar Oviedo24 HOP edición Español - Machine learning - Cesar Oviedo
24 HOP edición Español - Machine learning - Cesar Oviedo
 
Big data big opportunities
Big data big opportunitiesBig data big opportunities
Big data big opportunities
 
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
 
Curso Cloud Computing, Parte 1: Amazon Web Services
Curso Cloud Computing, Parte 1: Amazon Web ServicesCurso Cloud Computing, Parte 1: Amazon Web Services
Curso Cloud Computing, Parte 1: Amazon Web Services
 
Cloud Computing: una perspectiva tecnológica
Cloud Computing: una perspectiva tecnológicaCloud Computing: una perspectiva tecnológica
Cloud Computing: una perspectiva tecnológica
 
09 gestion de los riesgos
09 gestion de los riesgos09 gestion de los riesgos
09 gestion de los riesgos
 
MongoDB: la BBDD NoSQL más popular del mercado
MongoDB: la BBDD NoSQL más popular del mercadoMongoDB: la BBDD NoSQL más popular del mercado
MongoDB: la BBDD NoSQL más popular del mercado
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 

Similar to Adios hadoop, Hola Spark! T3chfest 2015

Similar to Adios hadoop, Hola Spark! T3chfest 2015 (20)

Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
«Почему Spark отнюдь не так хорош»
«Почему Spark отнюдь не так хорош»«Почему Spark отнюдь не так хорош»
«Почему Spark отнюдь не так хорош»
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Scala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkScala Meetup Hamburg - Spark
Scala Meetup Hamburg - Spark
 
Big data clustering
Big data clusteringBig data clustering
Big data clustering
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 

Recently uploaded

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
gajnagarg
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 

Adios hadoop, Hola Spark! T3chfest 2015

  • 1. Adiós Hadoop Hola Spark! 1   @dhiguero dhiguero@stratio.com Daniel Higuero
  • 2. •  Introducción •  Spark §  Conceptos básicos §  Ecosistema Agenda 2  
  • 3. 3   VIEWER DISCRETION IS ADVISED All  elephants  are  innocent  un3l  proven  guilty  in  a   court  of  development   Opinions  expressed  are  solely  my  own  and  do  not  express  the  views  or  opinions  of  my  employer.  
  • 5. Timeline #t3chfest2015 5   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014   2015   Google   MapReduce   paper   Google   GFS  paper  
  • 6. Timeline #t3chfest2015 6   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014   2015   Google   MapReduce   paper   Google   GFS  paper   Hive   HBase   Hadoop  1TB,   910  nodes  <  4   min  
  • 7. Timeline #t3chfest2015 7   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014   2015   Google   MapReduce   paper   Google   GFS  paper   Hive   HBase   Hadoop  1TB,   910  nodes  <  4   min   alpha-­‐0.1   Spark  0.7  
  • 8. Timeline #t3chfest2015 8   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014   2015   Google   MapReduce   paper   Google   GFS  paper   Hive   HBase   Hadoop  1TB,   910  nodes  <  4   min   Hadoop  103  TB,   2100  nodes,  72   min   alpha-­‐0.1   Spark  0.7  
  • 9. Timeline #t3chfest2015 9   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014   2015   Google   MapReduce   paper   Google   GFS  paper   Hive   HBase   Hadoop  1TB,   910  nodes  <  4   min   Spark  100  TB,   206  nodes,  23   min   Hadoop  103  TB,   2100  nodes,  72   min   alpha-­‐0.1   Spark  0.7   Spark  1.2+  
  • 10. o  ¿Qué es Spark? o  Framework de procesamiento paralelo o  Historia Introducción 10   https://spark.apache.org/ Apache  SoOware  Founda3on   #t3chfest2015
  • 11. Map-reduce: a functional programming concept, popularized by Google. (map 'list (lambda (x) (+ x 10)) '(1 2 3 4)) => (11 12 13 14); (reduce #'+ '(1 2 3 4)) => 10. Jeff Dean and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." OSDI (2004)
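For readers more at home in Scala (the language used in the later slides) than in Lisp, the two expressions above can be sketched with the standard collection methods map and reduce. FunctionalBasics is only an illustrative wrapper name.

```scala
// Illustrative wrapper object (hypothetical name) holding the two Lisp examples in Scala.
object FunctionalBasics {
  // Scala counterpart of (map 'list (lambda (x) (+ x 10)) '(1 2 3 4)):
  // applies the function to every element and returns a new list.
  val mapped: List[Int] = List(1, 2, 3, 4).map(x => x + 10)

  // Scala counterpart of (reduce #'+ '(1 2 3 4)):
  // combines the elements pairwise with + until a single value remains.
  val reduced: Int = List(1, 2, 3, 4).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    println(mapped)  // List(11, 12, 13, 14)
    println(reduced) // 10
  }
}
```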
  • 12. Map-Reduce (diagram): input data is processed by four Map tasks, whose output feeds three Reduce tasks that produce the result.
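The dataflow in the diagram can be sketched in a single process. This assumes keys are assigned to reducers by hash partitioning (a common choice that the slide does not specify); MiniMapReduce and run are hypothetical names, not a real framework API. The point is the three phases: map, shuffle by key, reduce.

```scala
// Hypothetical toy map-reduce running in one process; not a real framework API.
object MiniMapReduce {
  /** 1. Map: every input record becomes zero or more (key, value) pairs.
    * 2. Shuffle: each pair goes to reducer floorMod(hash(key), numReducers),
    *    so all values for one key end up at the same reducer.
    * 3. Reduce: each reducer combines the values of each of its keys with `reduce`.
    * Returns one result map per reducer.
    */
  def run[I, K, V](input: Seq[I], numReducers: Int)(mapper: I => Seq[(K, V)])(
      reduce: (V, V) => V): Seq[Map[K, V]] = {
    require(numReducers > 0, "numReducers must be positive")

    // Map phase
    val mapped: Seq[(K, V)] = input.flatMap(mapper)

    // Shuffle phase: route every pair to a reducer by key hash (assumed partitioning scheme)
    val byReducer: Map[Int, Seq[(K, V)]] =
      mapped.groupBy { case (k, _) => Math.floorMod(k.hashCode, numReducers) }

    // Reduce phase: inside each reducer, group by key and fold the values
    (0 until numReducers).map { r =>
      byReducer
        .getOrElse(r, Seq.empty[(K, V)])
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2).reduce(reduce) }
    }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("a b", "b c", "c c")
    val perReducer = run(lines, numReducers = 3)(line => line.split(" ").toSeq.map(w => (w, 1)))(_ + _)
    perReducer.zipWithIndex.foreach { case (m, r) => println(s"reducer $r: $m") }
  }
}
```

Because partitioning depends only on the key, no key is ever split across two reducers, which is what lets each reducer work independently.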
  • 13. Map-Reduce in Spark (word count):
      val wordCounts = textFile.flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
    Input text (source: Wikipedia): "Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms."
    Intermediate results shown in the diagram: after flatMap, Array[String]: Apache, Spark, is, an, open-source, cluster, …; after map, Array[(String, Int)]: (Apache, 1), (Spark, 1), (is, 1), …, (Spark, 1), (is, 1), …; after reduceByKey, Array[(String, Int)]: (Apache, 1), (Spark, 2), (is, 2), …, (to, 4), (the, 1), …
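To see what each step of the word count produces without a cluster, the same pipeline can be sketched over an ordinary Scala collection. This is a local analogue, not Spark: LocalWordCount is a hypothetical name, and groupBy followed by a per-key sum stands in for reduceByKey.

```scala
// Hypothetical local analogue of the Spark word count; runs on plain Scala collections.
object LocalWordCount {
  /** Mirrors flatMap -> map -> reduceByKey step by step.
    * groupBy + sum stands in for reduceByKey; it is not how Spark executes it. */
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))                          // one element per word
      .map(word => (word, 1))                                    // (word, 1) pairs
      .groupBy { case (word, _) => word }                        // all pairs for one word together
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // add up the 1s

  def main(args: Array[String]): Unit = {
    val text = Seq("Apache Spark is fast", "Spark is in memory")
    wordCounts(text).toSeq.sortBy(_._1).foreach(println)
  }
}
```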
  • 14. Advantages of Spark: greater flexibility in defining transformations; less use of disk storage; better use of memory; fault tolerance; community traction.
  • 16. RDD (Resilient Distributed Dataset): the basic abstraction in Spark. It holds the transformations to be applied to a dataset and is immutable; lazily evaluated; recoverable after a failure (its state can be rebuilt); and gives control over persistence and partitioning.
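The RDD properties listed above (immutability, lazy evaluation, recovery after failure) can be illustrated with a deliberately tiny model. ToyRDD, parallelize, collect and lineage are hypothetical names chosen to echo Spark's vocabulary; real RDDs are partitioned and distributed, and Spark rebuilds lost partitions from lineage in a far more elaborate way.

```scala
/** Hypothetical toy model (not Spark's API): transformations only record work
  * (lazy evaluation), and the recorded chain is enough to recompute the data
  * from its source (recovery from lineage). */
final class ToyRDD[A] private (compute: () => Seq[A], val lineage: List[String]) {
  // Transformations return a new ToyRDD and leave this one untouched (immutability).
  def map[B](f: A => B): ToyRDD[B] =
    new ToyRDD(() => compute().map(f), "map" :: lineage)

  def filter(p: A => Boolean): ToyRDD[A] =
    new ToyRDD(() => compute().filter(p), "filter" :: lineage)

  // Action: only here is the chain of functions actually executed.
  def collect(): Seq[A] = compute()
}

object ToyRDD {
  def parallelize[A](data: Seq[A]): ToyRDD[A] = new ToyRDD(() => data, List("parallelize"))
}

object ToyRDDDemo {
  def main(args: Array[String]): Unit = {
    var evaluations = 0
    val doubled = ToyRDD.parallelize(1 to 5).map { x => evaluations += 1; x * 2 }
    println(evaluations) // 0: map was only recorded
    println(doubled.collect())
    println(evaluations) // 5: the action ran the map
    // A lost result is not fatal: running the action again recomputes it from the lineage.
    doubled.collect()
    println(evaluations) // 10
    println(doubled.lineage.reverse.mkString(" -> ")) // parallelize -> map
  }
}
```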
  • 18. Spark ecosystem (diagram © Databricks)
  • 19. Spark core engine: provides the basic abstractions and handles scheduling. Components in the diagram: RDD, DAG scheduling, cluster manager, threads, block manager, task scheduling, worker.
  • 20. Spark Streaming: turns a streaming source into a series of mini-batches, grouped by a window definition (time-based).
  • 21. Spark Streaming (diagram): a window of 5 mini-batches (Window = 5) over batch0 to batch7 along the time axis, shown at two points in time.
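The window in the diagram can be sketched as grouping consecutive mini-batches: with a window of 5 batches that advances one batch at a time, batch0 to batch7 yield four windows. This plain-Scala sketch assumes each batch is already a sequence of events and that window length and slide are counted in batches (the slide of one batch is an assumption); in Spark Streaming the corresponding operation is DStream.window(windowDuration, slideDuration), where both are durations of time.

```scala
// Hypothetical helper illustrating windowing over mini-batches; not Spark Streaming's API.
object WindowSketch {
  /** Groups consecutive mini-batches into windows of `windowSize` batches that
    * advance by `slide` batches; incomplete windows at the end are dropped.
    * Each window is returned as the flattened sequence of its events. */
  def windows[A](batches: Seq[Seq[A]], windowSize: Int, slide: Int): List[Seq[A]] = {
    require(windowSize > 0 && slide > 0, "windowSize and slide must be positive")
    batches
      .sliding(windowSize, slide)
      .filter(_.size == windowSize)
      .map(_.flatten)
      .toList
  }

  def main(args: Array[String]): Unit = {
    // batch0 .. batch7, each holding a single event label
    val batches = (0 until 8).map(i => Seq(s"e$i"))
    windows(batches, windowSize = 5, slide = 1).foreach(println)
  }
}
```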
  • 22. MLlib: a machine learning library. Useful abstractions for computation: vectors, sparse matrices. Implementations of well-known algorithms: classification, regression, collaborative filtering, and clustering.
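To illustrate why sparse vectors are a useful abstraction, here is a sketch of the usual (size, indices, values) representation, the same triple that MLlib's Vectors.sparse factory takes. SparseVec and its methods are hypothetical; the sketch assumes indices are distinct and stores only the non-zero entries, so a dot product touches just those entries instead of all size positions.

```scala
/** Hypothetical sparse vector (not MLlib's class): only the non-zero entries are
  * stored, as parallel arrays of positions (indices) and values.
  * Assumes indices are distinct. */
final class SparseVec(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")
  require(indices.forall(i => i >= 0 && i < size), "index out of range")

  /** Expands to a dense array of length `size`, filling the gaps with 0.0. */
  def toDense: Array[Double] = {
    val out = new Array[Double](size)
    for (k <- indices.indices) out(indices(k)) = values(k)
    out
  }

  /** Dot product with a dense vector; cost grows with the stored entries, not with size. */
  def dot(dense: Array[Double]): Double = {
    require(dense.length == size, "dimension mismatch")
    indices.indices.map(k => values(k) * dense(indices(k))).sum
  }
}

object SparseVecDemo {
  def main(args: Array[String]): Unit = {
    // A 6-dimensional vector with non-zeros at positions 1 and 4
    val v = new SparseVec(6, Array(1, 4), Array(2.0, 5.0))
    println(v.toDense.mkString("[", ", ", "]")) // [0.0, 2.0, 0.0, 0.0, 5.0, 0.0]
    println(v.dot(Array(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))) // 7.0
  }
}
```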
  • 23. SparkSQL: an SQL access layer for running operations on RDDs. DataFrame (formerly SchemaRDD). Example (© Databricks):
      val people = sqlContext.parquetFile("...")
      val department = sqlContext.parquetFile("...")
      people.filter(people("age") > 30)
        .join(department, people("deptId") === department("id"))
        .groupBy(department("name"), people("gender"))
        .count()
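What the DataFrame query on this slide computes can be sketched with ordinary Scala collections, assuming the grouped result is counted per group and that department ids are unique. The case classes carry only the columns the slide mentions (age, gender, deptId, id, name); Person, Department, LocalQuery and countByDeptAndGender are hypothetical names, and nothing here reflects how SparkSQL plans or executes the query.

```scala
// Hypothetical row types with only the columns named in the slide's query.
final case class Person(age: Int, gender: String, deptId: Int)
final case class Department(id: Int, name: String)

// Hypothetical local analogue of the DataFrame query; not SparkSQL.
object LocalQuery {
  /** people with age > 30, inner-joined with departments on deptId == id,
    * grouped by (department name, gender) and counted. People whose deptId
    * has no matching department are dropped, as in an inner join. */
  def countByDeptAndGender(people: Seq[Person], departments: Seq[Department]): Map[(String, String), Int] = {
    val deptById: Map[Int, Department] = departments.map(d => d.id -> d).toMap // assumes unique ids
    people
      .filter(_.age > 30)
      .flatMap(p => deptById.get(p.deptId).map(d => (d.name, p.gender)))
      .groupBy(identity)
      .map { case (key, rows) => key -> rows.size }
  }

  def main(args: Array[String]): Unit = {
    val people = Seq(Person(35, "F", 1), Person(42, "M", 1), Person(28, "F", 2), Person(51, "F", 2))
    val departments = Seq(Department(1, "Engineering"), Department(2, "Sales"))
    countByDeptAndGender(people, departments).foreach(println)
  }
}
```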
  • 24. Getting started
      $ wget http://www.apache.org/.../spark-1.2.0-bin-hadoop2.4.tgz
      $ tar xvzf spark-1.2.0-bin-hadoop2.4.tgz
      $ cd spark-1.2.0-bin-hadoop2.4
      $ cp conf/spark-env.sh.template conf/spark-env.sh
      $ ./bin/spark-shell
      …
      15/02/09 15:47:50 INFO HttpServer: Starting HTTP Server
      15/02/09 15:47:50 INFO Utils: Successfully started service 'HTTP class server' on port 60416.
      Welcome to [Spark ASCII-art logo] version 1.2.0
      Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
      Type in expressions to have them evaluated.
      scala>
    Web UI: http://localhost:4040
  • 25. We are hiring! Java, Scala, ping pong, Nerf, Big Data, Spark, Hadoop, Cassandra, MongoDB, NoSQL, passion.
  • 26. Big Data child's play. Daniel Higuero, @dhiguero, dhiguero@stratio.com. Acknowledgements: This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under grant PTQ-13-05997.