1 / 31
BIG DATA TECHNOLOGY
● Juanjo Mostazo
● c-base Berlin
● May 2014
2 / 31
Roadmap
● Map Reduce
● Hadoop
● Concepts
● HDFS
● Architecture
● Hadoop Ecosystem
● Lambda Architecture
● New trends
3 / 31
M/R: Motivation
● Process large amounts of data to produce other data
● Scale up vs scale out
4 / 31
M/R: What is it?
● Different programming paradigm
● Based on the Google MapReduce paper (Dean and Ghemawat, 2004)
● Automatic parallelization and distribution
● I/O Scheduling
● Fault tolerance
● Status and monitoring
5 / 31
M/R: The paradigm
● Input & Output: set of key/value pairs
● Large amounts of data are grouped & sorted by key
● Job = two phases = Mapper & Reducer
● Map(in_key, in_value) → list(interm_key, interm_value)
● Reduce(interm_key, list(interm_value)) → list(out_key, out_value)
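The two signatures above can be sketched as a tiny single-process engine. This is only an illustration of the paradigm, not Hadoop's API: `run_job`, `index_mapper` and `index_reducer` are hypothetical names, and the in-memory dictionary stands in for the distributed group & sort step.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one map/reduce job in memory over (in_key, in_value) records."""
    # Map phase: each input pair yields zero or more intermediate pairs.
    groups = defaultdict(list)
    for in_key, in_value in records:
        for interm_key, interm_value in mapper(in_key, in_value):
            groups[interm_key].append(interm_value)
    # Group & sort: intermediate values are grouped by key, keys are sorted.
    output = []
    for interm_key in sorted(groups):
        # Reduce phase: one call per intermediate key with all its values.
        output.extend(reducer(interm_key, groups[interm_key]))
    return output

# Hypothetical job: build an inverted index (word -> documents containing it).
def index_mapper(doc_id, text):
    for word in text.lower().split():
        yield word, doc_id

def index_reducer(word, doc_ids):
    yield word, sorted(set(doc_ids))

docs = [("d1", "big data"), ("d2", "big cluster")]
index = run_job(docs, index_mapper, index_reducer)
print(index)
```

In a real cluster the mapper and reducer calls run on different machines; the programmer still only writes the two functions.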
6 / 31
M/R: Example
(word counter)
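The figure for this slide is not part of the text. As a stand-in, here is a minimal word counter written in the map/reduce style; it is a sketch in plain Python, and the function names (`map_words`, `reduce_counts`, `word_count`) are assumptions made for illustration, not code from the talk.

```python
from itertools import groupby
from operator import itemgetter

def map_words(_offset, line):
    """Mapper: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reducer: sum all the 1s emitted for the same word."""
    yield word, sum(counts)

def word_count(lines):
    # Map phase over (line offset, line) input pairs.
    pairs = [kv for offset, line in enumerate(lines)
             for kv in map_words(offset, line)]
    # Sort so that equal keys become adjacent, then group by key.
    pairs.sort(key=itemgetter(0))
    result = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        for out_key, out_value in reduce_counts(word, (v for _, v in group)):
            result[out_key] = out_value
    return result

counts = word_count(["to be or not to be", "to do"])
print(counts)
```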
7 / 31
M/R: Workflow
9 / 31
Hadoop: What is it?
● Framework based on Google MapReduce (GMR) and the Google File System (GFS)
● Apache project
● Developed in Java
● Multiple applications
● Used by many companies
● De facto standard in the community
10 / 31
Hadoop: HDFS Architecture
11 / 31
Hadoop: HDFS concepts
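● Distributed file system, layered on top of a native file system (ext3, xfs...)
● Works better on huge files
● Redundancy (default replication factor 3)
● Poor seek performance; write-once files (no in-place updates, limited append)!
● Scales well at rack level, not at data-center level
● Files divided into 128 MB to 256 MB blocks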
● Computation is sent to data!
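To make the block and redundancy figures concrete, the arithmetic can be sketched as below. The helper `hdfs_footprint` is a hypothetical name, and the defaults (128 MB blocks, replication 3) are simply the values from the slide; real clusters configure both.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes stored) for one file."""
    # Ceiling division: the last block may be only partially filled.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # Every block is stored `replication` times on different nodes.
    num_replicas = num_blocks * replication
    # A partial last block only occupies its actual size on disk.
    raw_bytes = file_size_bytes * replication
    return num_blocks, num_replicas, raw_bytes

blocks, replicas, raw = hdfs_footprint(1000 * MB)
print(blocks, replicas, raw // MB)
```

A 1000 MB file therefore becomes 8 blocks and 24 block replicas, which is also why many small files are a poor fit: each one costs at least one block entry regardless of size.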
12 / 31
Hadoop: Architecture v1
13 / 31
Hadoop: Architecture v2
14 / 31
Hadoop: Architecture v3
15 / 31
M/R: Example
(word counter)
16 / 31
Hadoop: Clustering
17 / 31
Hadoop: Advanced
● Distributed caches
● Partitioner
● Sort comparator
● Group comparator
● Combiner
● Input format & Record reader
● MultiInput
● MultiOutput
● Compression
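Two of the items above, the partitioner and the combiner, can be sketched in a few lines. This is plain Python for illustration, not the Hadoop Java API; `partition` and `combine` are hypothetical names, and CRC32 is used here only because it gives a stable hash across processes.

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    """Partitioner: decide which reducer receives a given intermediate key."""
    # Hash-modulo scheme: every occurrence of a key goes to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def combine(pairs):
    """Combiner: pre-aggregate mapper output locally before the shuffle."""
    combined = Counter()
    for word, count in pairs:
        combined[word] += count
    return sorted(combined.items())

# Output of a single mapper in a word-count job.
map_output = [("big", 1), ("data", 1), ("big", 1), ("big", 1)]
combined = combine(map_output)

# Route the combined pairs to 4 hypothetical reducers.
buckets = {}
for word, count in combined:
    buckets.setdefault(partition(word, 4), []).append((word, count))
print(combined, buckets)
```

The combiner shrinks four pairs to two before anything crosses the network, which is safe here because addition is associative and commutative.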
18 / 31
Hadoop: Conclusions
● Simplify large-scale computation
● Hide parallel programming issues
● Easy to get started and develop with (extensive documentation)
● Widely used & actively maintained by the community
● Possibility to drop the RDBMS where it is the bottleneck!
20 / 31
Hadoop: Ecosystem
22 / 31
Lambda Architecture: Motivation
● Real time use cases
● Business analytics
● Batch processing vs Real Time
● Problem!
● Low latency read & update
● Scalable & fault tolerant
● Something else needed!
23 / 31
Lambda Architecture: Schema
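The diagram for this slide is not part of the text. As a rough sketch of the general Lambda Architecture idea (a batch layer that recomputes views from the full dataset, a speed layer that covers recent events, and queries that merge both), assuming a simple page-view counter; all names below are hypothetical.

```python
from collections import Counter

def build_batch_view(master_dataset):
    """Batch layer: recompute the view from the full, immutable dataset."""
    return Counter(event["page"] for event in master_dataset)

class SpeedLayer:
    """Speed layer: incrementally covers events the batch view has not seen."""
    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["page"]] += 1

def query(page, batch_view, realtime_view):
    """Serving side: merge the batch view with the real-time view."""
    return batch_view[page] + realtime_view[page]

master = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
batch_view = build_batch_view(master)      # slow, high latency, complete

speed = SpeedLayer()
speed.on_event({"page": "/home"})          # arrived after the last batch run

total = query("/home", batch_view, speed.realtime_view)
print(total)
```

When the next batch run absorbs the recent events, the real-time view can be discarded, so errors in the speed layer are temporary.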
24 / 31
Lambda Architecture: Example 1
25 / 31
Lambda Architecture: Example 2
26 / 31
Lambda Architecture: Lambdoop
● Unified technology stack
● High level programming environment
● Management tools
28 / 31
New trends: Architecture
● Hadoop 1 vs Hadoop 2 (YARN)
● Columnar storage
29 / 31
New trends: Storm
● Stream processing
● Tuples
● Streams
● Spouts
● Bolts
● Topologies
● Twitter
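The Storm vocabulary above can be illustrated with Python generators. This is a conceptual sketch only and does not use the Storm API: a spout is modelled as a source of tuples, a bolt as a function from one stream to another, and the topology as their composition. All function names are hypothetical.

```python
from collections import Counter

def sentence_spout():
    """Spout: a source that emits tuples into a stream."""
    for sentence in ["storm processes streams", "streams of tuples"]:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: consume sentence tuples, emit one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: keep a running count and emit (word, count) on every tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

# Topology: spout -> split bolt -> count bolt.
topology = count_bolt(split_bolt(sentence_spout()))
emitted = list(topology)
print(emitted)
```

Unlike a batch job, each tuple flows through the whole topology as soon as it arrives, so results are updated continuously.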
30 / 31
New trends: Spark
● Next generation MapReduce
● Integrated but not dependent on Hadoop
● Fast, memory-optimized execution engine
● Avoids many Hadoop problems: overhead, high latency, many disk writes
● In-memory cache
● Flexible execution graph
● Can be much faster than MapReduce (up to 100x claimed)
● Shark (SQL)
● Supports streaming (beta)
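Why an in-memory cache helps can be shown with a toy lazy dataset. This is not the Spark API; `LazyDataset` is a hypothetical class that only mimics the idea of deferred transformations plus an optional cache, so that two computations sharing the same input load it once.

```python
class LazyDataset:
    """Toy dataset: transformations are recorded and run only on collect()."""
    def __init__(self, compute):
        self._compute = compute
        self._use_cache = False
        self._cached = None

    def map(self, fn):
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def filter(self, pred):
        return LazyDataset(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        # Mark this dataset to be kept in memory after its first computation.
        self._use_cache = True
        return self

    def collect(self):
        if not self._use_cache:
            return self._compute()
        if self._cached is None:
            self._cached = self._compute()
        return self._cached

loads = []

def load():
    loads.append(1)            # stands in for an expensive read from disk
    return list(range(10))

base = LazyDataset(load).cache()
evens = base.filter(lambda x: x % 2 == 0).collect()
squares = base.map(lambda x: x * x).collect()
print(evens, squares, len(loads))
```

Without the `cache()` call the input would be loaded twice, which is roughly what happens when chained MapReduce jobs exchange data through disk.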
31 / 31
BIG DATA TECHNOLOGY
● Juanjo Mostazo
● juanj.mostazo@gmail.com
● http://www.slideshare.net/juanjmostazo/mr-hadoop-cbase
