MAPREDUCE
Hadoop MapReduce paradigm
• Hadoop is an open-source software framework for storing and processing large datasets ranging in size from gigabytes to petabytes.
• It is developed at the Apache Software Foundation.
• Hadoop has two basic components:
1. Massive data storage
2. Faster data processing
Hadoop MapReduce paradigm
• Hadoop Distributed File System (HDFS):
  • It allows you to store data of various formats across a cluster.
• MapReduce:
  • The processing layer of Hadoop (in Hadoop 1.x it also handles resource management). It allows parallel processing over the data stored across HDFS.
History of Hadoop
Why Hadoop?
• Cost Effective System
• Computing power
• Scalability
• Storage flexibility
• Inherent data protection
• Varied Data Sources
• Fault-Tolerant
• Highly Available
• Low Network Traffic
• High Throughput
• Multiple Languages Supported
Disadvantages of Hadoop
• Issues with small files
• Vulnerable by nature
• Processing overhead
• Supports only batch processing
• Inefficient for iterative processing
• Security concerns
Traditional restaurant scenario
Traditional scenario
Distributed processing scenario
Distributed processing scenario: failure
Solution of the restaurant problem
Hadoop in the restaurant analogy
Map tasks
• Process independent chunks of the input in parallel
• The output of each map task is stored as intermediate data on the local disk of that server
Reduce task
• The output of the mappers is automatically shuffled and stored by the framework
• The framework sorts the output by key
• Produces the reduced output by combining the output of the various mappers
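The map and reduce tasks described above can be imitated in memory with plain Java collections. This is only a sketch of the data flow for a word count, not Hadoop code: the class name MapReduceFlowSketch and its methods are hypothetical, and each input string stands in for one chunk handled by one map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative in-memory imitation of map -> shuffle/sort -> reduce.
// All names are hypothetical; this does not use the Hadoop API.
class MapReduceFlowSketch {

    // Map task: turn one chunk of input into intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort: group the values from all map outputs by key, ordered by key.
    static TreeMap<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce task: combine all values of one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the whole flow; every chunk is mapped independently of the others.
    static TreeMap<String, Integer> run(List<String> chunks) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String chunk : chunks) {
            mapOutputs.add(map(chunk));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        shuffleAndSort(mapOutputs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        // prints {bear=2, car=3, deer=2, river=2}
        System.out.println(run(List.of("deer bear river", "car car river", "deer car bear")));
    }
}
```

In a real cluster the map outputs would sit on the local disks of different servers and the framework would move them to the reducers; here a list of lists plays that role.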
MapReduce daemons
1. JobTracker
2. TaskTracker
JobTracker
• Master daemon
• A single JobTracker per Hadoop cluster
• Provides connectivity between Hadoop and the client application
• Creates the execution plan (which task to assign to which node)
• Monitors all running tasks
• Reschedules a task if it fails
TaskTracker
• Responsible for executing the individual tasks assigned by the JobTracker
• A single TaskTracker per slave node
• Continuously sends heartbeat messages to the JobTracker
• If no heartbeat message arrives, the tasks are allocated to other TaskTrackers
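The heartbeat rule can be sketched as a small bookkeeping class. This is a hypothetical illustration, not the JobTracker's actual code: the class HeartbeatSketch, its method names, and the 10-second timeout are all assumptions made for the example, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of heartbeat-based failure detection; not Hadoop's JobTracker.
class HeartbeatSketch {
    static final long TIMEOUT_MS = 10_000;   // assumed timeout, chosen only for illustration

    private final Map<String, Long> lastHeartbeat = new HashMap<>();     // tracker -> last heartbeat time
    private final Map<String, List<String>> assigned = new HashMap<>();  // tracker -> tasks it runs

    void assign(String tracker, String task, long now) {
        lastHeartbeat.putIfAbsent(tracker, now);
        assigned.computeIfAbsent(tracker, t -> new ArrayList<>()).add(task);
    }

    void heartbeat(String tracker, long now) {
        lastHeartbeat.put(tracker, now);
    }

    // Moves the tasks of every silent tracker to a tracker that is still alive
    // and returns the tasks that were reallocated.
    List<String> reallocateFromDeadTrackers(long now) {
        List<String> dead = new ArrayList<>();
        String alive = null;
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > TIMEOUT_MS) {
                dead.add(e.getKey());
            } else if (alive == null) {
                alive = e.getKey();
            }
        }
        List<String> moved = new ArrayList<>();
        if (alive == null) {
            return moved;                     // no tracker left to take over
        }
        for (String tracker : dead) {
            List<String> tasks = assigned.remove(tracker);
            lastHeartbeat.remove(tracker);
            if (tasks != null) {
                assigned.computeIfAbsent(alive, t -> new ArrayList<>()).addAll(tasks);
                moved.addAll(tasks);
            }
        }
        return moved;
    }

    List<String> tasksOf(String tracker) {
        return assigned.getOrDefault(tracker, List.of());
    }

    public static void main(String[] args) {
        HeartbeatSketch jobTracker = new HeartbeatSketch();
        jobTracker.assign("tracker-1", "map-0", 0);
        jobTracker.assign("tracker-2", "map-1", 0);
        jobTracker.heartbeat("tracker-1", 9_000);                           // tracker-2 stays silent
        System.out.println(jobTracker.reallocateFromDeadTrackers(12_000)); // prints [map-1]
        System.out.println(jobTracker.tasksOf("tracker-1"));               // prints [map-0, map-1]
    }
}
```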
MapReduce execution pipeline
Mapper
• The mapper maps the input key-value pairs into a set of intermediate key-value pairs
• Phases:
1. RecordReader:
• Converts the input split into key-value pairs
• <key, value> → <positional information, chunk of data that constitutes the record>
2. Map:
• Generates zero or more intermediate key-value pairs for each input record
3. Combiner
• Optimization technique for a MapReduce job; applies a user-specified aggregate function to the output of that mapper only
• Also known as a local reducer
4. Partitioner
• Splits the intermediate key-value pairs into partitions, one for each reducer
• Usually the number of partitions is equal to the number of reducers
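The four mapper-side phases can be walked through in memory for a word count. The sketch below is hypothetical (the class MapperPhasesSketch and its methods are not part of Hadoop). It assumes line-oriented text input, uses the character offset of each line as the positional key, and partitions by key hash, which is the same idea as Hadoop's default hash partitioning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory walk through the mapper-side phases for word count.
class MapperPhasesSketch {

    // 1. RecordReader: <key, value> = <character offset of the line, text of the line>.
    static Map<Long, String> readRecords(String split) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : split.split("\n")) {
            records.put(offset, line);
            offset += line.length() + 1;   // +1 for the newline character
        }
        return records;
    }

    // 2. Map: emits zero or more (word, 1) pairs per record.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // 3. Combiner: a local reducer that sums the counts of this mapper only.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            local.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return local;
    }

    // 4. Partitioner: chooses the reducer for a key by hashing the key.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Runs phases 1 to 3 for one input split.
    static Map<String, Integer> runMapper(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : readRecords(split).values()) {
            pairs.addAll(map(line));
        }
        return combine(pairs);
    }

    public static void main(String[] args) {
        String split = "a b a\n\nb a";
        System.out.println(readRecords(split).keySet());   // prints [0, 6, 7]
        Map<String, Integer> combined = runMapper(split);
        System.out.println(combined);                      // prints {a=3, b=2}
        for (String key : combined.keySet()) {
            System.out.println(key + " -> reducer " + partition(key, 2));
        }
    }
}
```

The empty second line shows the "zero or more" case: the map step emits nothing for it. Without the combiner this mapper would send five pairs to the reducers; with it, only two.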
Reducer
1. Shuffle and sort:
• Consumes the output of the mapping phase
• Consolidates the relevant records from the mapping phase output
• In a word count, for example, the same words are grouped together along with their respective frequencies
2. Reducer:
• Takes the grouped data produced by the shuffle and sort phase
• Applies the reduce function
• Processes one group at a time
• The reduce function iterates over all the values associated with that key
• Typical operations: aggregation, filtering, combining
3. Output format:
• Separates the key and value of each pair with a tab
• Writes the pairs out to a file using a record writer
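The three reducer-side phases can be sketched in the same in-memory style. The class ReducerPhasesSketch and its methods are hypothetical, the mapper outputs are assumed to be already combined (word, count) maps, and a string stands in for the output file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the reducer-side phases; not the Hadoop API.
class ReducerPhasesSketch {

    // 1. Shuffle and sort: gather the values of every key from all mappers, ordered by key.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Map<String, Integer>> mapperOutputs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map<String, Integer> output : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : output.entrySet()) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return groups;
    }

    // 2. Reduce: iterate over all values associated with one key and aggregate them.
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // 3. Output format: one "key<TAB>value" line per group.
    static String write(TreeMap<String, List<Integer>> groups) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int result = reduce(group.getKey(), group.getValue().iterator());
            out.append(group.getKey()).append('\t').append(result).append('\n');
        }
        return out.toString();
    }

    static String run(List<Map<String, Integer>> mapperOutputs) {
        return write(shuffleAndSort(mapperOutputs));
    }

    public static void main(String[] args) {
        // Two mappers; the key "b" appears in both outputs and is summed to 3.
        System.out.print(run(List.of(Map.of("a", 3, "b", 2), Map.of("b", 1, "c", 4))));
    }
}
```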
API
• Main Class file Packages
• Mapper Class Packages
• Reducer Class Packages
Main class file packages
• import org.apache.hadoop.conf.Configured; (base class for objects that carry a Hadoop Configuration)
• import org.apache.hadoop.fs.Path; (names a file or directory in a file system)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for an int, used as a key or value type)
• import org.apache.hadoop.io.Text; (Writable type for reading and writing text)
• import org.apache.hadoop.mapred.FileInputFormat; (MapRed file input format; sets the input paths of the job)
• import org.apache.hadoop.mapred.FileOutputFormat; (MapRed file output format; sets the output path of the job)
• import org.apache.hadoop.mapred.JobClient; (submits the job and tracks its progress)
• import org.apache.hadoop.mapred.JobConf; (job configuration that describes the MapReduce job to the framework)
• import org.apache.hadoop.util.Tool; (interface for tools that support the generic command-line options)
• import org.apache.hadoop.util.ToolRunner; (utility class that parses the generic options and calls the run function of a Tool)
Mapper file packages
• import java.io.IOException; (exception handling)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for an int value)
• import org.apache.hadoop.io.LongWritable; (Writable wrapper for a long value, for numbers beyond the int range)
• import org.apache.hadoop.io.Text; (input and output text)
• import org.apache.hadoop.mapred.MapReduceBase; (base class for Mapper and Reducer implementations)
• import org.apache.hadoop.mapred.Mapper; (Mapper interface)
• import org.apache.hadoop.mapred.OutputCollector; (collects the key-value pairs emitted by the mapper)
• import org.apache.hadoop.mapred.Reporter; (reports progress and status information)
Reducer file packages
• import java.io.IOException; (exception handling)
• import java.util.Iterator; (iterates over the values of a key with hasNext and next)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for an int value)
• import org.apache.hadoop.io.Text; (input and output text)
• import org.apache.hadoop.mapred.MapReduceBase; (base class for Mapper and Reducer implementations)
• import org.apache.hadoop.mapred.OutputCollector; (collects the key-value pairs emitted by the reducer)
• import org.apache.hadoop.mapred.Reducer; (Reducer interface)
• import org.apache.hadoop.mapred.Reporter; (reports progress and status information)
Hadoop 2.0 features
• HDFS Federation: horizontal scalability of the NameNode
• NameNode High Availability: the NameNode is no longer a single point of failure
• YARN: ability to process terabytes and petabytes of data available in HDFS using non-MapReduce applications such as MPI and Giraph
• Resource Manager: splits up the two major functionalities of the overburdened JobTracker (resource management and job scheduling/monitoring) into two separate daemons: a global Resource Manager and a per-application ApplicationMaster
• Capacity Scheduler
• Data snapshots
• Support for Windows
NameNode high availability
• In Hadoop 1.x, the NameNode was a single point of failure
• Hadoop administrators had to recover the NameNode manually using the Secondary NameNode
• The Hadoop 2.0 architecture supports multiple NameNodes to remove this bottleneck
• It adds support for a passive Standby NameNode
• If the Active NameNode fails, the passive NameNode becomes the Active NameNode and starts writing to the shared storage
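The failover rule can be modelled as a tiny state machine. This is a deliberately simplified, hypothetical sketch: the class NameNodeFailoverSketch, the node names nn1 and nn2, and the list standing in for the shared storage are all assumptions, and real failure detection is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Active/Standby NameNode failover; not HDFS code.
class NameNodeFailoverSketch {
    enum State { ACTIVE, STANDBY, FAILED }

    static class NameNode {
        final String name;
        State state;

        NameNode(String name, State state) {
            this.name = name;
            this.state = state;
        }
    }

    private final List<String> sharedEditLog = new ArrayList<>();  // stands in for the shared storage
    private NameNode active = new NameNode("nn1", State.ACTIVE);
    private NameNode standby = new NameNode("nn2", State.STANDBY);

    // Only the Active NameNode writes to the shared storage.
    void write(String edit) {
        sharedEditLog.add(active.name + ":" + edit);
    }

    // On failure of the Active NameNode, the passive one takes over.
    void failActive() {
        if (standby == null) {
            throw new IllegalStateException("no standby NameNode left");
        }
        active.state = State.FAILED;
        standby.state = State.ACTIVE;
        active = standby;
        standby = null;
    }

    String activeName() {
        return active.name;
    }

    List<String> editLog() {
        return sharedEditLog;
    }

    public static void main(String[] args) {
        NameNodeFailoverSketch cluster = new NameNodeFailoverSketch();
        cluster.write("mkdir /a");
        cluster.failActive();
        cluster.write("mkdir /b");
        System.out.println(cluster.editLog());   // prints [nn1:mkdir /a, nn2:mkdir /b]
    }
}
```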
YARN (Yet Another Resource Negotiator)
• The main idea is to split the JobTracker's responsibilities of resource management and job scheduling into separate daemons.
YARN daemons
1. Global Resource Manager:
a) Scheduler (allocates resources among the various running applications)
b) Application Manager (accepts job submissions and restarts the Application Master in case of failure)
2. Node Manager:
• Per-machine slave daemon
• Launches application containers for application execution
• Reports resource usage to the global Resource Manager
3. Application Master:
• Application-specific entity
• Negotiates the resources required for execution from the Resource Manager
• Works with the Node Manager to execute and monitor component tasks
YARN
YARN workflow
1. The client submits an application
2. The Resource Manager allocates a container to start the Application Master
3. The Application Master registers itself with the Resource Manager
4. The Application Master negotiates containers from the Resource Manager
5. The Application Master notifies the Node Manager to launch the containers
6. The application code is executed in the containers
7. The client contacts the Resource Manager or the Application Master to monitor the application's status
8. Once the processing is complete, the Application Master unregisters with the Resource Manager
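The eight workflow steps can be traced with a toy model in which containers are just a counter. Everything here is hypothetical (the class YarnWorkflowSketch, its nested ResourceManager, and the rule that a submission is turned away when no container is free); a real Resource Manager would queue the request and schedule by memory and CPU rather than by a simple count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the YARN workflow steps; not the YARN API.
class YarnWorkflowSketch {

    static class ResourceManager {
        private int freeContainers;
        final List<String> registered = new ArrayList<>();

        ResourceManager(int capacity) {
            this.freeContainers = capacity;
        }

        // Grants up to `wanted` containers, limited by what is currently free.
        int allocate(int wanted) {
            int granted = Math.min(wanted, freeContainers);
            freeContainers -= granted;
            return granted;
        }

        void release(int count) {
            freeContainers += count;
        }

        int free() {
            return freeContainers;
        }
    }

    // Runs one application and returns a log line for each workflow step.
    static List<String> runApplication(ResourceManager rm, String appId, int tasks) {
        List<String> log = new ArrayList<>();
        log.add("1 client submits " + appId);
        if (rm.allocate(1) == 0) {
            // Simplification: a real Resource Manager would queue the application.
            log.add("no container available for the Application Master");
            return log;
        }
        log.add("2 Resource Manager allocates a container for the Application Master");
        rm.registered.add(appId);
        log.add("3 Application Master registers with the Resource Manager");
        int granted = rm.allocate(tasks);
        log.add("4 Application Master negotiates " + granted + " containers");
        log.add("5 Node Managers launch " + granted + " containers");
        log.add("6 application code runs in the containers");
        log.add("7 client checks the status of " + appId);
        rm.release(granted + 1);              // task containers plus the Application Master's own
        rm.registered.remove(appId);
        log.add("8 Application Master unregisters from the Resource Manager");
        return log;
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager(4);
        // Five tasks are requested, but only three containers remain after the master's.
        runApplication(rm, "app-1", 5).forEach(System.out::println);
    }
}
```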

Hadoop Map-Reduce from the subject: Big Data Analytics
