MapReduce Basics

MapReduce basics: core concepts, such as what a mapper is and what a reducer is.

  1. MapReduce: The Heart of Hadoop
  2. What is MapReduce?
     - MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
     - MapReduce programs are written in a particular style influenced by functional programming constructs, specifically idioms for processing lists of data.
  3. Map and Reduce
     - Conceptually, MapReduce programs transform lists of input data elements into lists of output data elements. A MapReduce program does this twice, using two different list-processing idioms: map and reduce.
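The two list-processing idioms can be sketched in plain Python (a conceptual illustration only, not Hadoop code): map transforms each input element independently, and reduce combines the transformed elements into a result.

```python
from functools import reduce

# A list of input data elements.
lines = ["hello world", "hello hadoop"]

# Map idiom: transform each input element into an output element.
lengths = list(map(len, lines))

# Reduce idiom: combine the mapped elements into a single result.
total = reduce(lambda acc, x: acc + x, lengths, 0)

print(lengths, total)  # [11, 12] 23
```

Because each `map` call depends only on its own input element, the map phase can run on many machines at once; this independence is what makes the model parallelizable.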
  4. Dataflow
     - A MapReduce job is a unit of work that the client wants performed. It consists of the input data, the MapReduce program, and configuration information.
     - Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
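The dataflow above can be sketched as a toy, single-process simulation (an illustration of the concept, not the real framework): one map task per input split, a shuffle step that groups intermediate values by key, and one reduce call per distinct key.

```python
from collections import defaultdict

def run_job(splits, mapper, reducer):
    """Toy simulation of the MapReduce dataflow."""
    # Map phase: each split would be handled by an independent map task.
    intermediate = defaultdict(list)
    for split in splits:
        for key, value in mapper(split):
            intermediate[key].append(value)  # shuffle: group values by key
    # Reduce phase: one reduce call per distinct key.
    return {key: reducer(key, values) for key, values in intermediate.items()}

# Word count expressed as a mapper and a reducer.
def mapper(text):
    for word in text.split():
        yield word, 1

def reducer(word, counts):
    return sum(counts)

splits = ["the cat sat", "the cat ran"]
print(run_job(splits, mapper, reducer))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In real Hadoop the splits live in HDFS, the map and reduce tasks run on different nodes, and the shuffle moves data over the network; the structure of the computation, however, is the same.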
  5. MapReduce: how much data is processed at a time
  6. MapReduce
  7. Map-Only Job
  8. Combiner
     - The Combiner is a "mini-reduce" process which operates only on data generated by one machine.
     - Word count is a prime example of where a Combiner is useful. The word count program emits a (word, 1) pair for every instance of every word it sees. So if the same document contains the word "cat" 3 times, the pair ("cat", 1) is emitted three times; all of these are then sent to the Reducer. A Combiner can sum them locally into ("cat", 3) before any data crosses the network.
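A minimal sketch of the combiner idea in Python (conceptual only, not Hadoop code): the combiner pre-sums one machine's map output so fewer pairs are sent to the reducer.

```python
from collections import Counter

def map_task(document):
    """Emit a (word, 1) pair for every word, as in word count."""
    return [(word, 1) for word in document.split()]

def combine(pairs):
    """Mini-reduce on one machine's map output: pre-sum counts
    locally so fewer pairs cross the network to the reducer."""
    combined = Counter()
    for word, count in pairs:
        combined[word] += count
    return list(combined.items())

doc = "cat dog cat bird cat"
raw = map_task(doc)    # 5 pairs, including ("cat", 1) three times
local = combine(raw)   # 3 pairs, including ("cat", 3)
print(len(raw), len(local))  # 5 3
```

This works for word count because addition is associative and commutative, so summing partial counts early does not change the final result.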
  9. Fault Tolerance
     - One of the primary reasons to use Hadoop to run your jobs is its high degree of fault tolerance.
     - The primary way that Hadoop achieves fault tolerance is by restarting tasks. Individual task nodes (TaskTrackers) are in constant communication with the head node of the system, called the JobTracker. If a TaskTracker fails to communicate with the JobTracker for a period of time (by default, 1 minute), the JobTracker assumes that the TaskTracker in question has crashed. Because the JobTracker knows which map and reduce tasks were assigned to each TaskTracker, it can reschedule those tasks on other nodes.
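The restart-based recovery described above can be illustrated with a toy sketch (not Hadoop code; the function and task names here are hypothetical): a failed task is simply run again, just as the JobTracker would reassign it to another node.

```python
def run_with_restarts(task, max_attempts=3):
    """Toy illustration of restart-based fault tolerance: if a
    task fails, the framework simply schedules it to run again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except RuntimeError:
            continue  # the JobTracker would reassign the task to another node
    raise RuntimeError("task failed on every attempt")

# A task that crashes on its first attempt and succeeds on the second.
def flaky_task(attempt):
    if attempt == 1:
        raise RuntimeError("simulated TaskTracker crash")
    return "done"

print(run_with_restarts(flaky_task))  # done
```

Restarting works because map and reduce tasks are side-effect-free over their inputs: re-running a task from the same input split produces the same output.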
  10. Hadoop Streaming
      - Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can use any language that can read standard input and write to standard output to write your MapReduce program.
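A word-count mapper and reducer in the Streaming style might look like the sketch below (a minimal example assuming the usual tab-separated key/value convention; in a real job these would be two separate scripts passed to the streaming jar). Streaming delivers the mapper output to the reducer sorted by key, so the reducer only needs to sum consecutive counts for the same word.

```python
import sys

def mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Read raw text lines on stdin, emit "word<TAB>1" lines on stdout."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")

def reducer(stdin=sys.stdin, stdout=sys.stdout):
    """Input arrives sorted by key, so sum runs of identical words."""
    current, total = None, 0
    for line in stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                stdout.write(f"{current}\t{total}\n")
            current, total = word, 0
        total += int(count)
    if current is not None:
        stdout.write(f"{current}\t{total}\n")

# In a real Streaming job these would be two separate executables, e.g.:
#   hadoop jar hadoop-streaming.jar -input in -output out \
#     -mapper mapper.py -reducer reducer.py
```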