Problem analysis of MapReduce.
Why does MapReduce perform poorly on iterative workloads?
Hadoop does not handle random access to its datasets well, but YARN promises to support that.
Why does Hadoop not support broadcasting?
Java does not support sharing references across map tasks.
2. Problem Analysis
• Experiments were run on the ICSI desktop cluster, but in
reality a Big Data system has to handle hundreds of
petabytes of data.
• Heavy network traffic is not considered .
3. Problem Analysis
• MapReduce has latency because:
• the map-phase peak rate is not high;
• data must be bundled for fast mapping;
• reducers are limited, as each reducer writes a
different output file.
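The latency points above can be illustrated with a minimal sketch. This is a toy in-memory model (an assumption, not Hadoop's actual spill/shuffle code): every map output record is first materialized to an intermediate buffer before any reducer can consume it, which is why the map phase's peak rate is bounded by intermediate writes.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs and 'spill' them to an intermediate buffer."""
    intermediate = []  # stands in for the on-disk intermediate spill files
    for line in lines:
        for word in line.split():
            intermediate.append((word, 1))  # one extra write per record
    return intermediate

def reduce_phase(intermediate):
    """Group by key and sum counts; each key group goes to one reducer output."""
    groups = defaultdict(int)
    for word, count in intermediate:
        groups[word] += count
    return dict(groups)

counts = reduce_phase(map_phase(["a b a", "b c"]))
```

Because the reducer can only start after the intermediate buffer is fully written, the map phase's effective throughput is limited by that materialization step.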
5. Problem Analysis
• MapReduce has latency because:
• Hadoop does not support broadcasting
parameter references to all map nodes, so every
map node has to bundle the same parameters;
• a secondary buffer is needed for swapping.
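The bundling cost described above can be sketched as follows. This is a hypothetical toy model (the `PARAMS` dictionary and `run_map_task` helper are illustrative, not Hadoop API): without broadcast support, each map task deep-copies the same parameter bundle instead of sharing one reference, so memory and transfer cost grow with the number of map tasks.

```python
import copy

PARAMS = {"model_weights": [0.1, 0.2, 0.3]}  # hypothetical shared parameters

def run_map_task(records, params):
    # With no broadcast mechanism, every map task receives its own full
    # copy of the parameters rather than a shared reference.
    local_params = copy.deepcopy(params)  # per-task bundling cost
    return [sum(w * x for w, x in zip(local_params["model_weights"], rec))
            for rec in records]

# Each of the 3 simulated map tasks duplicates PARAMS independently.
outputs = [run_map_task([[1, 1, 1]], PARAMS) for _ in range(3)]
```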
6. Problem Analysis
• Hadoop has drawbacks in its DFS implementation.
• The MapReduce framework performs very poorly with
slot-based memory (one slot per task) and on iterative
processing tasks such as graph processing.
• MapReduce does not work well when there are
computational dependencies in the data.
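A minimal sketch of the iterative-processing drawback, using a PageRank-like loop as the example (an assumption for illustration; `one_mapreduce_job` and `iterate` are toy helpers, not Hadoop code): each iteration is a separate job, and the full dataset is materialized between jobs instead of being kept in memory.

```python
def one_mapreduce_job(ranks, links):
    """One full MapReduce job: map emits rank contributions, reduce sums them."""
    contribs = []
    for node, rank in ranks.items():
        for dest in links[node]:
            contribs.append((dest, rank / len(links[node])))
    new_ranks = {n: 0.0 for n in ranks}
    for dest, c in contribs:
        new_ranks[dest] += c
    return new_ranks

def iterate(ranks, links, n_iters):
    materialized = []  # stands in for a full HDFS write between jobs
    for _ in range(n_iters):
        ranks = one_mapreduce_job(ranks, links)
        materialized.append(dict(ranks))  # whole dataset written each time
    return ranks, len(materialized)

# Two-node cycle: the rank mass bounces between a and b each iteration,
# and every iteration pays a full materialization.
ranks, writes = iterate({"a": 1.0, "b": 0.0}, {"a": ["b"], "b": ["a"]}, 3)
```

The number of full-dataset writes equals the number of iterations, which is the core reason graph algorithms run slowly on plain MapReduce.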
7. Problem Analysis
• Implementing the research suggestions is more
non-intuitive and complicated than necessary.
• If new data is added, the jobs need to run over
the entire dataset again.
• A single failure kills all queued and running
jobs.
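The second point above can be sketched with a toy batch job (an illustrative model, not Hadoop's API): because MapReduce has no incremental mode, appending even one new record means reprocessing every record from scratch.

```python
def batch_job(dataset):
    """The whole job: count words across the entire dataset, every time."""
    counts = {}
    for record in dataset:
        for word in record.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

data = ["a b", "b c"]
first = batch_job(data)   # processes 2 records
data.append("c d")        # new data arrives...
second = batch_job(data)  # ...and all 3 records are reprocessed from scratch
```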
9. Suggestion
• Augmenting MapReduce with ad hoc support
may solve iterative processing and random access
to its dataset.
• Sampling may also be used to mitigate the
iterative problem.
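The sampling suggestion can be sketched as follows, under the assumption (mine, not the slides') that it means running the job on a fraction of the records and scaling the result up. A deterministic 1-in-k systematic sample keeps the sketch reproducible.

```python
def word_count(records):
    counts = {}
    for r in records:
        for w in r.split():
            counts[w] = counts.get(w, 0) + 1
    return counts

def sampled_estimate(records, k):
    """Run the job on every k-th record, then scale counts back up by k."""
    sample = records[::k]  # deterministic 1-in-k systematic sample
    return {w: c * k for w, c in word_count(sample).items()}

# Processing 1/10 of the data yields an estimate at ~1/10 of the cost.
estimate = sampled_estimate(["a"] * 100, 10)
```

The trade-off is that rare words may be missed entirely by the sample, so this only approximates the exact job.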
11. Review Questions
• Why is the map-phase peak rate not high?
Because it writes to an intermediate data file.
• Why does Hadoop not support broadcasting?
Because Java does not support sharing references
across map tasks.
12. Review Questions
• Why does MapReduce perform poorly on iterative workloads?
The system runs each iteration as a separate job and
materializes the data between iterations, instead of
merging iterations and materializing data only when
required.
• Why does new data cause the whole job to run again?
Hadoop does not handle random access to its
datasets well, but YARN promises to support
that.