More from Shinpei Ohtani (17)
MapReduce Code as Seen Through the Samples (サンプルから見るMap reduceコード)
- 2. [Figure: Hadoop component diagram. Labels shown: Cloudera, Avro, Sqoop, Desktop, Pig, Hive, HBase, Chukwa, MapReduce, ZooKeeper, HDFS, Core]
- 3. [Figure: Hadoop component diagram, repeated. Labels shown: Cloudera, Avro, Sqoop, Desktop, Pig, Hive, HBase, Chukwa, MapReduce, ZooKeeper, HDFS, Core]
- 4. • MapReduce
– Mapper/Reducer
- 5. MapReduce
• WordCount
– Running a Job with a Mapper/Reducer
– How to use InputFormat/OutputFormat
– HDFS (FileSystem)
– How to use Writable
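The Mapper/Reducer flow that WordCount demonstrates can be sketched in plain Java, without Hadoop on the classpath. This is only an illustration under stated assumptions: `WordCountSketch` and its methods are hypothetical names, the in-memory grouping step stands in for the framework's shuffle, and ordinary `String`/`Integer` values stand in for Writable types.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a WordCount job; not the Hadoop API.
final class WordCountSketch {
    /** Map phase: emit a (word, 1) pair for every whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    /** Reduce phase: sum all counts that share one key. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Stand-in for the framework: run map, group by key, then run reduce per key. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello hadoop", "hello map reduce")));
    }
}
```

In a real job the grouping is done by the framework between the map and reduce phases; only the bodies of `map` and `reduce` correspond to user code.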
- 7. Grep
• grep
– Runs two Jobs: grepJob and sortJob
– How to use JobConf/Mapper/Reducer
– The Mapper is RegexMapper; results are written as <Text, Long> in SequenceFileFormat
– sortJob
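The two-stage structure (grepJob, then sortJob) can be sketched in plain Java as follows. `GrepSketch` is a hypothetical name and nothing here uses the Hadoop API. The first stage counts regex matches, mirroring the <Text, Long> pairs mentioned on the slide. The second stage orders results by descending count, which is my recollection of what the bundled example's sort job does; treat that ordering as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-stage pipeline; a plain-Java stand-in for grepJob + sortJob.
final class GrepSketch {
    /** Stage 1 ("grepJob"): count every regex match across all input lines. */
    static Map<String, Long> grepJob(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Stage 2 ("sortJob"): order by count, highest first; ties ordered by match text. */
    static List<String> sortJob(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            out.add(e.getValue() + "\t" + e.getKey());
        }
        return out;
    }
}
```

Splitting the work this way keeps each stage a simple key/value transformation: the first stage's output becomes the second stage's input, just as the slide describes the intermediate results being written out between the two jobs.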
- 8. Grep
• JobConf
• Mapper
• Reducer
- 9. o.a.hadoop.mapred.JobConf
– mapred-default.xml
– conf/mapred-site.xml
– The XML contents are read through DOM
– Child configurations
• JobConf child = new JobConf( Conf, jar );
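The load order on this slide (defaults first, the site file second, then a child copied from a parent) can be sketched without Hadoop. `LayeredConfSketch` below is a hypothetical class, not JobConf itself. It assumes the usual `<configuration><property><name>/<value>` file layout and that a later resource overrides an earlier one; the property values used with it are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical layered configuration; illustrates the idea, not Hadoop's implementation.
final class LayeredConfSketch {
    private final Map<String, String> props = new LinkedHashMap<>();

    LayeredConfSketch() {}

    /** Child configuration: starts as a copy of the parent, then diverges independently. */
    LayeredConfSketch(LayeredConfSketch parent) {
        props.putAll(parent.props);
    }

    /** Parses one XML resource with DOM; values loaded later replace earlier ones. */
    void addResource(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getDocumentElement().getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration resource", e);
        }
    }

    String get(String name) {
        return props.get(name);
    }

    void set(String name, String value) {
        props.put(name, value);
    }
}
```

Loading a defaults document and then a site document that repeats one property leaves the site value in effect, and a child created from that configuration can change a value without affecting its parent.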
- 12. o.a.hadoop.mapred.MapTask
• Map
• initialize (in Task; shared with the Reducer)
– Task state (o.a.h.mapred.TaskStatus.State)
• RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN
– Creates the OutputCommitter
• Handles the Task's output
• Output
– mapred.work.output.dir
- 14. o.a.h.mapred.MapTask cont2
• Reduce
– spill (*)
• $mapred.local.dir/taskTracker/jobcache/${taskid}/output/spill${spillNumber}.out
– Data passed to the Reducer
• With a Combiner, min.num.spills.for.combine controls whether the combiner runs
– Output through RecordWriter
• MapRunner
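The spill file layout and the combiner threshold can be written down as two small helpers. This is a sketch only: `SpillSketch` and its method names are hypothetical, the path simply follows the template shown on the slide, and the rule that the combiner is applied only when the number of spills reaches min.num.spills.for.combine is my understanding of the behaviour rather than something the slide spells out.

```java
// Hypothetical helpers; they only restate the slide's path template and threshold rule.
final class SpillSketch {
    /** Builds a spill file path following the template shown on the slide. */
    static String spillPath(String localDir, String taskId, int spillNumber) {
        return localDir + "/taskTracker/jobcache/" + taskId
                + "/output/spill" + spillNumber + ".out";
    }

    /**
     * Assumed rule: the combiner runs while merging spills only if a combiner
     * is configured and at least minSpillsForCombine spill files exist.
     */
    static boolean runCombinerOnMerge(boolean hasCombiner, int numSpills, int minSpillsForCombine) {
        return hasCombiner && numSpills >= minSpillsForCombine;
    }

    public static void main(String[] args) {
        // The directory and task id here are made-up example values.
        System.out.println(spillPath("/tmp/mapred/local", "example_task", 0));
        System.out.println(runCombinerOnMerge(true, 2, 3));
    }
}
```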
- 16. o.a.h.mapred.MapRunner cont
• run(RecordReader, OutputCollector, Reporter)
– RecordReader: the Reader for the Split obtained from InputFormat (InputFormat/RecordReader)
- 17. [Sequence diagram. Participants: MapTask, MapRunner, Mapper, RecordReader, OutputCollector, SpillThread. Steps shown: InputSplit creation, run, createKey(), createValue(), next(key, value) until EOF, map(key, value, outputCollector, reporter), Spill]
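The call sequence in this diagram (createKey(), createValue(), then next(key, value) until EOF, with one map call per record) can be sketched as a plain-Java loop. The interfaces below are simplified, hypothetical stand-ins for RecordReader, Mapper, and OutputCollector: the Reporter argument is omitted, a mutable `Slot` replaces Writable objects, and the reader uses line numbers as keys purely for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins; not the org.apache.hadoop.mapred types.
final class MapRunnerSketch {
    /** Mutable slot standing in for a reusable Writable key or value. */
    static final class Slot<T> { T value; }

    interface RecordReader<K, V> {
        Slot<K> createKey();
        Slot<V> createValue();
        boolean next(Slot<K> key, Slot<V> value); // returns false at EOF
    }

    interface OutputCollector<K, V> { void collect(K key, V value); }

    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, OutputCollector<K2, V2> output);
    }

    /** The driver loop: allocate key/value once, refill them per record, call map. */
    static <K1, V1, K2, V2> void run(RecordReader<K1, V1> input,
                                     Mapper<K1, V1, K2, V2> mapper,
                                     OutputCollector<K2, V2> output) {
        Slot<K1> key = input.createKey();
        Slot<V1> value = input.createValue();
        while (input.next(key, value)) {
            mapper.map(key.value, value.value, output);
        }
    }

    /** Illustrative reader over in-memory lines; the key is the line number. */
    static RecordReader<Long, String> lineReader(List<String> lines) {
        final Iterator<String> it = lines.iterator();
        return new RecordReader<Long, String>() {
            private long lineNumber = 0;
            @Override public Slot<Long> createKey() { return new Slot<>(); }
            @Override public Slot<String> createValue() { return new Slot<>(); }
            @Override public boolean next(Slot<Long> key, Slot<String> value) {
                if (!it.hasNext()) return false;
                key.value = lineNumber++;
                value.value = it.next();
                return true;
            }
        };
    }

    /** Demo: a mapper that emits (word, 1) for each space-separated word. */
    static List<String> tokenize(List<String> lines) {
        final List<String> collected = new ArrayList<>();
        Mapper<Long, String, String, Integer> mapper = (k, line, out) -> {
            for (String word : line.split(" ")) out.collect(word, 1);
        };
        OutputCollector<String, Integer> collector = (word, n) -> collected.add(word + "=" + n);
        run(lineReader(lines), mapper, collector);
        return collected;
    }
}
```

Allocating the key and value once and refilling them on each next call is the point of the createKey()/createValue() steps: the loop avoids creating a new object per record.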
- 19. • Mapper
– JobConf
– Mapper/MapRunner/MapTask
•
– Reducer
• Reducer execution
– InputFormat/RecordReader
- 21. o.a.h.mapred.ReduceTask
• SHUFFLE
• ReduceTask.ReduceCopier
– fetchOutputs (Merger.MergeQueue)
• Map x mapred.reduce.parallel.copies
– MapOutputCopier
• Map output merged on disk by LocalFSMerger
• Merged in memory by InMemFSMergeThread
• GetMapEventsThread
– Map
– < , MapOutputLocation(taskId, host, httpUrl)>
• TaskTracker
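The idea of fetching Map outputs with a bounded number of parallel copiers can be sketched with a fixed thread pool. This is not how ReduceCopier is implemented (it manages its own copier threads); the executor is swapped in to keep the sketch short. `ParallelCopySketch` and the `fetcher` function are hypothetical, the `MapOutputLocation` fields are the three named on the slide, and `parallelCopies` plays the role of mapred.reduce.parallel.copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of bounded parallel fetching; not the ReduceCopier implementation.
final class ParallelCopySketch {
    /** Carries the three fields named on the slide. */
    static final class MapOutputLocation {
        final String taskId;
        final String host;
        final String httpUrl;

        MapOutputLocation(String taskId, String host, String httpUrl) {
            this.taskId = taskId;
            this.host = host;
            this.httpUrl = httpUrl;
        }
    }

    /** Fetches every location using at most parallelCopies concurrent workers. */
    static List<String> fetchOutputs(List<MapOutputLocation> locations,
                                     int parallelCopies,
                                     Function<MapOutputLocation, String> fetcher) {
        ExecutorService copiers = Executors.newFixedThreadPool(parallelCopies);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (MapOutputLocation loc : locations) {
                Callable<String> copy = () -> fetcher.apply(loc);
                pending.add(copiers.submit(copy));
            }
            // Collect results in submission order, regardless of completion order.
            List<String> fetched = new ArrayList<>();
            for (Future<String> f : pending) {
                fetched.add(f.get());
            }
            return fetched;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while copying", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("copy failed", e.getCause());
        } finally {
            copiers.shutdown();
        }
    }
}
```

A caller supplies the actual transfer as the `fetcher` function, so the same loop works whether the data comes over HTTP or, as in a test, from a stub.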