MapReduce Code Through Samples (サンプルから見るMapReduceコード)
@shot6 (Shinpei Ohtani)
[Figure: the Hadoop ecosystem, showing Core, HDFS, MapReduce and ZooKeeper together with HBase, Hive, Pig, Chukwa, Avro, Sqoop and Cloudera Desktop]
• MapReduce
  – Mapper/Reducer
MapReduce
• WordCount
  – Writing a Mapper/Reducer and running the Job
  – Using InputFormat/OutputFormat
  – HDFS (FileSystem)
  – Using Writable types
WordCount
• The "Hello World" of Hadoop
• Written with the new API (org.apache.hadoop.mapreduce)
• The old API lives in org.apache.hadoop.mapred
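The WordCount data flow can be sketched in plain Java without Hadoop on the classpath. This is an illustration only, assuming an in-memory list of lines as input; it is not the Hadoop example's source and uses none of the org.apache.hadoop.mapreduce classes. `map` emits (word, 1), a sorted `TreeMap` stands in for the shuffle, and `reduce` sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of WordCount's data flow (no Hadoop dependency):
// map emits (word, 1), the "framework" groups by key, reduce sums each group.
class WordCountSketch {

    // Map phase: split one input line into words and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key in sorted order (the shuffle), then reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hello World", "Hello Hadoop"))); // {Hadoop=1, Hello=2, World=1}
    }
}
```

In real Hadoop the grouping step happens across machines during the shuffle; here it is collapsed into one `TreeMap` to keep the shape of the computation visible.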
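Grep
• grep
  – Runs two jobs: grepJob and sortJob
  – How JobConf/Mapper/Reducer are set up
  – The Mapper is RegexMapper; its <Text, Long> output is written in SequenceFile format
  – sortJob sorts the grepJob output by match count
  – Writes the final output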
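The two-job structure of Grep can be sketched in plain Java. This is a hypothetical illustration, not the Hadoop example: `grepJob` imitates a RegexMapper-style map that emits (match, 1) followed by a summing reduce, and `sortJob` inverts each (match, count) pair and orders by count, largest first. The regex `dfs[a-z.]+` and the input lines are example values only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of Grep's two chained jobs (no Hadoop classes involved).
class GrepSketch {

    // grepJob: emit (match, 1) for every regex hit, then sum the counts per match.
    static Map<String, Long> grepJob(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        return counts;
    }

    // sortJob: swap key and value, then sort by count in decreasing order.
    static List<Map.Entry<Long, String>> sortJob(Map<String, Long> grepOutput) {
        List<Map.Entry<Long, String>> inverted = new ArrayList<>();
        for (Map.Entry<String, Long> e : grepOutput.entrySet()) {
            inverted.add(Map.entry(e.getValue(), e.getKey()));
        }
        inverted.sort(Map.Entry.<Long, String>comparingByKey(Comparator.reverseOrder()));
        return inverted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("dfs.a dfs.b", "dfs.a");
        System.out.println(sortJob(grepJob(lines, "dfs[a-z.]+"))); // [2=dfs.a, 1=dfs.b]
    }
}
```

Keeping the two steps as separate functions mirrors why the sample needs two jobs: the first job's output is keyed by match, while the final ordering must be keyed by count.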
Grep
• JobConf
• Mapper
• Reducer
o.a.hadoop.mapred.JobConf	
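• Loads configuration resources in order
  – Defaults: mapred-default.xml
  – Site overrides: conf/mapred-site.xml
  – The XML contents are parsed with DOM
  – Properties can also be set by hand in code
  – Child JobConfs
    • JobConf child = new JobConf(parentConf, MyJob.class); // parentConf: the parent conf; MyJob.class: a class inside the job jar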
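The layering described above (defaults first, site values overriding them, a child starting as a copy of its parent) can be sketched with a plain map. `LayeredConfSketch` is hypothetical and much simpler than Hadoop's Configuration/JobConf (no `final` properties, no variable expansion); the property values in `main` are examples only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical layered configuration (not Hadoop's Configuration/JobConf):
// resources added later override earlier ones, and a child starts as a copy
// of its parent so that its own settings never leak back into the parent.
class LayeredConfSketch {
    private final Map<String, String> props = new HashMap<>();

    LayeredConfSketch() {
    }

    // Copy constructor, similar in spirit to new JobConf(parentConf, ...).
    LayeredConfSketch(LayeredConfSketch parent) {
        props.putAll(parent.props);
    }

    // Load one resource; its values override anything loaded before it.
    void addResource(Map<String, String> resource) {
        props.putAll(resource);
    }

    void set(String name, String value) {
        props.put(name, value);
    }

    String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) {
        LayeredConfSketch parent = new LayeredConfSketch();
        // Example values only: a defaults layer, then a site layer.
        parent.addResource(Map.of("mapred.job.tracker", "local", "mapred.reduce.tasks", "1"));
        parent.addResource(Map.of("mapred.job.tracker", "your-site:9001"));

        LayeredConfSketch child = new LayeredConfSketch(parent);
        child.set("mapred.reduce.tasks", "4");

        System.out.println(parent.get("mapred.job.tracker"));  // your-site:9001
        System.out.println(parent.get("mapred.reduce.tasks")); // 1
        System.out.println(child.get("mapred.reduce.tasks"));  // 4
    }
}
```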
mapred-site.xml	
<configuration>
  <!-- JobTracker address (host:port) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>your-site:9001</value>
  </property>
</configuration>
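Since the configuration XML is read with DOM, a file like the one above can be parsed with the JDK's own `javax.xml.parsers` API. This is a minimal sketch, not Hadoop's parsing code: `SiteXmlSketch` is a hypothetical name, it only collects `<name>`/`<value>` pairs, and it skips `<property>` entries missing either element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: read <property><name/><value/></property> pairs with DOM.
class SiteXmlSketch {

    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element prop = (Element) properties.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip incomplete <property> entries
                }
                props.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unparseable configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<!-- JobTracker address (host:port) -->"
                + "<property><name>mapred.job.tracker</name><value>your-site:9001</value></property>"
                + "</configuration>";
        System.out.println(parse(xml)); // {mapred.job.tracker=your-site:9001}
    }
}
```

XML comments are ignored by `getElementsByTagName`, so the descriptive comment in the file has no effect on the result.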
o.a.hadoop.mapred.Mapper	
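• The Mapper interface
• One map task (and one Mapper instance) per InputSplit
• Called from MapTask/MapRunner
• map(KEY, VALUE, COLLECTOR, REPORTER)
  – KEY: map input key; VALUE: map input value
  – COLLECTOR: OutputCollector that receives the map output
  – REPORTER: API for reporting progress, status and counters
• Usually extends MapReduceBase (empty configure()/close())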
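The shape of the old-API `map(KEY, VALUE, COLLECTOR, REPORTER)` callback can be mirrored with a few small interfaces. `SimpleCollector`, `SimpleReporter` and `SimpleMapper` are hypothetical stand-ins, not the Hadoop types (which use Writable keys/values and `throws IOException`); the example assumes a TextInputFormat-like input where the key is a byte offset and the value is one line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins mirroring the shape of org.apache.hadoop.mapred.Mapper.
interface SimpleCollector<K, V> {
    void collect(K key, V value);
}

interface SimpleReporter {
    void setStatus(String status);
}

interface SimpleMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, SimpleCollector<K2, V2> output, SimpleReporter reporter);
}

// Example Mapper: key = byte offset of the line, value = line text; emits (word, 1).
class TokenMapper implements SimpleMapper<Long, String, String, Integer> {
    @Override
    public void map(Long offset, String line, SimpleCollector<String, Integer> output, SimpleReporter reporter) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
        reporter.setStatus("processed line at offset " + offset);
    }
}

class MapperShapeDemo {
    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        new TokenMapper().map(0L, "Hello Hadoop", (k, v) -> collected.add(k + "=" + v), status -> { });
        System.out.println(collected); // [Hello=1, Hadoop=1]
    }
}
```

The Mapper never returns its results; it pushes them into the collector, which is what lets the framework buffer, sort and spill the output on the map side.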
o.a.hadoop.mapred.MapTask	
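• The Map task
• initialize (defined in Task, shared with the Reduce side)
  – Creates the JobContext/TaskAttemptContext
  – Sets the task state (o.a.h.mapred.TaskStatus.State), one of:
    • RUNNING, SUCCEEDED, FAILED, UNASSIGNED,
      KILLED, COMMIT_PENDING, FAILED_UNCLEAN,
      KILLED_UNCLEAN
  – Creates the OutputCommitter
    • Commits the Task's output
    • Output is first written to a temporary work directory
      – mapred.work.output.dir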
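The commit idea behind the OutputCommitter can be sketched with `java.nio.file`: a task attempt writes only into its own work directory (the role mapred.work.output.dir plays), and only a successful commit moves the files into the job's output directory, while an abort just deletes the work directory. `CommitSketch` is hypothetical and far simpler than Hadoop's FileOutputCommitter; the `_temporary/<attemptId>` layout and the attempt id are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical commit protocol: write to a private work dir, publish on commit.
class CommitSketch {
    private final Path outputDir; // the job's final output directory
    private final Path workDir;   // per-attempt scratch dir (the role of mapred.work.output.dir)

    CommitSketch(Path outputDir, String attemptId) {
        this.outputDir = outputDir;
        this.workDir = outputDir.resolve("_temporary").resolve(attemptId); // assumed layout
    }

    // Task setup: create the attempt's private work directory.
    void setupTask() {
        io(() -> Files.createDirectories(workDir));
    }

    // The task writes its output files into the work directory only.
    void write(String fileName, String content) {
        io(() -> Files.writeString(workDir.resolve(fileName), content));
    }

    // Commit: move every file from the work directory into the final output directory.
    void commitTask() {
        io(() -> {
            List<Path> files;
            try (Stream<Path> s = Files.list(workDir)) {
                files = s.collect(Collectors.toList());
            }
            for (Path f : files) {
                Files.move(f, outputDir.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
            Files.delete(workDir);
        });
    }

    // Abort: delete the work directory (children before parents); the output stays untouched.
    void abortTask() {
        io(() -> {
            List<Path> paths;
            try (Stream<Path> s = Files.walk(workDir)) {
                paths = s.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        });
    }

    // Demo helper: a fresh temporary directory standing in for the job output dir.
    static Path tempDir() {
        try {
            return Files.createTempDirectory("job-output");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    // Rethrow IOException unchecked so callers need no throws clauses.
    static void io(IoAction action) {
        try {
            action.run();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = tempDir();
        CommitSketch task = new CommitSketch(out, "attempt_m_000000_0");
        task.setupTask();
        task.write("part-00000", "hello\t2\n");
        System.out.println(Files.exists(out.resolve("part-00000"))); // false: not committed yet
        task.commitTask();
        System.out.println(Files.exists(out.resolve("part-00000"))); // true
    }
}
```

Because a failed or killed attempt never reaches `commitTask`, its partial files stay out of the final output, which is the point of writing to a separate work directory first.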
o.a.h.mapred.MapTask cont	
• run calls runOldMapper (old API)
• Reads the InputSplit that the JobClient computed at job submission
• Creates the RecordReader
o.a.h.mapred.MapTask cont2	
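• Map output destined for the Reducers
  – Spilled to local disk when the in-memory buffer fills (* file name below)
    • $mapred.local.dir/taskTracker/jobcache/${taskid}/output/spill${spillNumber}.out
  – Partitioned per Reducer and written out
    • The Combiner runs on each spill; at the final merge it runs again only if there are at least min.num.spills.for.combine spills
  – With zero Reducers, output is written directly through the RecordWriter
• Runs the MapRunner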
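The spill-and-merge behaviour can be sketched in plain Java, under heavy simplifying assumptions: one partition only, records kept in memory instead of spill files, and a per-key sum standing in for the Combiner. `SpillSketch` is hypothetical, not MapTask's MapOutputBuffer. It also applies the combiner only at the final merge (Hadoop additionally combines on every spill), gated by a threshold that plays the role of min.num.spills.for.combine. `spillPath` just formats the slide's file-name template.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical map-side buffer: sort + spill runs, then k-way merge them at the end.
class SpillSketch {
    private final int bufferLimit;         // records held in memory before a spill
    private final int minSpillsForCombine; // plays the role of min.num.spills.for.combine
    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    private final List<List<Map.Entry<String, Integer>>> spills = new ArrayList<>();

    SpillSketch(int bufferLimit, int minSpillsForCombine) {
        this.bufferLimit = bufferLimit;
        this.minSpillsForCombine = minSpillsForCombine;
    }

    // Collect one map output record; spill when the buffer is full.
    void collect(String key, int value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= bufferLimit) {
            spill();
        }
    }

    // Sort the buffered records by key and keep them as one sorted run ("spill file").
    private void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Map.Entry<String, Integer>> run = new ArrayList<>(buffer);
        run.sort(Map.Entry.comparingByKey());
        spills.add(run);
        buffer.clear();
    }

    int spillCount() {
        return spills.size();
    }

    // Formats the slide's spill file-name template (illustrative only).
    static String spillPath(String localDir, String taskId, int spillNumber) {
        return localDir + "/taskTracker/jobcache/" + taskId + "/output/spill" + spillNumber + ".out";
    }

    // Final merge: k-way merge of the sorted runs; combine only if there were enough spills.
    List<Map.Entry<String, Integer>> flush() {
        spill();
        // Each element is {run index, position in run}, ordered by the key it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                Comparator.comparing((int[] c) -> spills.get(c[0]).get(c[1]).getKey()));
        for (int i = 0; i < spills.size(); i++) {
            heads.add(new int[] {i, 0});
        }
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] c = heads.poll();
            List<Map.Entry<String, Integer>> run = spills.get(c[0]);
            merged.add(run.get(c[1]));
            if (c[1] + 1 < run.size()) {
                heads.add(new int[] {c[0], c[1] + 1});
            }
        }
        if (spills.size() < minSpillsForCombine) {
            return merged; // too few spills: write the merged stream as-is
        }
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (Map.Entry<String, Integer> e : merged) {
            int last = combined.size() - 1;
            if (last >= 0 && combined.get(last).getKey().equals(e.getKey())) {
                combined.set(last, Map.entry(e.getKey(), combined.get(last).getValue() + e.getValue()));
            } else {
                combined.add(e);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        SpillSketch buf = new SpillSketch(2, 3);
        for (String w : new String[] {"a", "b", "a", "c", "a"}) {
            buf.collect(w, 1);
        }
        List<Map.Entry<String, Integer>> out = buf.flush();
        System.out.println(buf.spillCount() + " spills -> " + out); // 3 spills -> [a=3, b=1, c=1]
        System.out.println(spillPath("/tmp/mapred/local", "attempt_m_000000_0", 0));
    }
}
```

The threshold exists because combining costs CPU; with only one or two spills the merge-time combine is not worth it, so the merged stream is written unchanged.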
o.a.h.mapred.MapRunner	
• Implements MapRunnable
  – Can be swapped via mapred.map.runner.class
  – Hadoop Streaming uses PipeMapRunner
  – MultithreadedMapRunner runs map in multiple threads
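o.a.h.mapred.MapRunner cont
• run(RecordReader, OutputCollector, Reporter)
  – RecordReader: the reader the InputFormat creates for a Split (InputFormat/RecordReader pair)
• Loop
  – Reads each record from the RecordReader
  – The key/value objects are reused; only their contents change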
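The MapRunner loop (create the key and value once, then call `next(key, value)` and `map(...)` until EOF) can be sketched as follows. `SimpleRecordReader`, `LineRecordReaderSketch`, `LongHolder`, `TextHolder` and `RecordMapper` are hypothetical stand-ins, not Hadoop classes; the byte offsets assume one byte per character plus a newline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the old-API RecordReader contract:
// the caller creates the key/value holders once and next() refills them.
interface SimpleRecordReader<K, V> {
    K createKey();

    V createValue();

    boolean next(K key, V value); // false at EOF
}

// Mutable holders so next() can overwrite their contents in place.
class LongHolder {
    long value;
}

class TextHolder {
    final StringBuilder text = new StringBuilder();
}

// Reads in-memory lines: key = byte offset (1 byte per char + '\n'), value = line.
class LineRecordReaderSketch implements SimpleRecordReader<LongHolder, TextHolder> {
    private final List<String> lines;
    private int index = 0;
    private long offset = 0;

    LineRecordReaderSketch(List<String> lines) {
        this.lines = lines;
    }

    public LongHolder createKey() {
        return new LongHolder();
    }

    public TextHolder createValue() {
        return new TextHolder();
    }

    public boolean next(LongHolder key, TextHolder value) {
        if (index >= lines.size()) {
            return false;
        }
        String line = lines.get(index++);
        key.value = offset;
        value.text.setLength(0); // overwrite, do not allocate a new object
        value.text.append(line);
        offset += line.length() + 1;
        return true;
    }
}

interface RecordMapper<K, V> {
    void map(K key, V value, List<String> output);
}

class MapRunnerSketch {
    // The run loop: one key/value object pair is reused for every record.
    static <K, V> int run(SimpleRecordReader<K, V> reader, RecordMapper<K, V> mapper, List<String> output) {
        K key = reader.createKey();
        V value = reader.createValue();
        int records = 0;
        while (reader.next(key, value)) {
            mapper.map(key, value, output);
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        RecordMapper<LongHolder, TextHolder> mapper = (k, v, out) -> out.add(k.value + ":" + v.text);
        int n = run(new LineRecordReaderSketch(List.of("Hello World", "Hello Hadoop")), mapper, output);
        System.out.println(n + " " + output); // 2 [0:Hello World, 12:Hello Hadoop]
    }
}
```

Because the same objects are refilled on every `next`, a mapper that wants to keep a key or value beyond the current call has to copy it; the example mapper does this implicitly by turning them into a new String.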
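[Figure: MapTask sequence diagram. Participants: MapTask, MapRunner, Mapper, RecordReader, OutputCollector, SpillThread. MapTask creates the InputSplit and calls run; MapRunner calls createKey() and createValue(), then repeats next(key, value) and map(key, value, outputCollector, reporter) until EOF; the map output is spilled by the SpillThread.]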
m(_ _)m
• Covered so far: the Mapper side
  – JobConf
  – Mapper/MapRunner/MapTask
• Remaining
  – Reducer
    • How the Reducer is run
  – InputFormat/RecordReader
o.a.h.mapred.Reducer	
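• The Reducer interface
• Input is the Mapper output, not an InputSplit
• Called directly from ReduceTask (there is no ReduceRunner counterpart to MapRunner)
• reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER)
  – KEY: the key; Iterator<VALUE>: all values for that key
  – COLLECTOR: OutputCollector that receives the reduce output
  – REPORTER: API for reporting progress, status and counters
• Usually extends MapReduceBase (empty configure()/close())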
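The `reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER)` shape can be mirrored the same way as the Mapper. `SimpleOutputCollector`, `SimpleReducer` and `SumReducer` are hypothetical stand-ins rather than Hadoop types, and the Reporter argument is omitted for brevity. The key point is that values arrive as an Iterator, so a reducer consumes them in a single pass.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins mirroring the shape of org.apache.hadoop.mapred.Reducer
// (Reporter omitted to keep the sketch short).
interface SimpleOutputCollector<K, V> {
    void collect(K key, V value);
}

interface SimpleReducer<K1, V1, K2, V2> {
    void reduce(K1 key, Iterator<V1> values, SimpleOutputCollector<K2, V2> output);
}

// Sums every value seen for one key, consuming the Iterator exactly once.
class SumReducer implements SimpleReducer<String, Integer, String, Integer> {
    @Override
    public void reduce(String key, Iterator<Integer> values, SimpleOutputCollector<String, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.collect(key, sum);
    }
}

class ReducerShapeDemo {
    // Feed each key's grouped values to the reducer as an Iterator.
    static Map<String, Integer> runReducer(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        SumReducer reducer = new SumReducer();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue().iterator(), result::put);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runReducer(Map.of("Hello", List.of(1, 1), "Hadoop", List.of(1)))); // {Hadoop=1, Hello=2}
    }
}
```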
o.a.h.mapred.ReduceTask	
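• SHUFFLE
• ReduceTask.ReduceCopier
  – fetchOutputs (the copied outputs are later merged through Merger.MergeQueue)
    • Copies Map outputs in parallel using mapred.reduce.parallel.copies threads
      – MapOutputCopier
    • LocalFSMerger merges Map outputs on local disk
    • InMemFSMergeThread merges Map outputs held in memory
    • GetMapEventsThread
      – Fetches Map completion events
      – <host, MapOutputLocation(taskId, host, httpUrl)>
    • Avoids fetching from the same TaskTracker concurrently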
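The parallel copy step can be sketched with a fixed thread pool whose size plays the role of mapred.reduce.parallel.copies. This is a simplified, hypothetical model of ReduceCopier, not its code: `Location` is an assumed stand-in for MapOutputLocation, the URLs are made up, and `fetch` is any caller-supplied function (it must not return null, because results go into a `ConcurrentHashMap`). Grouping locations by host and letting one task handle each host keeps two copiers from hitting the same host at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical shuffle copier: a bounded pool fetches map outputs, one host per task.
class ShuffleSketch {

    // Assumed stand-in for a map output location (task id, host, HTTP URL).
    static final class Location {
        final String taskId;
        final String host;
        final String httpUrl;

        Location(String taskId, String host, String httpUrl) {
            this.taskId = taskId;
            this.host = host;
            this.httpUrl = httpUrl;
        }
    }

    // Fetch every map output with at most parallelCopies threads.
    static Map<String, String> fetchAll(List<Location> locations, int parallelCopies,
                                        Function<Location, String> fetch) {
        Map<String, List<Location>> byHost = new LinkedHashMap<>();
        for (Location loc : locations) {
            byHost.computeIfAbsent(loc.host, h -> new ArrayList<>()).add(loc);
        }
        Map<String, String> fetched = new ConcurrentHashMap<>();
        ExecutorService copiers = Executors.newFixedThreadPool(parallelCopies);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<Location> hostLocations : byHost.values()) {
                // One task per host: that host's outputs are fetched sequentially.
                pending.add(copiers.submit(() -> {
                    for (Location loc : hostLocations) {
                        fetched.put(loc.taskId, fetch.apply(loc));
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait, and surface any fetch failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("shuffle interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("map output fetch failed", e.getCause());
        } finally {
            copiers.shutdown();
        }
        return fetched;
    }

    public static void main(String[] args) {
        List<Location> locations = List.of(
                new Location("m_000000", "host1", "http://host1/map-output/m_000000"),
                new Location("m_000001", "host2", "http://host2/map-output/m_000001"),
                new Location("m_000002", "host1", "http://host1/map-output/m_000002"));
        Map<String, String> out = fetchAll(locations, 2, loc -> "data from " + loc.httpUrl);
        System.out.println(new TreeMap<>(out));
    }
}
```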
o.a.h.mapred.ReduceTask	
• run(JobConf, TaskUmbilicalProtocol)
• SORT
  – Creates an iterator over the in-memory and on-disk segments
    • RawKeyValueIterator
  – Creates the Reducer
  – Creates the RecordWriter
  – Runs reduce over a ReduceValuesIterator
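The reduce-side SORT and grouping can be sketched as two steps: bring the already-sorted segments (some in memory, some on disk) into one key-ordered stream, then hand the reducer one key at a time with an Iterator over only that key's values, which is the job ReduceValuesIterator does. `ReduceSortSketch` is hypothetical; it uses a stable sort instead of a real k-way merge and computes each key's range eagerly rather than streaming.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical reduce-side sort + grouping over in-memory data.
class ReduceSortSketch {

    // SORT: combine sorted segments into one key-ordered list (stable sort stands in for a k-way merge).
    static List<Map.Entry<String, Integer>> merge(List<List<Map.Entry<String, Integer>>> segments) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> segment : segments) {
            all.addAll(segment);
        }
        all.sort(Map.Entry.comparingByKey());
        return all;
    }

    // REDUCE: give the reducer each key with an Iterator over just that key's values.
    static void reduceAll(List<Map.Entry<String, Integer>> sorted,
                          BiConsumer<String, Iterator<Integer>> reducer) {
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            int start = i;
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                i++;
            }
            Iterator<Integer> values = sorted.subList(start, i).stream()
                    .map(Map.Entry::getValue)
                    .iterator();
            reducer.accept(key, values);
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> inMemory = List.of(Map.entry("a", 1), Map.entry("c", 2));
        List<Map.Entry<String, Integer>> onDisk = List.of(Map.entry("a", 3), Map.entry("b", 4));
        Map<String, Integer> totals = new TreeMap<>();
        reduceAll(merge(List.of(inMemory, onDisk)), (key, values) -> {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            totals.put(key, sum);
        });
        System.out.println(totals); // {a=4, b=4, c=2}
    }
}
```

The reducer never sees a key twice because the merged stream is sorted; that ordering guarantee is what lets the grouping step be a single linear pass.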
