Apache Hadoop YARN - Enabling Next Generation Data Applications
1. © Hortonworks Inc. 2013
Apache Hadoop YARN
Enabling next generation data applications
Arun C. Murthy
Founder & Architect
@acmurthy (@hortonworks)
2. © Hortonworks Inc. 2013
Hello!
• Founder/Architect at Hortonworks Inc.
– Lead - MapReduce/YARN/Tez
– Formerly, Architect Hadoop MapReduce, Yahoo
– Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k node footprint)
• Apache Hadoop, ASF
– Former VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
– Long-term Committer/PMC member (full time >6 years)
– Release Manager for hadoop-2.x
3. © Hortonworks Inc. 2013
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
4. © Hortonworks Inc. 2013
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
5. © Hortonworks Inc. 2013
The 1st Generation of Hadoop: Batch
HADOOP 1.0 - Built for Web-Scale Batch Apps
[Diagram: separate "Single App" silos - BATCH, INTERACTIVE, ONLINE - each running on its own HDFS]
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
7. © Hortonworks Inc. 2013
MapReduce Classic: Limitations
• Scalability
– Maximum cluster size: ~4,000 nodes
– Maximum concurrent tasks: ~40,000
– Coarse synchronization in the JobTracker
• Availability
– Failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
– Low resource utilization
• Lacks support for alternate paradigms and services
– Iterative applications implemented using MapReduce are 10x slower
8. © Hortonworks Inc. 2013
Our Vision: Hadoop as Next-Gen Platform
[Diagram:
HADOOP 1.0 - Single Use System, Batch Apps: HDFS (redundant, reliable storage) + MapReduce (cluster resource management & data processing)
HADOOP 2.0 - Multi-Purpose Platform for Batch, Interactive, Online, Streaming, ...: HDFS2 (redundant, reliable storage) + YARN (cluster resource management) + MapReduce and others (data processing)]
9. © Hortonworks Inc. 2013
YARN: Taking Hadoop Beyond Batch
Applications run natively IN Hadoop
[Diagram: HDFS2 (redundant, reliable storage) and YARN (cluster resource management) supporting BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, ...), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search, Weave, ...)]
Store ALL DATA in one place...
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
10. © Hortonworks Inc. 2013
5 Key Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
11. © Hortonworks Inc. 2013
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
12. © Hortonworks Inc. 2013
A Brief History of YARN
• Originally conceived & architected by the team at Yahoo!
– Arun Murthy created the original JIRA in 2008 and led the PMC
• The team at Hortonworks has been working on YARN for 4 years
• YARN-based architecture running at scale at Yahoo!
– Deployed on 35,000 nodes for 6+ months
• Multitude of YARN applications
13. © Hortonworks Inc. 2013
Concepts
• Application
– An application is a job submitted to the framework
– Example: a MapReduce job
• Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu, etc.), as in the sketch below
– container_0 = 2GB, 1 CPU
– container_1 = 1GB, 6 CPU
– Replaces the fixed map/reduce slots
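As a quick illustration, here is a minimal sketch of those two container sizes expressed as YARN Resource records, using the same Records factory that appears in the walkthrough code later in this deck (the values mirror the bullets above):

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class ContainerSizes {
  public static void main(String[] args) {
    // container_0 = 2GB, 1 CPU
    Resource container0 = Records.newRecord(Resource.class);
    container0.setMemory(2048);      // memory in MB
    container0.setVirtualCores(1);

    // container_1 = 1GB, 6 CPU
    Resource container1 = Records.newRecord(Resource.class);
    container1.setMemory(1024);
    container1.setVirtualCores(6);

    System.out.println(container0 + " / " + container1);
  }
}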
14. © Hortonworks Inc. 2012
YARN Architecture
[Diagram: ResourceManager (with Scheduler) coordinating a grid of NodeManagers; AM1 with Containers 1.1-1.3 and AM2 with Containers 2.1-2.4 spread across the nodes]
15. © Hortonworks Inc. 2013
Architecture
• ResourceManager
– Global resource scheduler
– Hierarchical queues
• NodeManager
– Per-machine agent
– Manages the life-cycle of containers
– Container resource monitoring
• ApplicationMaster
– Per-application
– Manages application scheduling and task execution
– E.g. the MapReduce ApplicationMaster
16. © Hortonworks Inc. 2013
Design Centre
• Split up the two major functions of the JobTracker
– Cluster resource management
– Application life-cycle management
• MapReduce becomes a user-land library
17. © Hortonworks Inc. 2012
YARN Architecture - Walkthrough
[Diagram: Client2 submits to the ResourceManager (Scheduler); AM2 is launched and negotiates Containers 2.1-2.4 across NodeManagers, alongside AM1 with Containers 1.1-1.3]
18. © Hortonworks Inc. 2013
Review - Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
19. © Hortonworks Inc. 2013
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
20. © Hortonworks Inc. 2013
YARN Applications
• Data processing applications and services
– Online serving: HOYA (HBase on YARN)
– Real-time event processing: Storm, S4, other commercial platforms
– Tez: generic framework to run a complex DAG
– MPI: OpenMPI, MPICH2
– Master-Worker
– Machine learning: Spark
– Graph processing: Giraph
– Enabled by allowing the use of paradigm-specific ApplicationMasters
Run all on the same Hadoop cluster!
21. © Hortonworks Inc. 2013
YARN - Implementing Applications
• What APIs do I need to use?
– Only three protocols
  – Client to ResourceManager: application submission
  – ApplicationMaster to ResourceManager: container allocation
  – ApplicationMaster to NodeManager: container launch
– Use client libraries for all 3 actions
  – Module yarn-client
  – Provides both synchronous and asynchronous libraries
– Or use a 3rd-party library like Weave
  – http://continuuity.github.io/weave/
22. © Hortonworks Inc. 2013
YARN - Implementing Applications
• What do I need to do?
– Write a submission Client
– Write an ApplicationMaster (well, copy-paste)
  – DistributedShell is the new WordCount!
– Get containers, run whatever you want!
23. © Hortonworks Inc. 2013
YARN - Implementing Applications
• What else do I need to know?
– Resource Allocation & Usage
  – ResourceRequest
  – Container
  – ContainerLaunchContext
  – LocalResource
– ApplicationMaster
  – ApplicationId
  – ApplicationAttemptId
  – ApplicationSubmissionContext
24. © Hortonworks Inc. 2013
YARN - Resource Allocation & Usage
• ResourceRequest
– Fine-grained resource ask to the ResourceManager
– Ask for a specific amount of resources (memory, cpu, etc.) on a specific machine or rack
– Use the special value * as the resource name to ask for any machine
Fields: priority, resourceName, capability, numContainers (see the sketch below)
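A sketch of how this ask is usually expressed through the yarn-client library's AMRMClient.ContainerRequest wrapper (the same constructor used in the DistributedShell walkthrough later); the host and rack names here are illustrative:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class AskExample {
  public static ContainerRequest buildAsk() {
    Resource capability = Records.newRecord(Resource.class);   // capability
    capability.setMemory(1024);
    capability.setVirtualCores(1);

    Priority priority = Records.newRecord(Priority.class);     // priority
    priority.setPriority(0);

    // resourceName: specific hosts/racks, or null for "*" (any machine)
    return new ContainerRequest(capability,
        new String[] { "host1.example.com" },   // nodes (illustrative)
        new String[] { "/rack1" },              // racks (illustrative)
        priority);
  }
}

Each ContainerRequest stands for a single container; the client library aggregates identical asks into ResourceRequests with the appropriate numContainers.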
26. © Hortonworks Inc. 2013
YARN - Resource Allocation & Usage
• Container
– The basic unit of allocation in YARN
– The result of a ResourceRequest, provided by the ResourceManager to the ApplicationMaster
– A specific amount of resources (cpu, memory, etc.) on a specific machine
Fields: containerId, resourceName, capability, tokens
27. © Hortonworks Inc. 2013
YARN - Resource Allocation & Usage
• ContainerLaunchContext
– The context provided by the ApplicationMaster to the NodeManager to launch the Container
– Complete specification for a process
– LocalResource used to specify the container binary and dependencies
  – NodeManager responsible for downloading them from a shared namespace (typically HDFS)
Fields: ContainerLaunchContext (container, commands, environment, localResources); LocalResource (uri, type) - see the sketch below
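A sketch of filling in both records, assuming the worker binary already sits in HDFS at an illustrative path; the NodeManager downloads anything listed in localResources before launching the process:

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class LaunchContextExample {
  public static ContainerLaunchContext build(Configuration conf) throws Exception {
    // LocalResource: the NodeManager fetches this from HDFS before launch
    Path jarPath = new Path("hdfs:///apps/my-app/worker.jar"); // illustrative
    FileStatus stat = FileSystem.get(conf).getFileStatus(jarPath);

    LocalResource jar = Records.newRecord(LocalResource.class);
    jar.setResource(ConverterUtils.getYarnUrlFromPath(jarPath)); // uri
    jar.setType(LocalResourceType.FILE);                         // type
    jar.setVisibility(LocalResourceVisibility.APPLICATION);
    jar.setSize(stat.getLen());
    jar.setTimestamp(stat.getModificationTime());

    // ContainerLaunchContext: the complete specification for the process
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setLocalResources(Collections.singletonMap("worker.jar", jar));
    ctx.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java -cp worker.jar Worker"));           // illustrative
    return ctx;
  }
}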
28. © Hortonworks Inc. 2013
YARN - ApplicationMaster
• ApplicationMaster
– Per-application controller, aka container_0
– Parent for all containers of the application
  – ApplicationMaster negotiates all its containers from the ResourceManager
– ApplicationMaster container is a child of the ResourceManager
  – Think of the init process in Unix
  – RM restarts the ApplicationMaster attempt if required (unique ApplicationAttemptId)
– Code for the application is submitted along with the application itself
29. © Hortonworks Inc. 2013
YARN - ApplicationMaster
• ApplicationMaster
– ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client
– ResourceManager responsible for allocating and launching the ApplicationMaster container
Fields: resourceRequest, containerLaunchContext, appName, queue
30. © Hortonworks Inc. 2013
YARN Application API - Overview
• hadoop-yarn-client module
• YarnClient is the submission client API
• Both synchronous & asynchronous APIs for resource allocation and container start/stop
• Synchronous API
– AMRMClient
– AMNMClient
• Asynchronous API
– AMRMClientAsync
– AMNMClientAsync
31. © Hortonworks Inc. 2012
YARN Application API - The Client
[Diagram: Client2 talks to the ResourceManager (Scheduler) in two steps:
1. New application request: YarnClient.createApplication
2. Submit application: YarnClient.submitApplication]
32. © Hortonworks Inc. 2013
YARN Application API - The Client
• YarnClient
– createApplication to create an application
– submitApplication to start an application
  – Application developer needs to provide an ApplicationSubmissionContext
– APIs to get other information from the ResourceManager (sketched below)
  – getAllQueues
  – getApplications
  – getNodeReports
– APIs to manipulate a submitted application, e.g. killApplication
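A short sketch exercising these informational APIs against a running ResourceManager picked up from the default Configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ClusterInfo {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();

    // Applications known to the ResourceManager
    for (ApplicationReport app : yarnClient.getApplications()) {
      System.out.println(app.getApplicationId() + " " + app.getYarnApplicationState());
    }

    // Nodes in the cluster and their capabilities
    for (NodeReport node : yarnClient.getNodeReports()) {
      System.out.println(node.getNodeId() + " " + node.getCapability());
    }

    // yarnClient.killApplication(appId) would terminate a submitted application
    yarnClient.stop();
  }
}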
33. © Hortonworks Inc. 2012
YARN Application API - Resource Allocation
[Diagram: the ApplicationMaster's dialogue with the ResourceManager (Scheduler):
1. registerApplicationMaster
2. AMRMClient.allocate
3. Container(s) returned
4. unregisterApplicationMaster]
34. © Hortonworks Inc. 2013
YARN Application API - Resource Allocation
• AMRMClient - synchronous API for the ApplicationMaster to interact with the ResourceManager
– Prologue / epilogue: registerApplicationMaster / unregisterApplicationMaster
– Resource negotiation with the ResourceManager
  – Internal book-keeping: addContainerRequest / removeContainerRequest / releaseAssignedContainer
  – Main API: allocate
– Helper APIs for cluster information
  – getAvailableResources
  – getClusterNodeCount
35. © Hortonworks Inc. 2013
YARN Application API - Resource Allocation
• AMRMClientAsync - asynchronous API for the ApplicationMaster
– Extension of AMRMClient to provide an asynchronous CallbackHandler (sketched below)
– Callbacks make it easier for the application developer to build a mental model of the interaction with the ResourceManager
  – onContainersAllocated
  – onContainersCompleted
  – onNodesUpdated
  – onError
  – onShutdownRequest
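A minimal sketch of such a handler (note the interface also requires getProgress); the callback bodies are stubs to be filled in by the application:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class AsyncAMHandler implements AMRMClientAsync.CallbackHandler {
  public void onContainersAllocated(List<Container> containers) {
    // Build a ContainerLaunchContext and start each container here
  }
  public void onContainersCompleted(List<ContainerStatus> statuses) {
    // Track exit statuses; optionally re-request failed containers
  }
  public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  public void onShutdownRequest() { /* clean up, then unregister */ }
  public void onError(Throwable e) { /* fail this attempt */ }
  public float getProgress() { return 0.0f; } // also required by the interface

  public static void main(String[] args) throws Exception {
    AMRMClientAsync<ContainerRequest> rmClient =
        AMRMClientAsync.createAMRMClientAsync(1000, new AsyncAMHandler());
    rmClient.init(new Configuration());
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");
    // ... addContainerRequest(...) as in the synchronous walkthrough below
  }
}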
36. © Hortonworks Inc. 2012
YARN Application API - Using Resources
[Diagram: AM1 talks directly to the NodeManager hosting Container 1.1:
AMNMClient.startContainer
AMNMClient.getContainerStatus]
37. © Hortonworks Inc. 2013
YARN Application API - Using Resources
• AMNMClient - synchronous API for the ApplicationMaster to launch / stop containers at the NodeManager
– Simple (trivial) APIs
  – startContainer
  – stopContainer
  – getContainerStatus
38. © Hortonworks Inc. 2013
YARN Application API - Using Resources
• AMNMClientAsync - asynchronous API for the ApplicationMaster to launch / stop containers at the NodeManager
– Simple (trivial) APIs
  – startContainerAsync
  – stopContainerAsync
  – getContainerStatusAsync
– CallbackHandler to make it easier for the application developer to build a mental model of the interaction with the NodeManager (sketched below)
  – onContainerStarted
  – onContainerStopped
  – onStartContainerError
  – onContainerStatusReceived
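And a matching sketch of the NodeManager-side handler; the shipped class names in hadoop-yarn-client are NMClientAsync and NMClientAsync.CallbackHandler, whose interface also includes onGetContainerStatusError and onStopContainerError:

import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

public class AsyncNMHandler implements NMClientAsync.CallbackHandler {
  public void onContainerStarted(ContainerId id, Map<String, ByteBuffer> resp) {
    // Container process launched; optionally ask for its status
  }
  public void onContainerStatusReceived(ContainerId id, ContainerStatus status) { }
  public void onContainerStopped(ContainerId id) { }
  public void onStartContainerError(ContainerId id, Throwable t) { }
  public void onGetContainerStatusError(ContainerId id, Throwable t) { }
  public void onStopContainerError(ContainerId id, Throwable t) { }

  public static void main(String[] args) {
    NMClientAsync nmClient = NMClientAsync.createNMClientAsync(new AsyncNMHandler());
    nmClient.init(new Configuration());
    nmClient.start();
    // nmClient.startContainerAsync(container, ctx) once containers arrive
  }
}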
39. © Hortonworks Inc. 2013
YARN Application API - Development
• Un-Managed Mode for the ApplicationMaster
– Run the ApplicationMaster on a development machine rather than in-cluster
  – No submission client
– hadoop-yarn-applications-unmanaged-am-launcher
– Easier to step through with a debugger, browse logs, etc.

$ bin/hadoop jar hadoop-yarn-applications-unmanaged-am-launcher.jar \
    Client \
    -jar my-application-master.jar \
    -cmd "java MyApplicationMaster <args>"
40. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• Overview
– YARN application to run n copies of a shell command
– Simplest example of a YARN application: get n containers and run a specific Unix command

$ bin/hadoop jar hadoop-yarn-applications-distributedshell.jar \
    org.apache.hadoop.yarn.applications.distributedshell.Client \
    -shell_command "/bin/date" \
    -num_containers <n>

Code: https://github.com/hortonworks/simple-yarn-app
41. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• Code Overview
– User submits the application to the ResourceManager via org.apache.hadoop.yarn.applications.distributedshell.Client
– Client provides an ApplicationSubmissionContext to the ResourceManager
– It is the responsibility of org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster to negotiate the n containers
– ApplicationMaster launches containers with the user-specified command as ContainerLaunchContext.commands
42. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• Client - Code Walkthrough
– hadoop-yarn-client module
– Steps:
  – YarnClient.createApplication
  – Specify the ApplicationSubmissionContext - in particular, the ContainerLaunchContext with commands, plus other key pieces such as the resource capability for the ApplicationMaster container, queue, appName, appType, etc.
  – YarnClient.submitApplication

// Create yarnClient
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(new Configuration());
yarnClient.start();

// Create application via yarnClient
YarnClientApplication app = yarnClient.createApplication();
43. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• Client - Code Walkthrough

// Set up the container launch context for the application master:
// the command to launch the ApplicationMaster process
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
List<String> commands = new ArrayList<String>();
commands.add("$JAVA_HOME/bin/java");
commands.add("-Xmx256M");
commands.add(
    "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
commands.add("--container_memory 1024");
commands.add("--container_cores 1");
commands.add("--num_containers 3");
amContainer.setCommands(commands);

// Set up resource type requirements (memory, cores) for the
// ApplicationMaster container itself
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(256);
capability.setVirtualCores(2);
44. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• Client - Code Walkthrough

// Finally, set up the ApplicationSubmissionContext for the
// ApplicationMaster
ApplicationSubmissionContext appContext =
    app.getApplicationSubmissionContext();
appContext.setQueue("my-queue");                        // queue
appContext.setAMContainerSpec(amContainer);
appContext.setResource(capability);
appContext.setApplicationName("my-app");                // application name
appContext.setApplicationType("DISTRIBUTED_SHELL");     // application type

// Submit the application to the ResourceManager
yarnClient.submitApplication(appContext);
45. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• ApplicationMaster - Code Walkthrough
– Again, hadoop-yarn-client module
– Steps:
  – AMRMClient.registerApplicationMaster
  – Negotiate containers from the ResourceManager by providing a ContainerRequest to AMRMClient.addContainerRequest
  – Take each resultant Container returned via subsequent calls to AMRMClient.allocate, build a ContainerLaunchContext with the Container and commands, then launch it using AMNMClient.startContainer
    – Use LocalResources to specify software/configuration dependencies for each worker container
  – Wait till done... AllocateResponse.getCompletedContainersStatuses from subsequent calls to AMRMClient.allocate
  – AMRMClient.unregisterApplicationMaster
46. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• ApplicationMaster - Code Walkthrough

// Initialize the clients to the ResourceManager and NodeManagers
Configuration conf = new Configuration();

AMRMClient rmClient = AMRMClient.createAMRMClient();
rmClient.init(conf);
rmClient.start();

NMClient nmClient = NMClient.createNMClient();
nmClient.init(conf);
nmClient.start();

// Register with the ResourceManager
rmClient.registerApplicationMaster("", 0, "");
47. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• ApplicationMaster - Code Walkthrough

// Priority for worker containers - priorities are intra-application
Priority priority = Records.newRecord(Priority.class);
priority.setPriority(0);

// Resource requirements for worker containers
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(128);
capability.setVirtualCores(1);

// Make container requests to the ResourceManager
// (null nodes/racks = any machine, i.e. resource name "*")
for (int i = 0; i < n; ++i) {
  ContainerRequest containerAsk =
      new ContainerRequest(capability, null, null, priority);
  rmClient.addContainerRequest(containerAsk);
}
48. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• ApplicationMaster - Code Walkthrough

// Obtain allocated containers from the ResourceManager and launch
// each one on its NodeManager
int allocatedContainers = 0;
while (allocatedContainers < n) {
  AllocateResponse response = rmClient.allocate(0);
  for (Container container : response.getAllocatedContainers()) {
    ++allocatedContainers;

    // Launch the container by creating a ContainerLaunchContext
    ContainerLaunchContext ctx =
        Records.newRecord(ContainerLaunchContext.class);
    ctx.setCommands(Collections.singletonList("/bin/date"));
    nmClient.startContainer(container, ctx);
  }
  Thread.sleep(100);
}
49. © Hortonworks Inc. 2013
Sample YARN Application - DistributedShell
• ApplicationMaster - Code Walkthrough

// Now wait for the containers to complete successfully
int completedContainers = 0;
while (completedContainers < n) {
  // Report progress as the fraction of containers completed
  AllocateResponse response =
      rmClient.allocate((float) completedContainers / n);
  for (ContainerStatus status : response.getCompletedContainersStatuses()) {
    if (status.getExitStatus() == 0) {
      ++completedContainers;
    }
  }
  Thread.sleep(100);
}

// Un-register with the ResourceManager
rmClient.unregisterApplicationMaster(
    FinalApplicationStatus.SUCCEEDED, "", "");
50. © Hortonworks Inc. 2013
Apache Hadoop MapReduce on YARN
• Original use-case
• Most complex application to build
– Data-locality
– Fault tolerance
  – ApplicationMaster recovery: checkpoint to HDFS
– Intra-application priorities: maps vs. reduces
  – Needed complex feedback mechanism from the ResourceManager
– Security
– Isolation
• Binary compatible with Apache Hadoop 1.x
51. © Hortonworks Inc. 2012
Apache Hadoop MapReduce on YARN
[Diagram: MR AM1 running map 1.1, map 1.2, reduce 1.1 and MR AM2 running map 2.1, map 2.2, reduce 2.1, reduce 2.2 as YARN containers across NodeManagers, under the ResourceManager (Scheduler)]
52. © Hortonworks Inc. 2013
Apache Tez on YARN
• Replaces MapReduce as the primitive for Pig, Hive, Cascading, etc.
– Smaller latency for interactive queries
– Higher throughput for batch queries
– http://incubator.apache.org/projects/tez.html
[Diagram: the same query shown as a Hive-on-MR plan vs. a leaner Hive-on-Tez DAG]
SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id)
GROUP BY a
UNION SELECT x, AVERAGE(y) AS AVG
FROM c
GROUP BY x
ORDER BY AVG;
53. © Hortonworks Inc. 2012
Apache Tez on YARN
[Diagram: Tez AM1 running vertices 1.1.1, 1.1.2, 1.2.1, 1.2.2 alongside MR AM1 running map 1.1, map 1.2, reduce 1.1 on the same cluster, under the ResourceManager (Scheduler)]
54. © Hortonworks Inc. 2013
HOYA - Apache HBase on YARN
• Hoya: Apache HBase becomes a user-level application
• Use cases
– Small HBase cluster in a large YARN cluster
– Dynamic HBase clusters
– Transient/intermittent clusters for workflows
• APIs to create, start, stop & delete HBase clusters
• Flex cluster size: increase/decrease size with load
• Recover from RegionServer loss with a new container
Code: https://github.com/hortonworks/hoya
55. © Hortonworks Inc. 2012
HOYA - Apache HBase on YARN
[Diagram: HOYA AM1 running HBase Master 1.1 and RegionServers 1.1-1.3 as YARN containers alongside MR AM1 running map/reduce tasks, under the ResourceManager (Scheduler)]
56. © Hortonworks Inc. 2013
HOYA - Highlights
• Cluster specification stored as JSON in HDFS
• Config directory cached - dynamically patched before being pushed up as local resources for the Master & RegionServers
• HBase tar file stored in HDFS - clusters can use the same or different HBase versions
• Handling of cluster flexing uses the same code as unplanned container loss
• No Hoya code on the RegionServers: client and AM only
57. © Hortonworks Inc. 2013
Storm on YARN
• Ability to deploy multiple Storm clusters on YARN for real-time event processing
• Yahoo: primary contributor
– 200+ nodes in production
• Ability to recover from faulty nodes
– Get new containers
• Auto-scale for load balancing
– Get new containers as load increases
– Release containers as load decreases
Code: https://github.com/yahoo/storm-yarn
58. © Hortonworks Inc. 2012
Storm on YARN
[Diagram: Storm AM1 running Nimbus 1.1-1.4 as YARN containers alongside MR AM1 running map/reduce tasks, under the ResourceManager (Scheduler)]
59. © Hortonworks Inc. 2013
General Architectural Considerations
• Fault tolerance
– Checkpoint
• Security
• Always-on services
• Scheduler features
– Whitelist resources
– Blacklist resources
– Labels for machines
– License management
60. © Hortonworks Inc. 2013
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
61. © Hortonworks Inc. 2013
HDP 2.0 Community Preview & YARN Certification Program
Goal: accelerate the number of certified YARN-based solutions
• HDP 2.0 Community Preview
– Contains the latest community Beta of Apache Hadoop 2.0 & YARN
– Delivered as an easy-to-use Sandbox VM, as well as RPMs and tarballs
– Enables the YARN Cert Program: community & commercial ecosystem to test and certify new and existing YARN-based apps
• YARN Certification Program
– More than 14 partners in the program at launch
  – Splunk*
  – Elastic Search*
  – Altiscale*
  – Concurrent*
  – Microsoft
  – Platfora
  – Tableau
  – (IBM) DataStage
  – Informatica
  – Karmasphere
  – and others
* Already certified
62. © Hortonworks Inc. 2013
Forum & Office Hours
• YARN Forum
– Community of Hadoop YARN developers
– Focused on collaboration and Q&A
  – http://hortonworks.com/community/forums/forum/yarn
• Office Hours
– http://www.meetup.com/HSquared-Hadoop-Hortonworks-User-Group/
– Bi-weekly office hours with Hortonworks engineers & architects
– Every other Thursday, starting Aug 15th, from 4:30-5:30 PM (PST)
– At Hortonworks Palo Alto HQ
  – West Bayshore Rd., Palo Alto, CA 94303 USA
64. © Hortonworks Inc. 2013
What Next?
• Download the book
• Download the Community Preview
• Join the Beta Program
• Participate via the YARN forum
• Come see us during the YARN Office Hours
• Follow us... @hortonworks
Thank You!
http://hortonworks.com/hadoop/yarn