SlideShare a Scribd company logo
1 of 64
Download to read offline
4/27/13
Introduction To YARN, NameNode HA
and HDFS Federation
Adam Kawa, Spotify
4/27/13
About Me
Data Engineer at Spotify, Sweden
Hadoop Instructor at Compendium (Cloudera Training Partner)
+2.5 year of experience in Hadoop
4/27/13
“Classical” Apache Hadoop Cluster
Can consist of one, five, hundred or 4000 nodes
4/27/13
Hadoop Golden EraHadoop Golden Era
One of the most exciting open-source projects on the planetOne of the most exciting open-source projects on the planet
Becomes the standard for large-scale processingBecomes the standard for large-scale processing
Successfully deployed by hundreds/thousands of companiesSuccessfully deployed by hundreds/thousands of companies
Like a salutary virusLike a salutary virus
... But it has some little drawbacks... But it has some little drawbacks
4/27/13
HDFS Main LimitationsHDFS Main Limitations
Single NameNode thatSingle NameNode that
keeps all metadata in RAMkeeps all metadata in RAM
performs all metadata operationsperforms all metadata operations
becomes a single point of failure (SPOF)becomes a single point of failure (SPOF)
4/27/13
HDFS Main ImprovementHDFS Main Improvement
Introduce multiple NameNodes
HDFS Federation
HDFS High Availability (HA)
4/27/13
Motivation for HDFS Federation
Limited namespace scalability
Decreased metadata operations performance
Lack of isolation
4/27/13
HDFS – master and slaves
Master (NameNode) manages all metadata information
Slaves (DataNodes) store blocks of data and serve them to
the client
4/27/13
HDFS NameNode (NN)
Supports all the namespace-related file system operations
Keeps information in RAM for fast lookup
The filesystem tree
Metadata for all files/directories (e.g. ownerships, permissions)
Names and locations of blocks
Metadata (not all) is additionally stored on disk(s)
Filesystem snapshot (fsimage) + editlog (edits) files
4/27/13
HDFS DataNode (DN)
Stores and retrieves blocks
A block is stored as a regular files on local disk
e.g. blk_-992391354910561645
A block itself does not know which file it belongs to
Sends the block report to NN periodically
Sends the heartbeat signals to NN to say that it is alive
4/27/13
NameNode scalability issues
Number of files limited by the amount of RAM of one machine
More RAM means also heavier GC
Stats based on Y! Clusters
An average file ≈ 1.5 blocks (block size = 128 MB)
An average file ≈ 600 bytes of metadata (1 file and 2 blocks
objects)
100M files ≈ 60 GB of metadata
1 GB of metadata ≈ 1 PB physical storage (but usually less)
4/27/13
NameNode performance
Read/write operations throughput limited by one machine
~120K read ops/sec, ~6K write ops/sec
MapReduce tasks are also HDFS clients
Internal load increases as the cluster grows
Block reports and heartbeats from DataNodes
Bigger snapshots transferred from/to Secondary NameNode
NNBench - benchmark that stresses the NameNode
4/27/13
Lack of isolation
Experimental applications
(can) overload the NN and slow down production jobs
Production applications with different requirements
(can) benefit from “own” NameNode
e.g. HBase requires extremely fast responses
4/27/13
HDFS Federation
Partition the filesystem namespace over multiple separated
NameNodes
4/27/13
Multiple independent namespaces
NameNodes does not
talk each other
Datanode can store blocks
managed by any NameNode
NameNode manages only
slice of namespace
4/27/13
The client-view of the cluster
There are multiple namespaces (and corresponding NNs)
Client can use any of them to compose its own view of HDFS
Idea is much like the Linux /etc/fstab file
Mapping between paths and NameNodes
4/27/13
Client-side mount tables
4/27/13
(More funky) Client-side mount tables
4/27/13
Deploying HDFS Federation
Deploying HDFS FederationDeploying HDFS Federation
Useful for small (+isolation) and large (+scalability) clusterUseful for small (+isolation) and large (+scalability) cluster
However not so many clusters use itHowever not so many clusters use it
NameNodes can be added/removed at any timeNameNodes can be added/removed at any time
Cluster does have to be restartedCluster does have to be restarted
Add Federation to the existing clusterAdd Federation to the existing cluster
Do not format existing NNDo not format existing NN
Newly added NN will be “empty”Newly added NN will be “empty”
4/27/13
Motivation for HDFS High Availability
Single NameNode is a single point of failure (SPOF)Single NameNode is a single point of failure (SPOF)
Secondary NN and HDFS Federation do not change thatSecondary NN and HDFS Federation do not change that
Recovery from failed NN may take even tens of minutesRecovery from failed NN may take even tens of minutes
Two types of downtimesTwo types of downtimes
Unexpected failure (infrequent)Unexpected failure (infrequent)
Planned maintenance (common)Planned maintenance (common)
4/27/13
HDFS High Availability
Introduces a pair of redundant NameNodes
One Active and one Standby
If the Active NameNode crashes or is intentionally stopped,
Then the Standby takes over quickly
4/27/13
NameNodes resposibilitiesNameNodes resposibilities
The Active NameNode is responsible for all client operationsThe Active NameNode is responsible for all client operations
The Standby NameNode is watchfulThe Standby NameNode is watchful
Maintains enough state to provide a fast failoverMaintains enough state to provide a fast failover
Does also checkpointing (disappearance of Secondary NN)Does also checkpointing (disappearance of Secondary NN)
4/27/13
Synchronizing the state of metadata
The Standby must keep its state as up-to-date as possible
Edit logs - two alternative ways of sharing them
Shared storage using NFS
Quorum-based storage (recommended)
Block locations
DataNodes send block reports to both NameNodes
4/27/13
Quorum-based storage
4/27/13
Supported failover modes
Manual fail-over
Initiated by administrator by a dedicated command - 0-3
seconds
Automatic fail-over
Detected by a fault in the active NN (more complicated and
longer) - 30-40 seconds
4/27/13
Manual failover
Initiate a failover from the first NN to the second one
hdfs haadmin ­failover <to­be­standby­NN> <to­be­active­NN>
4/27/13
Automatic failover
Requires ZooKeeper and an additional process
“ZKFailoverController”
4/27/13
The split brain scenario
A potential scenario when two NameNodes
Both think they are active
Both make conflicting changes to the namespace
Example:The ZKFC crashes, but its NN is still running
4/27/13
FencingFencing
Ensure that the previous Active NameNode is no longer able toEnsure that the previous Active NameNode is no longer able to
make any changes to the system metadatamake any changes to the system metadata
4/27/13
Fencing with Quorum-based storage
Only a single NN can be a writer to JournalNodes at a time
Whenever NN becomes active, it generated an “epoch
number”
NN that has a higher “epoch number” can be a writer
This prevents from corrupting the file system metadata
4/27/13
“Read” requests fencing
Previous active NN will be fenced when tries to write to the JN
Until it happens, it may still serve HDFS read requests
Try to fence this NN before that happens
This prevents from reading outdated metadata
Might be done automatically by ZKFC or hdfs haadmin
4/27/13
HDFS Highly-Available Federated Cluster
Any combination is possible
Federation without HA
HA without Federation
HA with Federation
e.g. NameNode HA for HBase, but not for the others
4/27/13
“Classical” MapReduce Limitations
Limited scalability
Poor resource utilization
Lack of support for the alternative frameworks
Lack of wire-compatible protocols
4/27/13
Limited ScalabilityLimited Scalability
Scales to onlyScales to only
~4,000-node cluster~4,000-node cluster
~40,000 concurrent tasks~40,000 concurrent tasks
Smaller and less-powerful clusters must be createdSmaller and less-powerful clusters must be created
Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
4/27/13
Poor Cluster Resources Utilization
Hard-coded values. TaskTrackers
must be restarted after a change.
4/27/13
Poor Cluster Resources Utilization
Map tasks are waiting for the slots
which are
NOT currently used by Reduce tasks
Hard-coded values. TaskTrackers
must be restarted after a change.
4/27/13
Resources Needs That Varies Over Time
How to
hard-code the
number of map
and reduce slots
efficiently?
4/27/13
(Secondary) Benefits Of Better Utilization(Secondary) Benefits Of Better Utilization
The same calculation on more efficient Hadoop clusterThe same calculation on more efficient Hadoop cluster
Less data center spaceLess data center space
Less silicon wasteLess silicon waste
Less power utilizationsLess power utilizations
Less carbon emissionsLess carbon emissions
==
Less disastrous environmental consequencesLess disastrous environmental consequences
4/27/13
“Classical” MapReduce Daemons
4/27/13
JobTracker ResponsibilitiesJobTracker Responsibilities
Manages the computational resources (map and reduce slots)Manages the computational resources (map and reduce slots)
Schedules all user jobsSchedules all user jobs
Schedules all tasks that belongs to a jobSchedules all tasks that belongs to a job
Monitors tasks executionsMonitors tasks executions
Restarts failed and speculatively runs slow tasksRestarts failed and speculatively runs slow tasks
Calculates job counters totalsCalculates job counters totals
4/27/13
Simple Observation
JobTracker does a lot...
4/27/13
JobTracker Redesign Idea
Reduce responsibilities of JobTracker
Separate cluster resource management from job
coordination
Use slaves (many of them!) to manage jobs life-cycle
Scale to (at least) 10K nodes, 10K jobs, 100K tasks
4/27/13
Disappearance Of The Centralized JobTracker
4/27/13
YARN – Yet Another Resource Negotiator
4/27/13
Resource Manager and Application Master
4/27/13
YARN Applications
MRv2 (simply MRv1 rewritten to run on top of YARN)
No need to rewrite existing MapReduce jobs
Distributed shell
Spark, Apache S4 (a real-time processing)
Hamster (MPI on Hadoop)
Apache Giraph (a graph processing)
Apache HAMA (matrix, graph and network algorithms)
... and http://wiki.apache.org/hadoop/PoweredByYarn
4/27/13
NodeManager
More flexible and efficient than TaskTracker
Executes any computation that makes sense to ApplicationMaster
Not only map or reduce tasks
Containers with variable resource sizes (e.g. RAM, CPU, network, disk)
No hard-coded split into map and reduce slots
4/27/13
NodeManager Containers
NodeManager creates a container for each task
Container contains variable resources sizes
e.g. 2GB RAM, 1 CPU, 1 disk
Number of created containers is limited
By total sum of resources available on NodeManager
4/27/13
Application Submission In YARN
4/27/13
Resource Manager Components
4/27/13
UberizationUberization
Running the small jobs in the same JVM as AplicationMasterRunning the small jobs in the same JVM as AplicationMaster
mapreduce.job.ubertask.enablemapreduce.job.ubertask.enable true (false is default)true (false is default)
mapreduce.job.ubertask.maxmapsmapreduce.job.ubertask.maxmaps 9*9*
mapreduce.job.ubertask.maxreducesmapreduce.job.ubertask.maxreduces 1*1*
mapreduce.job.ubertask.maxbytesmapreduce.job.ubertask.maxbytes dfs.block.size*dfs.block.size*
* Users may override these values, but only downward* Users may override these values, but only downward
4/27/13
Interesting DifferencesInteresting Differences
Task progress, status, counters are passed directly to theTask progress, status, counters are passed directly to the
Application MasterApplication Master
Shuffle handlers (long-running auxiliary services in NodeShuffle handlers (long-running auxiliary services in Node
Managers) serve map outputs to reduce tasksManagers) serve map outputs to reduce tasks
YARN does not support JVM reuse for map/reduce tasksYARN does not support JVM reuse for map/reduce tasks
4/27/13
Fault-Tollerance
Failure of the running tasks or NodeManagers is similar as in MRv1
Applications can be retried several times
yarn.resourcemanager.am.max-retries 1
yarn.app.mapreduce.am.job.recovery.enable false
Resource Manager can start Application Master in a new container
yarn.app.mapreduce.am.job.recovery.enable false
4/27/13
Resource Manager Failure
Two interesting features
RM restarts without the need to re-run currently
running/submitted applications [YARN-128]
ZK-based High Availability (HA) for RM [YARN-149] (still in
progress)
4/27/13
Wire-Compatible ProtocolWire-Compatible Protocol
Client and cluster may use different versions
More manageable upgrades
Rolling upgrades without disrupting the service
Active and standby NameNodes upgraded independently
Protocol buffers chosen for serialization (instead of Writables)
4/27/13
YARN MaturityYARN Maturity
Still considered as both production and not production-readyStill considered as both production and not production-ready
The code is already promoted to the trunkThe code is already promoted to the trunk
Yahoo! runs it on 2000 and 6000-node clustersYahoo! runs it on 2000 and 6000-node clusters
Using Apache Hadoop 2.0.2 (Alpha)Using Apache Hadoop 2.0.2 (Alpha)
There is no production support for YARN yetThere is no production support for YARN yet
Anyway, it will replace MRv1 sooner than laterAnyway, it will replace MRv1 sooner than later
4/27/13
How To Quickly Setup The YARN Cluster
4/27/13
Basic YARN Configuration Parameters
yarn.nodemanager.resource.memory-mb 8192
yarn.nodemanager.resource.cpu-cores 8
yarn.scheduler.minimum-allocation-mb 1024
yarn.scheduler.maximum-allocation-mb 8192
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 32
4/27/13
Basic MR Apps Configuration Parameters
mapreduce.map.cpu.vcores 1
mapreduce.reduce.cpu.vcores 1
mapreduce.map.memory.mb 1536
mapreduce.map.java.opts -Xmx1024M
mapreduce.reduce.memory.mb 3072
mapreduce.reduce.java.opts -Xmx2560M
mapreduce.task.io.sort.mb 512
4/27/13
Apache Whirr Configuration File
4/27/13
Running And Destroying Cluster
$ whirr launch-cluster --config hadoop-yarn.properties
$ whirr destroy-cluster --config hadoop-yarn.properties
4/27/13
Running And Destroying Cluster
$ whirr$ whirr launch-clusterlaunch-cluster --config hadoop-yarn.properties--config hadoop-yarn.properties
$ whirr$ whirr destroy-clusterdestroy-cluster --config hadoop-yarn.properties--config hadoop-yarn.properties
4/27/13
It Might Be A Demo
But what about runningBut what about running
production applicationsproduction applications
on the real clusteron the real cluster
consisting of hundreds of nodes?consisting of hundreds of nodes?
Join Spotify!Join Spotify!
jobs@spotify.comjobs@spotify.com
4/27/13
Thank You!

More Related Content

What's hot

HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi namboori
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemAnshul Bhatnagar
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 
Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability Omid Vahdaty
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisSameer Tiwari
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 

What's hot (20)

HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS Architecture
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 

Viewers also liked

Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High AvailabilityDataWorks Summit
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARNAdam Kawa
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java APIAdam Kawa
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At SpotifyAdam Kawa
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Adam Kawa
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGAdam Kawa
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem GetInData
 
Сергей Еланцев - Troubleshooting
Сергей Еланцев - Troubleshooting   Сергей Еланцев - Troubleshooting
Сергей Еланцев - Troubleshooting Yandex
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemCloudera, Inc.
 
Systemy rekomendacji
Systemy rekomendacjiSystemy rekomendacji
Systemy rekomendacjiAdam Kawa
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaYahoo Developer Network
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 

Viewers also liked (20)

Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
HDFS Federation
HDFS FederationHDFS Federation
HDFS Federation
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High Availability
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARN
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
Сергей Еланцев - Troubleshooting
Сергей Еланцев - Troubleshooting   Сергей Еланцев - Troubleshooting
Сергей Еланцев - Troubleshooting
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File System
 
Systemy rekomendacji
Systemy rekomendacjiSystemy rekomendacji
Systemy rekomendacji
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 

Similar to Introduction to YARN, HDFS HA and Federation

Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseSankar H
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Simplilearn
 
Redundancy for Big Hadoop Clusters is hard - Stuart Pook
Redundancy for Big Hadoop Clusters is hard  - Stuart PookRedundancy for Big Hadoop Clusters is hard  - Stuart Pook
Redundancy for Big Hadoop Clusters is hard - Stuart PookEvention
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
New Oracle Infrastructure2
New Oracle Infrastructure2New Oracle Infrastructure2
New Oracle Infrastructure2markleeuw
 
Hadoop ha system admin
Hadoop ha system adminHadoop ha system admin
Hadoop ha system adminTrieu Dao Minh
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nageSantosh Nage
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop ClusterEdureka!
 
[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享Tsu-Fen Han
 

Similar to Introduction to YARN, HDFS HA and Federation (20)

Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Unit 1
Unit 1Unit 1
Unit 1
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
 
Redundancy for Big Hadoop Clusters is hard - Stuart Pook
Redundancy for Big Hadoop Clusters is hard  - Stuart PookRedundancy for Big Hadoop Clusters is hard  - Stuart Pook
Redundancy for Big Hadoop Clusters is hard - Stuart Pook
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
New Oracle Infrastructure2
New Oracle Infrastructure2New Oracle Infrastructure2
New Oracle Infrastructure2
 
Hadoop ha system admin
Hadoop ha system adminHadoop ha system admin
Hadoop ha system admin
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
Zfs intro v2
Zfs intro v2Zfs intro v2
Zfs intro v2
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Introduction to YARN, HDFS HA and Federation

  • 1. 4/27/13 Introduction To YARN, NameNode HA and HDFS Federation Adam Kawa, Spotify
  • 2. 4/27/13 About Me Data Engineer at Spotify, Sweden Hadoop Instructor at Compendium (Cloudera Training Partner) +2.5 year of experience in Hadoop
  • 3. 4/27/13 “Classical” Apache Hadoop Cluster Can consist of one, five, hundred or 4000 nodes
  • 4. 4/27/13 Hadoop Golden EraHadoop Golden Era One of the most exciting open-source projects on the planetOne of the most exciting open-source projects on the planet Becomes the standard for large-scale processingBecomes the standard for large-scale processing Successfully deployed by hundreds/thousands of companiesSuccessfully deployed by hundreds/thousands of companies Like a salutary virusLike a salutary virus ... But it has some little drawbacks... But it has some little drawbacks
  • 5. 4/27/13 HDFS Main LimitationsHDFS Main Limitations Single NameNode thatSingle NameNode that keeps all metadata in RAMkeeps all metadata in RAM performs all metadata operationsperforms all metadata operations becomes a single point of failure (SPOF)becomes a single point of failure (SPOF)
  • 6. 4/27/13 HDFS Main ImprovementHDFS Main Improvement Introduce multiple NameNodes HDFS Federation HDFS High Availability (HA)
  • 7. 4/27/13 Motivation for HDFS Federation Limited namespace scalability Decreased metadata operations performance Lack of isolation
  • 8. 4/27/13 HDFS – master and slaves Master (NameNode) manages all metadata information Slaves (DataNodes) store blocks of data and serve them to the client
  • 9. 4/27/13 HDFS NameNode (NN) Supports all the namespace-related file system operations Keeps information in RAM for fast lookup The filesystem tree Metadata for all files/directories (e.g. ownerships, permissions) Names and locations of blocks Metadata (not all) is additionally stored on disk(s) Filesystem snapshot (fsimage) + editlog (edits) files
  • 10. 4/27/13 HDFS DataNode (DN) Stores and retrieves blocks A block is stored as a regular files on local disk e.g. blk_-992391354910561645 A block itself does not know which file it belongs to Sends the block report to NN periodically Sends the heartbeat signals to NN to say that it is alive
  • 11. 4/27/13 NameNode scalability issues Number of files limited by the amount of RAM of one machine More RAM means also heavier GC Stats based on Y! Clusters An average file ≈ 1.5 blocks (block size = 128 MB) An average file ≈ 600 bytes of metadata (1 file and 2 blocks objects) 100M files ≈ 60 GB of metadata 1 GB of metadata ≈ 1 PB physical storage (but usually less)
  • 12. 4/27/13 NameNode performance Read/write operations throughput limited by one machine ~120K read ops/sec, ~6K write ops/sec MapReduce tasks are also HDFS clients Internal load increases as the cluster grows Block reports and heartbeats from DataNodes Bigger snapshots transferred from/to Secondary NameNode NNBench - benchmark that stresses the NameNode
  • 13. 4/27/13 Lack of isolation Experimental applications (can) overload the NN and slow down production jobs Production applications with different requirements (can) benefit from “own” NameNode e.g. HBase requires extremely fast responses
  • 14. 4/27/13 HDFS Federation Partition the filesystem namespace over multiple separated NameNodes
  • 15. 4/27/13 Multiple independent namespaces NameNodes does not talk each other Datanode can store blocks managed by any NameNode NameNode manages only slice of namespace
  • 16. 4/27/13 The client-view of the cluster There are multiple namespaces (and corresponding NNs) Client can use any of them to compose its own view of HDFS Idea is much like the Linux /etc/fstab file Mapping between paths and NameNodes
  • 19. 4/27/13 Deploying HDFS Federation Deploying HDFS FederationDeploying HDFS Federation Useful for small (+isolation) and large (+scalability) clusterUseful for small (+isolation) and large (+scalability) cluster However not so many clusters use itHowever not so many clusters use it NameNodes can be added/removed at any timeNameNodes can be added/removed at any time Cluster does have to be restartedCluster does have to be restarted Add Federation to the existing clusterAdd Federation to the existing cluster Do not format existing NNDo not format existing NN Newly added NN will be “empty”Newly added NN will be “empty”
  • 20. 4/27/13 Motivation for HDFS High Availability Single NameNode is a single point of failure (SPOF)Single NameNode is a single point of failure (SPOF) Secondary NN and HDFS Federation do not change thatSecondary NN and HDFS Federation do not change that Recovery from failed NN may take even tens of minutesRecovery from failed NN may take even tens of minutes Two types of downtimesTwo types of downtimes Unexpected failure (infrequent)Unexpected failure (infrequent) Planned maintenance (common)Planned maintenance (common)
  • 21. 4/27/13 HDFS High Availability Introduces a pair of redundant NameNodes One Active and one Standby If the Active NameNode crashes or is intentionally stopped, Then the Standby takes over quickly
  • 22. 4/27/13 NameNodes resposibilitiesNameNodes resposibilities The Active NameNode is responsible for all client operationsThe Active NameNode is responsible for all client operations The Standby NameNode is watchfulThe Standby NameNode is watchful Maintains enough state to provide a fast failoverMaintains enough state to provide a fast failover Does also checkpointing (disappearance of Secondary NN)Does also checkpointing (disappearance of Secondary NN)
  • 23. 4/27/13 Synchronizing the state of metadata The Standby must keep its state as up-to-date as possible Edit logs - two alternative ways of sharing them Shared storage using NFS Quorum-based storage (recommended) Block locations DataNodes send block reports to both NameNodes
  • 25. 4/27/13 Supported failover modes Manual fail-over Initiated by administrator by a dedicated command - 0-3 seconds Automatic fail-over Detected by a fault in the active NN (more complicated and longer) - 30-40 seconds
  • 26. 4/27/13 Manual failover Initiate a failover from the first NN to the second one hdfs haadmin ­failover <to­be­standby­NN> <to­be­active­NN>
  • 27. 4/27/13 Automatic failover Requires ZooKeeper and an additional process “ZKFailoverController”
  • 28. 4/27/13 The split brain scenario A potential scenario when two NameNodes Both think they are active Both make conflicting changes to the namespace Example:The ZKFC crashes, but its NN is still running
  • 29. 4/27/13 FencingFencing Ensure that the previous Active NameNode is no longer able toEnsure that the previous Active NameNode is no longer able to make any changes to the system metadatamake any changes to the system metadata
  • 30. 4/27/13 Fencing with Quorum-based storage Only a single NN can be a writer to JournalNodes at a time Whenever NN becomes active, it generated an “epoch number” NN that has a higher “epoch number” can be a writer This prevents from corrupting the file system metadata
  • 31. 4/27/13 “Read” requests fencing Previous active NN will be fenced when tries to write to the JN Until it happens, it may still serve HDFS read requests Try to fence this NN before that happens This prevents from reading outdated metadata Might be done automatically by ZKFC or hdfs haadmin
  • 32. 4/27/13 HDFS Highly-Available Federated Cluster Any combination is possible Federation without HA HA without Federation HA with Federation e.g. NameNode HA for HBase, but not for the others
  • 33. 4/27/13 “Classical” MapReduce Limitations Limited scalability Poor resource utilization Lack of support for the alternative frameworks Lack of wire-compatible protocols
  • 34. 4/27/13 Limited ScalabilityLimited Scalability Scales to onlyScales to only ~4,000-node cluster~4,000-node cluster ~40,000 concurrent tasks~40,000 concurrent tasks Smaller and less-powerful clusters must be createdSmaller and less-powerful clusters must be created Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
  • 35. 4/27/13 Poor Cluster Resources Utilization Hard-coded values. TaskTrackers must be restarted after a change.
  • 36. 4/27/13 Poor Cluster Resources Utilization Map tasks are waiting for the slots which are NOT currently used by Reduce tasks Hard-coded values. TaskTrackers must be restarted after a change.
  • 37. 4/27/13 Resources Needs That Varies Over Time How to hard-code the number of map and reduce slots efficiently?
  • 38. 4/27/13 (Secondary) Benefits Of Better Utilization(Secondary) Benefits Of Better Utilization The same calculation on more efficient Hadoop clusterThe same calculation on more efficient Hadoop cluster Less data center spaceLess data center space Less silicon wasteLess silicon waste Less power utilizationsLess power utilizations Less carbon emissionsLess carbon emissions == Less disastrous environmental consequencesLess disastrous environmental consequences
  • 40. 4/27/13 JobTracker ResponsibilitiesJobTracker Responsibilities Manages the computational resources (map and reduce slots)Manages the computational resources (map and reduce slots) Schedules all user jobsSchedules all user jobs Schedules all tasks that belongs to a jobSchedules all tasks that belongs to a job Monitors tasks executionsMonitors tasks executions Restarts failed and speculatively runs slow tasksRestarts failed and speculatively runs slow tasks Calculates job counters totalsCalculates job counters totals
  • 42. 4/27/13 JobTracker Redesign Idea Reduce responsibilities of JobTracker Separate cluster resource management from job coordination Use slaves (many of them!) to manage jobs life-cycle Scale to (at least) 10K nodes, 10K jobs, 100K tasks
  • 43. 4/27/13 Disappearance Of The Centralized JobTracker
  • 44. 4/27/13 YARN – Yet Another Resource Negotiator
  • 45. 4/27/13 Resource Manager and Application Master
  • 46. 4/27/13 YARN Applications MRv2 (simply MRv1 rewritten to run on top of YARN) No need to rewrite existing MapReduce jobs Distributed shell Spark, Apache S4 (a real-time processing) Hamster (MPI on Hadoop) Apache Giraph (a graph processing) Apache HAMA (matrix, graph and network algorithms) ... and http://wiki.apache.org/hadoop/PoweredByYarn
  • 47. 4/27/13 NodeManager More flexible and efficient than TaskTracker Executes any computation that makes sense to ApplicationMaster Not only map or reduce tasks Containers with variable resource sizes (e.g. RAM, CPU, network, disk) No hard-coded split into map and reduce slots
  • 48. 4/27/13 NodeManager Containers NodeManager creates a container for each task Container contains variable resources sizes e.g. 2GB RAM, 1 CPU, 1 disk Number of created containers is limited By total sum of resources available on NodeManager
  • 51. 4/27/13 UberizationUberization Running the small jobs in the same JVM as AplicationMasterRunning the small jobs in the same JVM as AplicationMaster mapreduce.job.ubertask.enablemapreduce.job.ubertask.enable true (false is default)true (false is default) mapreduce.job.ubertask.maxmapsmapreduce.job.ubertask.maxmaps 9*9* mapreduce.job.ubertask.maxreducesmapreduce.job.ubertask.maxreduces 1*1* mapreduce.job.ubertask.maxbytesmapreduce.job.ubertask.maxbytes dfs.block.size*dfs.block.size* * Users may override these values, but only downward* Users may override these values, but only downward
  • 52. 4/27/13 Interesting DifferencesInteresting Differences Task progress, status, counters are passed directly to theTask progress, status, counters are passed directly to the Application MasterApplication Master Shuffle handlers (long-running auxiliary services in NodeShuffle handlers (long-running auxiliary services in Node Managers) serve map outputs to reduce tasksManagers) serve map outputs to reduce tasks YARN does not support JVM reuse for map/reduce tasksYARN does not support JVM reuse for map/reduce tasks
  • 53. 4/27/13 Fault-Tollerance Failure of the running tasks or NodeManagers is similar as in MRv1 Applications can be retried several times yarn.resourcemanager.am.max-retries 1 yarn.app.mapreduce.am.job.recovery.enable false Resource Manager can start Application Master in a new container yarn.app.mapreduce.am.job.recovery.enable false
  • 54. 4/27/13 Resource Manager Failure Two interesting features RM restarts without the need to re-run currently running/submitted applications [YARN-128] ZK-based High Availability (HA) for RM [YARN-149] (still in progress)
  • 55. 4/27/13 Wire-Compatible ProtocolWire-Compatible Protocol Client and cluster may use different versions More manageable upgrades Rolling upgrades without disrupting the service Active and standby NameNodes upgraded independently Protocol buffers chosen for serialization (instead of Writables)
  • 56. 4/27/13 YARN MaturityYARN Maturity Still considered as both production and not production-readyStill considered as both production and not production-ready The code is already promoted to the trunkThe code is already promoted to the trunk Yahoo! runs it on 2000 and 6000-node clustersYahoo! runs it on 2000 and 6000-node clusters Using Apache Hadoop 2.0.2 (Alpha)Using Apache Hadoop 2.0.2 (Alpha) There is no production support for YARN yetThere is no production support for YARN yet Anyway, it will replace MRv1 sooner than laterAnyway, it will replace MRv1 sooner than later
  • 57. 4/27/13 How To Quickly Setup The YARN Cluster
  • 58. 4/27/13 Basic YARN Configuration Parameters yarn.nodemanager.resource.memory-mb 8192 yarn.nodemanager.resource.cpu-cores 8 yarn.scheduler.minimum-allocation-mb 1024 yarn.scheduler.maximum-allocation-mb 8192 yarn.scheduler.minimum-allocation-vcores 1 yarn.scheduler.maximum-allocation-vcores 32
  • 59. 4/27/13 Basic MR Apps Configuration Parameters mapreduce.map.cpu.vcores 1 mapreduce.reduce.cpu.vcores 1 mapreduce.map.memory.mb 1536 mapreduce.map.java.opts -Xmx1024M mapreduce.reduce.memory.mb 3072 mapreduce.reduce.java.opts -Xmx2560M mapreduce.task.io.sort.mb 512
  • 61. 4/27/13 Running And Destroying Cluster $ whirr launch-cluster --config hadoop-yarn.properties $ whirr destroy-cluster --config hadoop-yarn.properties
  • 62. 4/27/13 Running And Destroying Cluster $ whirr$ whirr launch-clusterlaunch-cluster --config hadoop-yarn.properties--config hadoop-yarn.properties $ whirr$ whirr destroy-clusterdestroy-cluster --config hadoop-yarn.properties--config hadoop-yarn.properties
  • 63. 4/27/13 It Might Be A Demo But what about runningBut what about running production applicationsproduction applications on the real clusteron the real cluster consisting of hundreds of nodes?consisting of hundreds of nodes? Join Spotify!Join Spotify! jobs@spotify.comjobs@spotify.com