Introduction to Hadoop Administration
View Hadoop Administration course details at www.edureka.co/hadoop-admin
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work on Large Data Base
Verifiable Certificate
www.edureka.co/hadoop-adminSlide 2 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
How it Works?
Objectives of this Session
At the end of this module, you will be able to:
• Understand how Hadoop overcame the limitations of traditional technologies
• Understand the key responsibilities of a Hadoop Administrator
• Understand Hadoop Federation and High Availability
• Understand Hadoop Cluster Modes
• Set up a Hadoop Cluster
• Commission and decommission a DataNode
What is Big Data?
Lots of Data (Terabytes or Petabytes)
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
• Systems and enterprises generate huge amounts of data, from terabytes to petabytes of information.
• Example: a stock market generates about one terabyte of new trade data per day, which is analyzed to determine trends for optimal trades.
IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
Characteristics: Volume, Velocity, Variety
Example data sources: web logs, images, videos, audio, sensor data
Limitations of Existing Data Analytics Architecture
http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
Pipeline: Instrumentation -> Collection -> Storage-only Grid (Original Raw Data, mostly append) -> ETL Compute Grid -> RDBMS (Aggregated Data) -> BI Reports + Interactive Apps
Storage and processing are separate layers: 90% of the ~2PB is archived, and a meagre 10% of the ~2PB data is available for BI.
Limitations:
1. Can’t explore original high-fidelity raw data
2. Moving data to compute doesn’t scale
3. Premature data death
Solution: A Combined Storage and Compute Layer
*Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing, rather than the meagre 10% that was available with its existing non-Hadoop solutions.
Pipeline: Instrumentation -> Collection -> Hadoop: Storage + Compute Grid (mostly append; both storage and processing) -> RDBMS (Aggregated Data) -> BI Reports + Interactive Apps
No data archiving: the entire ~2PB of data is available for processing.
Benefits:
1. Data exploration & advanced analytics
2. Scalable throughput for ETL & aggregation
3. Keep data alive forever
Why Hadoop?
How to solve the challenges
posed by Big Data?
The Hadoop platform is
designed to solve problems
posed by Big Data.
Size of Data Variety of Data
What is Hadoop?
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management framework with scale-out storage and distributed processing.
Hadoop Key Characteristics
Hadoop features: Reliable, Economical, Flexible, Scalable
Some of the Hadoop Users
Job Market
Skills Required
General operational expertise such as good troubleshooting skills,
understanding of Capacity Planning.
Hadoop skills like HBase, Hive, Pig, Mahout, etc.
They should be able to deploy Hadoop cluster, monitor and scale critical
parts of the cluster.
Good knowledge of Linux as Hadoop runs on Linux.
Familiarity with open source configuration management and deployment
tools such as Puppet or Chef and Linux scripting.
Knowledge of Troubleshooting Core Java Applications is a plus.
Hadoop Admin Responsibilities
Responsible for implementation and administration of Hadoop infrastructure.
Testing HDFS, Hive, Pig and MapReduce access for Applications.
Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.
Performance tuning and Capacity planning for Clusters.
Monitor Hadoop cluster and deploy security.
Hadoop 1.x and Hadoop 2.x Ecosystem
Hadoop 1.x: Pig Latin (Data Analysis), Hive (DW System), HBase, Apache Oozie (Workflow), MapReduce Framework, HDFS (Hadoop Distributed File System)
Hadoop 2.x: Pig Latin (Data Analysis), Hive (DW System), HBase, Apache Oozie (Workflow), MapReduce Framework, Other YARN Frameworks (MPI, GIRAPH), YARN (Cluster Resource Management), HDFS (Hadoop Distributed File System)
Data handled: structured data and unstructured/semi-structured data
Hadoop 1.x Core Components
Hadoop is a system for large scale data processing.
2 Main Hadoop 1.x Core Components
Storage: HDFS
• Distributed across “nodes”
• Natively redundant
• NameNode tracks locations
• Self-healing, high-bandwidth clustered storage
Processing: MapReduce
• Splits a task across processors “near” the data & assembles results
• JobTracker manages the TaskTrackers
Additional Administration Tools:
» Filesystem utilities
» Job scheduling and monitoring
» Web UI
Hadoop 2.x Core Components
Hadoop is a system for large scale data processing.
2 Main Hadoop 2.x Core Components
Storage: HDFS
• Highly available
• Distributed across “nodes”
• NameNode tracks locations
• Clustered storage
Processing: MapReduce NextGen / YARN / MRv2
• Splits a task across processors “near” the data & assembles results
• Resource management and job scheduling/monitoring
• Individual applications can utilize cluster resources in a shared, secure and multi-tenant manner
• Maintains API compatibility with previous stable releases of Hadoop
Main Components of HDFS
NameNode:
» Master of the system
» Maintains and manages the blocks which are present on
the DataNodes
DataNodes:
» Slaves which are deployed on each machine and provide
the actual storage
» Responsible for serving read and write requests for the
clients
NameNode Metadata
Meta-data in Memory
» The entire metadata is in main memory
» No demand paging of FS meta-data
Types of Metadata
» List of Files
» List of Blocks for each file
» List of DataNodes for each block
» File attributes, e.g. access time, replication factor
A Transaction Log
» Records file creations, file deletions, etc.
NameNode
(Stores metadata only)
METADATA:
/user/doug/hinfo -> 1 3 5
/user/doug/pdetail -> 4 2
NameNode:
Keeps track of the overall file directory structure and the placement of data blocks
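The metadata types listed above can be pictured as two in-memory lookup tables. The following is a minimal sketch only, not the NameNode's actual data structures: the file-to-block entries come from the slide, while the DataNode names (`dn1` to `dn4`) and the `locate` helper are hypothetical.

```python
# File -> ordered list of block IDs (values taken from the slide's METADATA example).
file_to_blocks = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

# Block ID -> DataNodes holding a replica (hypothetical node names).
block_to_datanodes = {
    1: ["dn1", "dn2", "dn3"],
    2: ["dn2", "dn3", "dn4"],
    3: ["dn1", "dn3", "dn4"],
    4: ["dn1", "dn2", "dn4"],
    5: ["dn2", "dn3", "dn4"],
}


def locate(path):
    """Return (block_id, datanodes) pairs for a file, in block order."""
    return [(block, block_to_datanodes.get(block, []))
            for block in file_to_blocks[path]]


locations = locate("/user/doug/pdetail")
```

A client would read the file by fetching each block, in order, from one of the DataNodes listed for it; the NameNode itself stores no file data.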
Secondary NameNode
• Not a hot standby for the NameNode
• Connects to the NameNode every hour*
• Housekeeping, backup of NameNode metadata
• Saved metadata can rebuild a failed NameNode
Secondary NameNode to NameNode: "You give me metadata every hour, I will make it secure."
The NameNode is a single point of failure, only in Hadoop 1.x, not in Hadoop 2.x.
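The housekeeping role can be sketched as a checkpoint: a saved snapshot of the metadata is combined with the transaction log recorded since that snapshot. This is a toy model under stated assumptions, not Hadoop's on-disk format; the `apply_edits` function and the tuple layout of each log entry are hypothetical.

```python
def apply_edits(snapshot, edits):
    """Replay logged operations on a copy of the snapshot and return the result."""
    image = dict(snapshot)
    for op, path, blocks in edits:
        if op == "create":
            image[path] = list(blocks)
        elif op == "delete":
            image.pop(path, None)
    return image


# Last saved metadata snapshot (entry taken from the NameNode Metadata slide).
snapshot = {"/user/doug/hinfo": [1, 3, 5]}

# Operations logged since the snapshot (hypothetical sequence).
edits = [
    ("create", "/user/doug/pdetail", [4, 2]),
    ("delete", "/user/doug/hinfo", []),
]

checkpoint = apply_edits(snapshot, edits)
```

After a checkpoint, the merged snapshot is what a replacement NameNode would load, and the replayed log entries no longer need to be kept.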
Hadoop 2.x – In Summary
Clients talk to both HDFS and YARN.
HDFS (Distributed Data Storage), NameNode High Availability:
» Masters: Active NameNode, Standby NameNode, Secondary NameNode; shared edit logs OR Journal Node
» Slaves: DataNodes
YARN (Distributed Data Processing), Next Generation MapReduce:
» Master: Resource Manager (Scheduler, Applications Manager (AsM))
» Slaves: Node Managers, each hosting Containers and App Masters
Hadoop 2.x Cluster Architecture - Federation
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
Hadoop 1.0: a single NameNode holds the one namespace (NS) and block management, on top of DataNode storage.
Hadoop 2.0: multiple NameNodes (NN-1 … NN-k … NN-n), each with its own namespace (NS1 … NSk … NSn) and its own block pool (Pool 1 … Pool k … Pool n). DataNodes 1 … m provide common storage for all block pools.
Hadoop 2.x – High Availability
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
HDFS HIGH AVAILABILITY (NameNode High Availability)
» Active NameNode: all namespace edits are logged to shared NFS storage; single writer (fencing)
» Standby NameNode: reads the edit logs and applies them to its own namespace
» Shared edit logs sit between the Active and Standby NameNodes; DataNodes serve both
» *Not necessary to configure a Secondary NameNode
YARN (Next Generation MapReduce)
» Resource Manager coordinating Node Managers, each hosting Containers and App Masters
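The single-writer rule for the shared edit log can be illustrated with a toy model. This is a sketch only: `SharedEditLog`, `NameNode`, `grant_writer`, and `tail` are hypothetical names, and real fencing is performed by mechanisms configured in Hadoop itself, not by application code like this.

```python
class SharedEditLog:
    """Toy shared edit log that accepts appends from one writer at a time."""

    def __init__(self):
        self.edits = []
        self.writer = None

    def grant_writer(self, name):
        # Fencing in miniature: whoever held the writer role before loses it.
        self.writer = name

    def append(self, name, edit):
        if name != self.writer:
            raise PermissionError(f"{name} is fenced off from the edit log")
        self.edits.append(edit)


class NameNode:
    def __init__(self, name, log):
        self.name, self.log = name, log
        self.namespace = {}
        self.applied = 0  # number of log entries already applied locally

    def write(self, path, blocks):
        """Active role: log the edit first, then apply it locally."""
        self.log.append(self.name, (path, blocks))
        self.namespace[path] = blocks
        self.applied = len(self.log.edits)

    def tail(self):
        """Standby role: apply any edits not yet seen to the local namespace."""
        for path, blocks in self.log.edits[self.applied:]:
            self.namespace[path] = blocks
        self.applied = len(self.log.edits)


log = SharedEditLog()
active, standby = NameNode("nn1", log), NameNode("nn2", log)
log.grant_writer("nn1")
active.write("/user/doug/hinfo", [1, 3, 5])
standby.tail()

log.grant_writer("nn2")  # failover: nn2 becomes the only allowed writer
try:
    active.write("/user/doug/pdetail", [4, 2])
    rejected = False
except PermissionError:
    rejected = True
standby.write("/user/doug/pdetail", [4, 2])
```

Because the standby has already replayed the log, it can take over with an up-to-date namespace, and the old active's late write is rejected instead of corrupting the shared log.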
Hadoop 2.x – Resource Management
[Diagram: the Hadoop 2.x cluster layout from the High Availability slide, repeated here for YARN resource management: the Resource Manager and the Node Managers, each hosting Containers and App Masters.]
Hadoop Cluster: A Typical Use Case
NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
Secondary NameNode
RAM: 32 GB
Hard disk: 1 TB
Processor: Xeon with 4 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
DataNode (two shown, identical specification)
RAM: 16 GB
Hard disk: 6 x 2 TB
Processor: Xeon with 2 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
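For capacity planning, the DataNode disk figures translate into usable HDFS space roughly as follows. This is a back-of-the-envelope sketch: the `usable_capacity_tb` helper is hypothetical, it assumes the HDFS default replication factor of 3, and it ignores space reserved for the operating system and temporary data.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Return (raw_tb, usable_tb) for a group of identical DataNodes."""
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb, raw_tb / replication


# Two DataNodes, each with 6 x 2 TB disks, as on the slide.
raw_tb, usable_tb = usable_capacity_tb(nodes=2, disks_per_node=6, disk_tb=2)
```

With these inputs the raw capacity is 24 TB, of which about 8 TB is available for unique data once every block is stored three times.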
Hadoop 1.x Configuration Files – Apache Hadoop
Core: core-site.xml
HDFS: hdfs-site.xml
MapReduce: mapred-site.xml
Hadoop 2.x Configuration Files – Apache Hadoop
Core: core-site.xml
HDFS: hdfs-site.xml
YARN: yarn-site.xml
MapReduce: mapred-site.xml
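Each of these files is an XML document with a `<configuration>` root holding `<property>` entries, each with a `<name>` and a `<value>`. The sketch below generates two such files; `build_site_xml` is a hypothetical helper, and the NameNode host name and port are placeholder assumptions. `fs.defaultFS` and `dfs.replication` are standard Hadoop 2.x property names.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings as a Hadoop-style *-site.xml string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# core-site.xml: where clients find the default file system (placeholder host).
core_site = build_site_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})

# hdfs-site.xml: how many replicas of each block HDFS keeps.
hdfs_site = build_site_xml({"dfs.replication": 3})
```

In practice these files are usually edited by hand or pushed out by a configuration management tool; generating them is mainly useful when many nodes must share identical settings.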
Replication and Rack Awareness
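Rack awareness is commonly set up by pointing Hadoop at a topology script (the `net.topology.script.file.name` property in Hadoop 2.x). Hadoop calls the script with one or more host names or IP addresses and reads back one rack path per argument. The following is a minimal sketch of such a script; the IP addresses and rack names in `RACKS` are hypothetical.

```python
import sys

# Hypothetical mapping from DataNode address to rack path.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return the rack path for each host, in the order given."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # One rack path per line, one line per argument.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

With rack information available, HDFS can place the replicas of a block on more than one rack, so that losing a whole rack does not lose the block.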
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
• No daemons, everything runs in a single JVM.
• Suitable for running MapReduce programs during development.
• Has no DFS.
Pseudo-Distributed Mode
• Hadoop daemons run on the local machine.
Fully-Distributed Mode
• Hadoop daemons run on a cluster of machines.
DEMO
Hadoop Cluster Setup
DEMO
Hadoop Rack Awareness
DEMO
Secondary NameNode
Commissioning and Decommissioning of DataNode
[Diagram: a Master Node with its DataNodes. Commissioning adds a DataNode to the cluster; decommissioning removes one.]
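Decommissioning is typically driven by an exclude file: the host is added to the file named by the `dfs.hosts.exclude` property, and the NameNode is then told to re-read it with `hdfs dfsadmin -refreshNodes`. The sketch below only prepares the file and returns the command to run; the `decommission` helper, the file name, and the host name are hypothetical.

```python
import tempfile
from pathlib import Path


def decommission(exclude_file, hostnames):
    """Add hosts to the exclude file and return the refresh command to run."""
    path = Path(exclude_file)
    existing = path.read_text().split() if path.exists() else []
    merged = sorted(set(existing) | set(hostnames))
    path.write_text("\n".join(merged) + "\n")
    # The command is returned, not executed, so the sketch runs without a cluster.
    return ["hdfs", "dfsadmin", "-refreshNodes"]


with tempfile.TemporaryDirectory() as tmp:
    exclude = Path(tmp) / "dfs.exclude"  # hypothetical exclude file location
    cmd = decommission(exclude, ["datanode3.example.com"])
    excluded_hosts = exclude.read_text().split()
```

Commissioning is the reverse direction: the new host is added to the include file (the `dfs.hosts` property, when one is used), the same refresh command is issued, and the DataNode daemon is started on the new machine.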
Questions
Course Topics
• Module 1
» Understanding Big Data
» Hadoop Components
• Module 2
» Different Hadoop Server Roles
» Hadoop Cluster Configuration
• Module 3
» Hadoop Cluster Planning
» Job Scheduling
• Module 4
» Securing your Hadoop Cluster
» Backup and Recovery
• Module 5
» Hadoop 2.0 New Features
» HDFS High Availability
• Module 6
» Quorum Journal Manager (QJM)
» Hadoop 2.0 - YARN
• Module 7
» Oozie Workflow Scheduler
» Hive and HBase Administration
• Module 8
» Hadoop Cluster Case Study
» Hadoop Implementation
Data infrastructure at Facebook
 

More from Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 

Recently uploaded (20)

URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 

Introduction to Hadoop Administration

  • 1. Introduction to Hadoop Administration. View Hadoop Administration course details at www.edureka.co/hadoop-admin
  • 2. How it Works? LIVE Online Class; Class Recording in LMS; 24/7 Post Class Support; Module Wise Quiz; Project Work on Large Data Base; Verifiable Certificate. www.edureka.co/hadoop-admin, Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
  • 3. Objectives of this Session. At the end of this module, you will be able to: understand how Hadoop overcame the limitations of traditional technologies; understand the key responsibilities of a Hadoop Administrator; understand Hadoop Federation and High Availability; understand Hadoop Cluster Modes; set up a Hadoop Cluster; commission and decommission a DataNode.
  • 4. What is Big Data? Lots of Data (Terabytes or Petabytes). Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information. Example: a stock market generates about one terabyte of new trade data per day, which is analyzed to determine trends for optimal trades.
  • 5. IBM’s Definition – Big Data Characteristics: VOLUME, VELOCITY, VARIETY. Data types shown: web logs, images, videos, audio, sensor data. Source: http://www-01.ibm.com/software/data/bigdata/
  • 6. Limitations of Existing Data Analytics Architecture. Diagram components: Instrumentation; Collection; Storage only Grid (Original Raw Data), Mostly Append; ETL Compute Grid; RDBMS (Aggregated Data); BI Reports + Interactive Apps. 90% of the ~2PB is archived; a meagre 10% of the ~2PB data is available for BI. Limitations: 1. Can’t explore original high fidelity raw data 2. Moving data to compute doesn’t scale 3. Premature data death. Source: http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
  • 7. Solution: A Combined Storage Compute Layer. Diagram components: Instrumentation; Collection; Hadoop: Storage + Compute Grid (both storage and processing), Mostly Append; RDBMS (Aggregated Data); BI Reports + Interactive Apps. No data archiving: the entire ~2PB data is available for processing. Benefits: 1. Data exploration & advanced analytics 2. Scalable throughput for ETL & aggregation 3. Keep data alive forever. *Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing rather than a meagre 10% as was the case with existing non-Hadoop solutions.
  • 8. Why Hadoop? How to solve the challenges posed by Big Data?
  • 9. Why Hadoop? The Hadoop platform is designed to solve problems posed by Big Data: Size of Data, Variety of Data.
  • 10. What is Hadoop? Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an open-source data management framework with scale-out storage and distributed processing.
  • 11. Hadoop Key Characteristics. Hadoop Features: Reliable, Economical, Flexible, Scalable.
  • 12. Some of the Hadoop Users
  • 13. Job Market
  • 14. Skills Required. General operational expertise, such as good troubleshooting skills and an understanding of capacity planning. Hadoop skills like HBase, Hive, Pig, Mahout, etc. The ability to deploy a Hadoop cluster, and to monitor and scale critical parts of the cluster. Good knowledge of Linux, as Hadoop runs on Linux. Familiarity with open source configuration management and deployment tools such as Puppet or Chef, and with Linux scripting. Knowledge of troubleshooting core Java applications is a plus.
  • 15. Hadoop Admin Responsibilities. Implementation and administration of Hadoop infrastructure. Testing HDFS, Hive, Pig and MapReduce access for applications. Cluster maintenance tasks like backup, recovery, upgrade and patching. Performance tuning and capacity planning for clusters. Monitoring the Hadoop cluster and deploying security.
  • 16. Hadoop 1.x and Hadoop 2.x Ecosystem. Hadoop 1.x: Pig Latin (Data Analysis), Hive (DW System), HBase, MapReduce Framework, Apache Oozie (Workflow), HDFS (Hadoop Distributed File System). Hadoop 2.x: Pig Latin (Data Analysis), Hive (DW System), Other YARN Frameworks (MPI, GIRAPH), HBase, MapReduce Framework, YARN (Cluster Resource Management), Apache Oozie (Workflow), HDFS (Hadoop Distributed File System). Data handled: Structured Data, Unstructured/Semi-structured Data.
  • 17. Hadoop 1.x Core Components. Hadoop is a system for large scale data processing. 2 Main Hadoop 1.x Core Components: Storage (HDFS) and Processing (MapReduce). HDFS: » Self-healing, high bandwidth clustered storage » Distributed across “nodes” » Natively redundant » NameNode tracks locations. MapReduce: » Splits a task across processors “near” the data & assembles results » JobTracker manages the TaskTrackers. Additional Administration Tools: » Filesystem utilities » Job scheduling and monitoring » Web UI
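
The "splits a task across processors and assembles results" idea on slide 17 can be illustrated with a small single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample input splits are assumptions made for illustration, and everything runs in one Python process rather than on TaskTrackers.

```python
from collections import defaultdict

def map_phase(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(splits):
    """Run map over every split, then shuffle and reduce the combined output."""
    pairs = []
    for split in splits:  # on a real cluster each split is processed near its data
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))

# Hypothetical input: two splits of one file.
splits = ["big data big cluster", "data node data"]
counts = word_count(splits)
print(counts)
```

On a real cluster the map calls would run in parallel on the nodes that hold each split, which is what "near the data" refers to.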
  • 18. Hadoop 2.x Core Components. Hadoop is a system for large scale data processing. 2 Main Hadoop 2.x Core Components: Storage (HDFS) and Processing (MapReduce NextGen / YARN / MRv2). HDFS: » Clustered storage » Highly available » Distributed across “nodes” » NameNode tracks locations. MapReduce NextGen / YARN / MRv2: » Splits a task across processors “near” the data & assembles results » Resource management and job scheduling/monitoring » Individual applications can utilize cluster resources in a shared, secure and multi-tenant manner » Maintains API compatibility with previous stable releases of Hadoop
  • 19. Main Components of HDFS. NameNode: » Master of the system » Maintains and manages the blocks which are present on the DataNodes. DataNodes: » Slaves which are deployed on each machine and provide the actual storage » Responsible for serving read and write requests for the clients
  • 20. NameNode Metadata. NameNode (stores metadata only): keeps track of the overall file directory structure and the placement of data blocks. Meta-data in Memory: » The entire metadata is in main memory » No demand paging of FS meta-data. Types of Metadata: » List of files » List of blocks for each file » List of DataNodes for each block » File attributes, e.g. access time, replication factor. A Transaction Log: » Records file creations, file deletions, etc. Example METADATA: /user/doug/hinfo -> 1 3 5; /user/doug/pdetail -> 4 2
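
The metadata types listed on slide 20 can be pictured as a few in-memory maps plus an append-only log. The sketch below is a simplified model, not the NameNode's actual data structures: the class name `NameNodeMetadata`, its methods, and the DataNode names `dn1` to `dn3` are all hypothetical. Only the two file-to-block mappings come from the slide.

```python
class NameNodeMetadata:
    """Toy model of NameNode metadata, all held in main memory."""

    def __init__(self):
        self.file_to_blocks = {}      # list of blocks for each file
        self.block_to_datanodes = {}  # list of DataNodes for each block
        self.transaction_log = []     # records file creations, deletions, etc.

    def create_file(self, path, block_ids):
        self.file_to_blocks[path] = list(block_ids)
        self.transaction_log.append(("create", path))

    def delete_file(self, path):
        self.file_to_blocks.pop(path, None)
        self.transaction_log.append(("delete", path))

    def report_block(self, block_id, datanode):
        """Record that a DataNode holds a replica of a block."""
        self.block_to_datanodes.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return (block id, sorted DataNodes) for every block of a file."""
        return [(b, sorted(self.block_to_datanodes.get(b, ())))
                for b in self.file_to_blocks[path]]

nn = NameNodeMetadata()
nn.create_file("/user/doug/hinfo", [1, 3, 5])   # mapping taken from the slide
nn.create_file("/user/doug/pdetail", [4, 2])    # mapping taken from the slide

# Hypothetical replica placement; the slide does not give one.
placement = {1: ["dn1", "dn2"], 3: ["dn2", "dn3"], 5: ["dn1", "dn3"],
             4: ["dn1", "dn2"], 2: ["dn2", "dn3"]}
for block_id, datanodes in placement.items():
    for dn in datanodes:
        nn.report_block(block_id, dn)

print(nn.locate("/user/doug/pdetail"))
```

A client read follows the same two lookups: path to block list, then block to DataNodes that can serve it.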
  • 21. Secondary NameNode: » Not a hot standby for the NameNode » Connects to the NameNode every hour* » Housekeeping, backup of NameNode metadata » Saved metadata can be used to rebuild a failed NameNode. Diagram: the NameNode passes metadata to the Secondary NameNode ("You give me metadata every hour, I will make it secure"). The NameNode is a single point of failure only in Hadoop 1.x, not in Hadoop 2.x.
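
The "housekeeping" on slide 21 is commonly described as a checkpoint: the Secondary NameNode merges the NameNode's saved namespace image with the edit log accumulated since the last checkpoint. The following is a minimal sketch of that merge under simplifying assumptions. The functions `apply_edits` and `checkpoint`, the tuple layout of an edit, and the sample data are illustrative, not Hadoop's on-disk formats.

```python
def apply_edits(fsimage, edits):
    """Replay logged operations on top of a saved namespace image."""
    namespace = dict(fsimage)  # work on a copy; the old image stays intact
    for op, path, blocks in edits:
        if op == "create":
            namespace[path] = list(blocks)
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace

def checkpoint(fsimage, edits):
    """Merge image and edits; return the new image and an emptied edit log."""
    return apply_edits(fsimage, edits), []

# Hypothetical state: one file in the last image, two operations logged since.
fsimage = {"/user/doug/hinfo": [1, 3, 5]}
edits = [("create", "/user/doug/pdetail", [4, 2]),
         ("delete", "/user/doug/hinfo", None)]

new_image, remaining_edits = checkpoint(fsimage, edits)
print(new_image, remaining_edits)
```

Keeping the merged image on a second machine is what allows the saved metadata to rebuild a failed NameNode, although anything logged after the last checkpoint would be missing.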
  • 22. Hadoop 2.x – In Summary. Diagram components: Client. HDFS (Distributed Data Storage), NameNode High Availability: Active NameNode, Standby NameNode, Secondary NameNode, Shared edit logs OR Journal Node. YARN (Distributed Data Processing), Next Generation MapReduce: Resource Manager with Scheduler and Applications Manager (AsM). Masters and Slaves: each slave runs a DataNode and a Node Manager hosting Containers and App Masters.
  • 23. Hadoop 2.x Cluster Architecture - Federation. Hadoop 1.0: a single Namenode holds the namespace (NS) and block management, on top of block storage provided by the Datanodes. Hadoop 2.0: multiple Namenodes (NN-1 … NN-k … NN-n), each with its own namespace (NS1 … NSk … NSn) and block pool (Pool 1 … Pool k … Pool n), share common storage on Datanode 1, Datanode 2 … Datanode m. Source: http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 24. Hadoop 2.x – High Availability. Diagram components: Client. HDFS HIGH AVAILABILITY (NameNode High Availability): Active NameNode, Standby NameNode, Shared Edit Logs, DataNodes, Secondary NameNode*. Active NameNode: all name space edits are logged to shared NFS storage; single writer (fencing). Standby NameNode: reads the edit logs and applies them to its own namespace. YARN (Next Generation MapReduce): Resource Manager; Node Managers hosting Containers and App Masters. *Not necessary to configure Secondary NameNode. Source: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
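
Slide 24 describes two behaviours: only one NameNode may write to the shared edit log at a time, and the standby keeps reading that log to update its own namespace. The sketch below models just those two behaviours in one process. It is a toy model, not how Hadoop implements fencing: the classes `SharedEditLog` and `NameNode`, the node names `nn1` and `nn2`, and the paths are assumptions for illustration.

```python
class SharedEditLog:
    """Shared storage that accepts edits from a single writer only."""

    def __init__(self):
        self.entries = []
        self.writer = None

    def fence_and_acquire(self, node_name):
        """Make node_name the only allowed writer, fencing off the previous one."""
        self.writer = node_name

    def append(self, node_name, edit):
        if node_name != self.writer:
            raise PermissionError(f"{node_name} is fenced; writer is {self.writer}")
        self.entries.append(edit)

class NameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.namespace = set()
        self.applied = 0  # number of log entries already applied

    def log_edit(self, path):
        """Active role: log the edit to shared storage, then apply it locally."""
        self.log.append(self.name, ("create", path))
        self.catch_up()

    def catch_up(self):
        """Standby role: read new edit log entries and apply them."""
        for op, path in self.log.entries[self.applied:]:
            if op == "create":
                self.namespace.add(path)
        self.applied = len(self.log.entries)

log = SharedEditLog()
nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)

log.fence_and_acquire("nn1")   # nn1 starts as active
nn1.log_edit("/a")
nn1.log_edit("/b")
nn2.catch_up()                 # standby follows the shared log

log.fence_and_acquire("nn2")   # failover: nn2 becomes the single writer
nn2.log_edit("/c")

try:
    nn1.log_edit("/d")         # the old active can no longer write
    rejected = False
except PermissionError:
    rejected = True

print(sorted(nn2.namespace), rejected)
```

Because the standby has already applied every logged edit, it can take over without replaying history from scratch, and rejecting the old writer keeps the two namespaces from diverging.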
  • 25. Hadoop 2.x – Resource Management. YARN (Next Generation MapReduce): Resource Manager; Node Managers, each hosting Containers and App Masters. The diagram repeats the HDFS High Availability components of slide 24. Source: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
  • 26. Hadoop Cluster: A Typical Use Case. NameNode: » RAM: 64 GB » Hard disk: 1 TB » Processor: Xeon with 8 cores » Ethernet: 3 X 10 GB/s » OS: 64-bit CentOS » Power: Redundant Power Supply. Secondary NameNode: » RAM: 32 GB » Hard disk: 1 TB » Processor: Xeon with 4 cores » Ethernet: 3 X 10 GB/s » OS: 64-bit CentOS » Power: Redundant Power Supply. DataNode (two shown, identical): » RAM: 16 GB » Hard disk: 6 X 2 TB » Processor: Xeon with 2 cores » Ethernet: 3 X 10 GB/s » OS: 64-bit CentOS
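
As a rough capacity check for the DataNode figures on slide 26, raw disk space can be divided by the replication factor. The sketch assumes a replication factor of 3, which is the HDFS default but is not stated on the slide, and it ignores space reserved for the OS, logs, and intermediate job data. The function name is hypothetical.

```python
def usable_hdfs_capacity_tb(datanodes, disks_per_node, disk_tb, replication=3):
    """Approximate usable HDFS capacity: raw disk space divided by replication."""
    raw_tb = datanodes * disks_per_node * disk_tb
    return raw_tb / replication

# Two DataNodes with 6 x 2 TB disks each, as on the slide.
raw = 2 * 6 * 2
usable = usable_hdfs_capacity_tb(datanodes=2, disks_per_node=6, disk_tb=2)
print(f"raw: {raw} TB, usable at replication 3: {usable} TB")
```

The NameNode's 1 TB disk does not enter the calculation because it stores metadata only.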
  • 27. Hadoop 1.x Configuration Files – Apache Hadoop. Core: core-site.xml. HDFS: hdfs-site.xml. MapReduce: mapred-site.xml.
  • 28. Hadoop 2.x Configuration Files – Apache Hadoop. Core: core-site.xml. HDFS: hdfs-site.xml. YARN: yarn-site.xml. MapReduce: mapred-site.xml.
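
Each of the files on slide 28 is an XML document with a `<configuration>` root containing `<property>` entries, each with a `<name>` and a `<value>`. The sketch below generates minimal versions of the four Hadoop 2.x files. The property names are real Hadoop 2.x settings, but the host names, the port, and the chosen values are placeholders assumed for illustration, and a working cluster needs more settings than these.

```python
import xml.etree.ElementTree as ET

def build_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

# Placeholder hosts and values; adjust for a real cluster.
site_settings = {
    "core-site.xml": {"fs.defaultFS": "hdfs://namenode.example.com:8020"},
    "hdfs-site.xml": {"dfs.replication": 3},
    "yarn-site.xml": {"yarn.resourcemanager.hostname": "rm.example.com"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
}

site_files = {name: build_site_xml(props) for name, props in site_settings.items()}

for name, xml_text in site_files.items():
    print(name)
    print(xml_text)
```

Writing each string to the matching file name in the Hadoop configuration directory would give the layout the slide lists; in Hadoop 1.x there is no yarn-site.xml.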
  • 29. Replication and Rack Awareness
  • 30. Hadoop Cluster Modes. Hadoop can run in any of the following three modes. Standalone (or Local) Mode: » No daemons, everything runs in a single JVM » Suitable for running MapReduce programs during development » Has no DFS. Pseudo-Distributed Mode: » Hadoop daemons run on the local machine. Fully-Distributed Mode: » Hadoop daemons run on a cluster of machines.
  • 31. DEMO: Hadoop Cluster Setup
  • 32. DEMO: Hadoop Rack Awareness
  • 33. DEMO: Secondary NameNode
  • 34. Commissioning and Decommissioning of DataNode. Diagram labels: Master Node, DataNodes, Commissioning, Decommissioning.
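
Slide 34 shows only a diagram, so the following is a simplified model of what decommissioning has to achieve: before a DataNode leaves, every block it holds must have enough replicas on the remaining nodes. The functions `decommission` and `commission`, the node and block names, and the replication factor of 2 are assumptions for illustration. In a real cluster the NameNode handles this after an administrator updates the exclude file (`dfs.hosts.exclude`) and runs `hdfs dfsadmin -refreshNodes`.

```python
def decommission(block_locations, node, live_nodes, replication=2):
    """Remove node from the cluster, re-replicating its blocks onto other nodes."""
    remaining = [n for n in live_nodes if n != node]
    new_locations = {}
    for block, holders in block_locations.items():
        kept = [h for h in holders if h != node]
        # Nodes that do not yet hold this block can receive a new replica.
        candidates = [n for n in remaining if n not in kept]
        while len(kept) < replication and candidates:
            kept.append(candidates.pop(0))
        new_locations[block] = kept
    return new_locations, remaining

def commission(live_nodes, node):
    """Add a node to the cluster if it is not already a member."""
    return live_nodes if node in live_nodes else live_nodes + [node]

# Hypothetical cluster state.
live = ["dn1", "dn2", "dn3"]
blocks = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"], "blk_3": ["dn1", "dn3"]}

blocks_after, live_after = decommission(blocks, "dn3", live)
live_after_commission = commission(live_after, "dn4")
print(blocks_after, live_after_commission)
```

The point of the ordering is that data is copied away before the node is removed, so no block drops below its replication target during the change.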
  • 35. Questions
  • 36. Course Topics. Module 1: » Understanding Big Data » Hadoop Components. Module 2: » Different Hadoop Server Roles » Hadoop Cluster Configuration. Module 3: » Hadoop Cluster Planning » Job Scheduling. Module 4: » Securing your Hadoop Cluster » Backup and Recovery. Module 5: » Hadoop 2.0 New Features » HDFS High Availability. Module 6: » Quorum Journal Manager (QJM) » Hadoop 2.0 - YARN. Module 7: » Oozie Workflow Scheduler » Hive and HBase Administration. Module 8: » Hadoop Cluster Case Study » Hadoop Implementation