2. What is new in Hadoop 2.2?
• The MapReduce framework is updated to run on
Apache YARN
• MapReduce is a core feature of Hadoop: the
batch processor that queues up jobs that
read from the Hadoop Distributed File System
(HDFS) to pull out useful information. In the
previous version, every workload had to be
expressed as a batch MapReduce job,
because the Java-based MapReduce engine was
the only processing model the cluster supported.
3. What is new in Hadoop 2.2?
• YARN will enable multiple processing tools to access the
data in the HDFS storage system at the
same time
• YARN splits the functionality of
MapReduce further:
– the two JobTracker responsibilities, resource
management and job
scheduling/monitoring, move into separate
daemons (a global ResourceManager and a
per-application ApplicationMaster)
4. What is new in Hadoop 2.2?
• With MapReduce 2.0, developers can now
build apps directly within Hadoop, instead of
bolting them on from the outside, as many
third-party vendor tools have had to do in
Hadoop 1.0. This essentially establishes
Hadoop 2.0 as a platform on which
developers can build applications that
search for and manipulate data far more
efficiently.
5. What is new in Hadoop 2.2?
• YARN is the biggest change in the new
version of Hadoop. Other additions include:
– high availability for HDFS
– HDFS snapshots
– NFSv3 access to
data in HDFS
• Hadoop 2.2 is now officially supported on
Microsoft Windows
8. Single node cluster setup
• Prerequisites:
– Java 6 installed
– Dedicated user for Hadoop
– SSH configured
• You can download the tarball for Hadoop 2.2 from
– http://mirror.metrocast.net/apache/hadoop/common/stable2/
– Extract it to a folder, say /home/hadoop/yarn.
We assume the dedicated user for Hadoop is
“hadoop”.
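The prerequisites above can be checked from a shell before going further. This is a minimal sketch and an assumption on our part, not part of the original setup: the helper name `check_cmd` is hypothetical, and it only tests that the `java` and `ssh` commands are on the PATH. It does not verify the Java version or that key-based login to localhost works.

```shell
# check_cmd NAME: report whether a command is available on the PATH.
# (Hypothetical helper for illustration; it is not part of Hadoop.)
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_cmd java   # Java runtime needed by every Hadoop daemon
check_cmd ssh    # SSH client used by the cluster start scripts
echo "current user: $(id -un)"
```

Run it as the dedicated Hadoop user, so the reported user name matches the account that will own the installation.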
9. Single node cluster setup
• After downloading the file, extract it to a folder,
say /home/hadoop/yarn. We assume the
dedicated user for Hadoop is “hadoop”.
– $ tar -xvzf hadoop-2.2.0.tar.gz
– $ mkdir -p /home/hadoop/yarn
– $ mv hadoop-2.2.0 /home/hadoop/yarn/hadoop-2.2.0
– $ cd /home/hadoop/yarn
– $ sudo chown -R hadoop:hadoop hadoop-2.2.0
– $ sudo chmod -R 755 hadoop-2.2.0
10. Single node cluster setup
• Set up environment variables in ~/.bashrc
– export HADOOP_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.2.0
– export YARN_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.2.0/etc/hadoop
• After adding these lines at the bottom of the
.bashrc file, reload it:
– $ source ~/.bashrc
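A mistyped path in one of these variables tends to surface only later as a confusing startup error, so it can help to confirm that each one names an existing directory. The sketch below assumes a POSIX shell; `check_dir_var` is a hypothetical helper name introduced here for illustration.

```shell
# check_dir_var NAME: print whether the variable NAME is set and names a directory.
# (Hypothetical helper for illustration; it is not part of Hadoop.)
check_dir_var() {
  eval "value=\${$1:-}"          # indirect lookup of the variable called $1
  if [ -z "$value" ]; then
    echo "$1 is not set"
  elif [ -d "$value" ]; then
    echo "$1 ok: $value"
  else
    echo "$1 points to a missing directory: $value"
  fi
}

# Check every variable exported in ~/.bashrc.
for name in HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
            HADOOP_HDFS_HOME YARN_HOME HADOOP_CONF_DIR; do
  check_dir_var "$name"
done
```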
11. Single node cluster setup
• Create Hadoop data directories
# Two directories, one for the NameNode and one for the DataNode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode
• Configuration
– $ cd $YARN_HOME
– $ vi etc/hadoop/yarn-site.xml
– Edit yarn-site.xml
12. Single node cluster setup
• Add the following contents inside the
configuration tag
# etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
13. Single node cluster setup
• $ vi etc/hadoop/core-site.xml
• Add the following contents inside
configuration tag
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
14. Single node cluster setup
• $ vi etc/hadoop/hdfs-site.xml
• Add the following contents inside configuration tag
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>
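The hdfs-site.xml above hard-codes /home/hadoop, which only works when the dedicated user really is “hadoop”. As a hedged alternative, the file can be generated from a shell here-document so the paths follow the data directories actually created. `OUT_DIR` and `DATA_ROOT` are illustrative variable names, and the sketch writes to a scratch directory instead of touching the live configuration.

```shell
# Generate a single-node hdfs-site.xml in a scratch directory.
# OUT_DIR and DATA_ROOT are illustrative names; adjust them to your layout.
OUT_DIR=${OUT_DIR:-$(mktemp -d)}
DATA_ROOT=${DATA_ROOT:-${HOME:-/home/hadoop}/yarn/yarn_data/hdfs}

mkdir -p "$OUT_DIR"
cat > "$OUT_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$DATA_ROOT/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$DATA_ROOT/datanode</value>
  </property>
</configuration>
EOF
echo "wrote $OUT_DIR/hdfs-site.xml"
```

After reviewing the generated file, copy it into $HADOOP_CONF_DIR by hand.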
15. Single node cluster setup
• $ vi etc/hadoop/mapred-site.xml
• If this file does not exist, create it and paste
the content provided below:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
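A configured node still needs its NameNode formatted once and its daemons started before any job can run. The sketch below is an assumption about how those steps are typically done with the `hdfs namenode -format` command and the `start-dfs.sh` and `start-yarn.sh` scripts shipped in the Hadoop 2.x `sbin` directory. The function names and the default HADOOP_HOME are hypothetical and chosen only for illustration.

```shell
# Default install location is an assumption; override HADOOP_HOME as needed.
HADOOP_HOME=${HADOOP_HOME:-${HOME:-/home/hadoop}/yarn/hadoop-2.2.0}

# have_hadoop: succeed only if the hdfs launcher exists under HADOOP_HOME.
have_hadoop() {
  [ -x "$HADOOP_HOME/bin/hdfs" ]
}

# format_namenode: run ONCE on a new cluster; reformatting erases HDFS metadata.
format_namenode() {
  have_hadoop || { echo "hadoop not found under $HADOOP_HOME"; return 1; }
  "$HADOOP_HOME/bin/hdfs" namenode -format
}

# start_daemons: start the HDFS daemons, then the YARN daemons,
# then list the running Java processes with jps.
start_daemons() {
  have_hadoop || { echo "hadoop not found under $HADOOP_HOME"; return 1; }
  "$HADOOP_HOME/sbin/start-dfs.sh" &&
  "$HADOOP_HOME/sbin/start-yarn.sh" &&
  jps
}
```

On a fresh install, call `format_namenode && start_daemons`; on later restarts call only `start_daemons`.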
18. Single node cluster setup
• Running Word count Example Program
$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
• Add the input directory to HDFS
$ bin/hdfs dfs -copyFromLocal input /input
19. Single node cluster setup
• Run the wordcount example jar provided in
HADOOP_HOME:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
• Check the output (sample listing from a different input file):
$ bin/hdfs dfs -cat /output/*
This 2
Another 1
is 2
line 1
one 2
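To see what the wordcount job computes without a cluster, the same counting can be approximated locally with a standard shell pipeline. This is a sketch for comparison, not the Hadoop implementation: `local_wordcount` is a hypothetical helper, and it assumes words are separated by whitespace.

```shell
# local_wordcount: read text on stdin, print "word<TAB>count" per distinct word.
# (Hypothetical helper; mirrors the map, shuffle and reduce phases in miniature.)
local_wordcount() {
  tr -s '[:space:]' '\n' |       # "map": emit one word per line
    sed '/^$/d' |                # drop any empty lines
    LC_ALL=C sort |              # "shuffle": bring identical words together
    uniq -c |                    # "reduce": count each group
    awk '{ print $2 "\t" $1 }'   # print the word first, then its count
}

# Same two lines that were typed into input/file earlier.
printf 'This is word count example\nusing hadoop 2.2.0\n' | local_wordcount
```

Each of the eight distinct words in that input appears once, so every count printed is 1. Comparing this listing with the job's output is a quick sanity check on the cluster run.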
20. Single node cluster setup
• Web interface
• Browse HDFS and check its health at
http://localhost:50070 in the browser
21. Single node cluster setup
• You can check the status of running
applications at the following
URL: http://localhost:8088
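Both web interfaces can also be probed from the command line, which is handy on a machine without a browser. The sketch below assumes `curl` is installed; `check_ui` is a hypothetical helper, and it only reports whether something answers on the port, not whether the daemon behind it is healthy.

```shell
# check_ui URL: report whether an HTTP endpoint answers within a few seconds.
# (Hypothetical helper for illustration; it is not part of Hadoop.)
check_ui() {
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not available"
    return 0
  fi
  if curl -s -o /dev/null --max-time 5 "$1"; then
    echo "reachable: $1"
  else
    echo "unreachable: $1"
  fi
}

check_ui http://localhost:50070   # NameNode web UI
check_ui http://localhost:8088    # ResourceManager web UI
```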