Learn about Hadoop Basics, NetApp FAS NFS Connector for Hadoop

Getting to Know Hadoop
SeungYong Baek
2016/09/09
© 2016 NetApp, Inc. All rights reserved.

Agenda
1. Hadoop overview
2. Basic Hadoop configuration – 3-node setup
3. Hadoop sample tests (MapReduce)
- WordCount, TeraGen, TeraSort, TeraValidate
4. NetApp Hadoop connector configuration

Hadoop Overview

Hadoop Overview
What Is Apache Hadoop?
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data
sets across clusters of computers using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage. Rather than rely on
hardware to deliver high-availability, the library itself is designed to detect and handle failures at the
application layer, so delivering a highly-available service on top of a cluster of computers, each of which
may be prone to failures.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Hadoop Overview
Apache Hadoop-Related Projects
Other Hadoop-related projects at Apache include:
- Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner.
- Avro™: A data serialization system.
- Cassandra™: A scalable multi-master database with no single points of failure.
- Chukwa™: A data collection system for managing large distributed systems.
- HBase™: A scalable, distributed database that supports structured data storage for large tables.
- Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout™: A scalable machine learning and data mining library.
- Pig™: A high-level data-flow language and execution framework for parallel computation.
- Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
- ZooKeeper™: A high-performance coordination service for distributed applications.

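Big Data Platform?

Data collection, integration, and cleansing
Technologies that bring structured, semi-structured, and unstructured source data into the analytics system and shape it into a form that is easy to analyze
- Flume, Chukwa, Scribe – log collection
- SQOOP (SQL to HADOOP) – integration between RDBMS and NoSQL
- Nutch – web crawling
- Kafka – message delivery and collection
- OpenRefine – large-scale data cleansing
- Thrift – structuring and managing unstructured data
- Avro – data serialization, etc.

Data storage, processing, and management
Technologies that store the integrated data and process and manage it in a distributed manner
- NoSQL
  - HBase, DynamoDB, MongoDB, CouchDB, Cassandra, Hypertable, Riak, Redis, Voldemort
- Processing (distributed, batch, real-time, etc.) and management
  - Hadoop (MapReduce), Ambari, Spark, Storm, ZooKeeper, Pig, Hive, Mrjob, Azkaban, Oozie, Solr, Elasticsearch, Cascading, Cascalog, etc.
- File systems
  - HDFS, S3, NFS, GPFS, etc.

Data analysis and visualization
Technologies that analyze the data and turn it into a form users can work with
- Analysis and visualization
  - R, SAS, SPSS, Tableau, Fusion Tables, Gephi, Tag Cloud, etc.
- Mining and algorithms
  - Text mining, opinion mining, reality mining, clustering, graph mining, SNS analysis, machine learning, Mahout, NLTK, OpenNLP, Boilerpipe, WEKA, etc.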

Cloudera
CDH – Cloudera Distribution Including Apache Hadoop
CDH is the most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH delivers the
core elements of Hadoop – scalable storage and distributed computing – along with a Web-based user interface and
vital enterprise capabilities. CDH is Apache-licensed open source and is the only Hadoop solution to offer unified batch
processing, interactive SQL and interactive search, and role-based access controls.

Hortonworks
HDP – Hortonworks Data Platform
HDP is the industry's only true secure, enterprise-ready open source Apache™ Hadoop® distribution based on a
centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer
applications and delivers robust analytics that accelerate decision making and innovation.

MAPR
MapR Converged Data Platform
The MapR Converged Data Platform integrates Hadoop and Spark with real-time database capabilities, global event
streaming, and scalable enterprise storage to power a new generation of big data applications. The MapR Platform
delivers enterprise grade security, reliability, and real-time performance while dramatically lowering both hardware and
operational costs of your most important applications and data.

Hadoop Ecosystem
https://www.mapr.com/products/open-source-engines
http://blrunner.com/99

Basic Hadoop Configuration

Target Layout – 3-Node Configuration
Test environment: VMware Workstation
- hadoop01 (virtual machine)
  - OS: CentOS 7
  - VM memory: 1GB
  - IP: 192.168.2.191
  - Role: Master NameNode, DataNode
- hadoop02 (virtual machine)
  - OS: CentOS 7
  - IP: 192.168.2.192
  - Role: Secondary NameNode, DataNode
- hadoop03 (virtual machine)
  - OS: CentOS 7
  - IP: 192.168.2.193
  - Role: DataNode
- ONTAP Simulator
  - Data ONTAP (DOT): 8.3.2
  - SVM: hadoop
  - NFS IP: 192.168.2.194

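Hadoop Configuration Procedure Overview
1. Install the OS
- Basic Server with GUI
- JAVA
2. Add a user
- Can be done during OS installation (add the hadoop user)
(3. Install JAVA)
- Optional: use the default CentOS OpenJDK, or install Java separately
4. Configure the OS environment
- /etc/hosts
- .bashrc
- ssh access
- Disable the firewall
5. Install Hadoop
- Performed on the NameNode
6. Edit the Hadoop configuration files
- Performed on the NameNode
7. Copy Hadoop
- Copy from the NameNode to the DataNodes
8. Create HDFS and verify the result
- Performed on the NameNode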

1. OS Installation and 2. Adding a User
1. VMware or physical server environment
- Install Basic Server with GUI and JAVA
2. Add the user during or after OS installation
# useradd hadoop
# passwd hadoop
3. If some JAVA packages turn out to be missing, install them manually
# rpm -qa | grep java-1.x.0
# yum install java-1.8.0-openjdk-devel.x86_64
4. Update the OS if needed
# yum update
5. On VMware, cloning can be used -> after cloning, change only the hostname and IP as described in "Appendix 1. Other Linux Settings"

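3. JAVA Installation
1. Hadoop is a Java-based framework, so Java must be installed
2. Hadoop has been tested with both Oracle Java and OpenJDK, which usually ships with Linux
3. Selecting the JAVA package during CentOS installation installs it by default
4. If needed, install Oracle Java separately and then set the OS environment variables
5. Hadoop 2.6.x and earlier support Java 6, while Hadoop 2.7.x requires Java 7 or later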

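4. OS Environment Setup – /etc/hosts, $HOME/.bashrc
1. Edit the following configuration files
/etc/hosts -> as root
$HOME/.bashrc -> as the hadoop user
2. See the attached files in "Appendix 2. OS Environment Settings and Hadoop Configuration Files"
3. Perform this on all three nodes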

4. OS Environment Setup – ssh Access
[hadoop@hadoop01 ~]$ ssh-keygen -t rsa -> keep pressing Enter; press Enter again when asked for a passphrase
- This creates id_rsa and id_rsa.pub in the $HOME/.ssh directory
[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop01 -> answer yes, then enter the password once when prompted
[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop02
[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop03
- ssh-copy-id creates an authorized_keys file in $HOME/.ssh on each target node
- Repeat the same steps from every node to every other node
Logging in as shown below should succeed immediately, without a password prompt
[hadoop@hadoop01 ~]$ ssh hadoop01
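
The key distribution above has to be repeated from every node, so it is easy to script. A minimal sketch, assuming the three hostnames and the hadoop account used in this deck; the function names are hypothetical, and the functions only print the commands so they can be reviewed first.

```shell
#!/bin/sh
# Node list taken from this deck; adjust for other clusters.
NODES="hadoop01 hadoop02 hadoop03"

# Print one ssh-copy-id command per node.
print_key_copy_cmds() {
  for node in $NODES; do
    echo "ssh-copy-id hadoop@$node"
  done
}

# Print a non-interactive login check per node. BatchMode makes ssh fail
# instead of prompting when key authentication is not working.
print_login_checks() {
  for node in $NODES; do
    echo "ssh -o BatchMode=yes hadoop@$node true"
  done
}

print_key_copy_cmds
print_login_checks
```

To actually run the commands on a node, pipe the output to the shell, for example `print_key_copy_cmds | sh`.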

4. OS Environment Setup – Disable the Firewall
1. Check the firewall status and disable it as shown below
[root@hadoop01 ~]# systemctl status firewalld
[root@hadoop01 ~]# systemctl stop firewalld
[root@hadoop01 ~]# systemctl disable firewalld
2. Run this on all three nodes
3. If firewalld is left running, the ports Hadoop uses are blocked and the Hadoop cluster cannot be configured
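
Outside a test lab, turning the firewall off entirely may not be acceptable. One alternative, sketched here under the assumption that all cluster nodes and the NFS interface sit in 192.168.2.0/24, is to leave firewalld running and trust only that subnet. The subnet and the function name are assumptions for illustration; the function prints the commands for review rather than running them.

```shell
#!/bin/sh
# Assumption: every Hadoop node (and the NFS LIF) is in this subnet.
CLUSTER_SUBNET="192.168.2.0/24"

# Print the firewall-cmd calls that add the subnet to the trusted zone
# permanently and then reload the rules.
print_firewall_cmds() {
  echo "firewall-cmd --permanent --zone=trusted --add-source=$CLUSTER_SUBNET"
  echo "firewall-cmd --reload"
}

print_firewall_cmds
```

Run the printed commands as root on each node if this approach fits the environment.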

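5. Hadoop Installation
1. Install Hadoop on the NameNode
2. Hadoop is not installed as a package; extracting the archive is enough
3. Download the latest binary release from the Hadoop website, upload it to the NameNode with scp or similar, then run the extract and link commands in step 4
- http://hadoop.apache.org/releases.html
4. Alternatively, download it on the NameNode with wget, then extract it and create a symbolic link
[hadoop@hadoop01 ~]$ wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
[hadoop@hadoop01 ~]$ tar xvzf hadoop-2.7.2.tar.gz
[hadoop@hadoop01 ~]$ ln -s hadoop-2.7.2 hadoop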

6. Editing the Hadoop Configuration Files and 7. Copying Hadoop
1. Edit the following configuration files
/home/hadoop/hadoop/etc/hadoop/core-site.xml
/home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
/home/hadoop/hadoop/etc/hadoop/mapred-site.xml
/home/hadoop/hadoop/etc/hadoop/yarn-site.xml
/home/hadoop/hadoop/etc/hadoop/slaves
2. See the attached files in "Appendix 2. OS Environment Settings and Hadoop Configuration Files"
3. Copy the Hadoop binaries (NameNode -> DataNode, hadoop01 -> hadoop02, hadoop03)
[hadoop@hadoop01 ~]$ scp -r hadoop-2.7.2 hadoop@hadoop02:~
[hadoop@hadoop01 ~]$ scp -r hadoop-2.7.2 hadoop@hadoop03:~
[hadoop@hadoop01 ~]$ ssh hadoop@hadoop02 "ln -s hadoop-2.7.2 hadoop"
[hadoop@hadoop01 ~]$ ssh hadoop@hadoop03 "ln -s hadoop-2.7.2 hadoop"

8. Creating HDFS and Verifying the Result
1. Format the Hadoop NameNode
[hadoop@hadoop01 ~]$ hdfs namenode -format
2. Start the Hadoop cluster
[hadoop@hadoop01 ~]$ start-dfs.sh && start-yarn.sh
[hadoop@hadoop01 ~]$ mr-jobhistory-daemon.sh start historyserver -> optional, for viewing job history
3. Check the cluster and the file system
[hadoop@hadoop01 ~]$ hdfs dfsadmin -report
[hadoop@hadoop01 ~]$ hadoop fs -df -h
[hadoop@hadoop01 ~]$ jps -> run on each node to see which roles that node is running
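
The jps check can be compared against the roles this deck assigns to each node. A minimal sketch: the daemon lists are derived from the role layout and configuration files in this deck (ResourceManager on hadoop01, Secondary NameNode on hadoop02, all three nodes listed in slaves), and the function names are hypothetical.

```shell
#!/bin/sh
# Daemons each node is expected to run with the configuration in this deck.
expected_daemons() {
  case "$1" in
    hadoop01) echo "NameNode DataNode ResourceManager NodeManager" ;;
    hadoop02) echo "SecondaryNameNode DataNode NodeManager" ;;
    hadoop03) echo "DataNode NodeManager" ;;
    *) return 1 ;;
  esac
}

# Read jps output on stdin and print every expected daemon that is missing.
# jps prints "<pid> <ClassName>", so the name is matched after a space and
# at end of line (this keeps "NameNode" from matching "SecondaryNameNode").
missing_daemons() {
  node=$1
  jps_output=$(cat)
  for daemon in $(expected_daemons "$node"); do
    echo "$jps_output" | grep -q " $daemon\$" || echo "$daemon"
  done
}

# Example with canned jps output for hadoop02; prints "SecondaryNameNode".
printf '1234 DataNode\n1300 NodeManager\n1400 Jps\n' | missing_daemons hadoop02
```

On a live cluster the input would come from the node itself, for example `ssh hadoop02 jps | missing_daemons hadoop02`, assuming jps is on the PATH of the remote non-interactive shell.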

8. Creating HDFS and Verifying the Result
3. Check the cluster and the file system
- NameNode web GUI: http://192.168.2.191:50070
- Resource Manager GUI: http://192.168.2.191:8088
- Job History GUI: http://192.168.2.191:19888/jobhistory

WordCount, TeraGen, TeraSort, TeraValidate
Hadoop Sample Tests (MapReduce)

WordCount Example
[hadoop@hadoop01 ~]$ hadoop fs -mkdir /wc_input
[hadoop@hadoop01 ~]$ cd hadoop ; ls
[hadoop@hadoop01 hadoop]$ cat LICENSE.txt
[hadoop@hadoop01 hadoop]$ hadoop fs -copyFromLocal LICENSE.txt /wc_input
[hadoop@hadoop01 hadoop]$ hadoop fs -ls / ; hadoop fs -ls /wc_input
[hadoop@hadoop01 hadoop]$ cd $HOME/hadoop/share/hadoop/mapreduce
[hadoop@hadoop01 mapreduce]$ pwd ; ls
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar -> lists the available MapReduce examples
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /wc_input /wc_output
[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /wc_output
[hadoop@hadoop01 mapreduce]$ hadoop fs -cat /wc_output/part-r-00000
- Job progress and results can be checked in the Resource Manager and Job History web GUIs, or with the yarn command
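
To see what the wordcount job computes, the same result can be reproduced on a small input with a local pipeline. This is only an illustration of the map, shuffle, and reduce steps, not how Hadoop runs the job; the function name is hypothetical.

```shell
#!/bin/sh
# Local stand-in for the wordcount example:
#   tr      splits the text into one whitespace-separated token per line ("map")
#   sort    brings identical tokens together ("shuffle")
#   uniq -c counts each token ("reduce")
#   awk     prints "word<TAB>count", the layout of part-r-00000
wordcount() {
  tr -s '[:space:]' '\n' |
    grep -v '^$' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

printf 'to be or not to be\n' | wordcount
```

For the sample sentence this prints be 2, not 1, or 1, and to 2, one tab-separated pair per line. Running `wordcount < LICENSE.txt` locally gives a quick cross-check against the cluster output on a small file.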

What HDFS Actually Is

TeraGen, TeraSort, TeraValidate
1. TeraGen, TeraSort, and TeraValidate are general-purpose benchmark tools included with stock Hadoop
- teragen: Generate data for the terasort
- terasort: Run the terasort
- teravalidate: Checking results of terasort
2. Used for performance testing in NetApp FAS NFS Connector for Hadoop (TR-4382)
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teragen 10000 /teragen
[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /teragen
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar terasort /teragen /terasort
[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /terasort
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teravalidate /terasort /teravalidate
[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /teravalidate
- Job progress and results can be checked in the Resource Manager and Job History web GUIs, or with the yarn command
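
The first teragen argument is a row count, not a size; each generated row is 100 bytes. A small sketch for converting between the two when sizing a benchmark (the function names are hypothetical):

```shell
#!/bin/sh
# TeraGen writes fixed-size rows of 100 bytes each.
ROW_BYTES=100

# Bytes produced by a given number of rows.
teragen_bytes() {
  echo $(( $1 * ROW_BYTES ))
}

# Rows needed to produce a given number of bytes.
teragen_rows() {
  echo $(( $1 / ROW_BYTES ))
}

teragen_bytes 10000          # 1000000 (about 1 MB, as in the run above)
teragen_rows 1000000000000   # 10000000000 rows for 1 TB (10^12 bytes)
```

So the `teragen 10000` run in this deck is only a smoke test; a meaningful benchmark needs a far larger row count.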

NetApp Hadoop Connector Configuration

NetApp scalable Hadoop deployment
Network Architecture
E-Series and FAS arrays for Hadoop

Hadoop in the Enterprise
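Big data analytics is becoming mainstream, but many challenges stand in the way. Hadoop is also hard to apply directly to an analytics system that is already in production.
1. Enterprises have storage and compute imbalance.
- Each enterprise environment has an imbalance between compute and storage, but a typical Hadoop system cannot separate the two.
- A decoupled design allows compute and storage to scale independently.
2. Enterprises have existing hardware.
- Most enterprises already own hardware and data, and using them normally requires migrating the data into the analytics system. With a decoupled design, however, analysis can run on the existing hardware and data.
3. Analytics storage JBOD is not efficient.
- JBOD in Hadoop relies on inefficient 3-way replication for availability and performance.
- NetApp can improve efficiency with RAID-DP.
4. Analytics storage JBOD lacks data management.
- File systems used by analytics systems such as Hadoop lack features such as deduplication, high availability, and disaster recovery.
- The NetApp Hadoop connector allows analytics capacity to be added independently of the data nodes.
- The connector enables distributed processing of data that lives on existing storage systems.
- Native HDFS and NetApp NFS can be used at the same time.
- Technologies such as Snapshot and FlexClone can be applied.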
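
The efficiency point in item 3 comes down to simple arithmetic. A rough sketch, assuming a RAID-DP group of 16 disks (2 of them parity) and ignoring spares, file system reserve, and any other overhead; the group size and the function names are assumptions for illustration only.

```shell
#!/bin/sh
# Raw capacity needed to hold <data> units under N-way replication.
raw_for_replication() {
  awk -v d="$1" -v r="$2" 'BEGIN { printf "%.1f\n", d * r }'
}

# Raw capacity needed under dual parity with <group> disks per RAID group,
# two of which hold parity.
raw_for_raid_dp() {
  awk -v d="$1" -v g="$2" 'BEGIN { printf "%.1f\n", d * g / (g - 2) }'
}

raw_for_replication 100 3   # 300.0 raw TB for 100 TB of data
raw_for_raid_dp 100 16      # 114.3 raw TB for the same data
```

Under these assumptions 3-way replication needs roughly 2.6 times the raw capacity of the RAID-DP layout for the same amount of data.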

Benefits and Use Cases, Deployment Options, Ease of Deployment
The NetApp NFS Connector for Hadoop uses a decoupled design and can offer more functionality than a conventional Hadoop system.
1. Analyzing data on enterprise storage. – The connector can analyze data on existing storage directly, without a separate data ingestion step for analytics. A single storage system can therefore serve both operational data and analytics data.
2. Cross data-center deployments. – Because the design is decoupled, data can be stored in a distributed manner and scaled independently. NetApp solutions such as NPS (NetApp Private Storage) also make it possible to use cloud computing.
3. Analyze data on existing NFS storage.
4. Build testing and QA environments by using clone of existing data. – FlexClone can instantly create additional data sets for other purposes.
5. Leverage storage-level caching for iterative machine learning algorithms. – Iterative algorithms such as machine learning are cache-friendly, so Flash Cache can accelerate them.
6. Use a backup site for analytics. – With NPS, cloud resources can be used for analytics.
7. Deployment Options – Both HDFS+NFS and NFS-only deployments are possible.
8. Ease of Deployment – The connector can be deployed simply by copying JAR files and editing configuration files.

Technical Advantages
1. The connector works with Apache Hadoop, Apache Spark, Apache HBase, and Tachyon.
2. No changes are needed to existing applications.
3. No changes are needed to existing deployments; only configuration files are modified (core-site.xml,
hbase-site.xml, and so on).
4. Data storage can be modified and upgraded nondestructively by using clustered Data ONTAP.
5. The connector supports the latest networks (10GbE) and multiple NFS connections.
6. The connector enables high-availability and nondisruptive operations by using clustered Data ONTAP.

NetApp NFS Connector for Hadoop plugs into Apache Hadoop
1. Connection Pool
- Multiple nodes and multiple links can be used
2. File Handle Cache
- Uses LRU caching
3. NFS InputStream – read operations from Hadoop nodes
- Large sequential reads. – nfsReadSizeBits
- Multiple outstanding I/Os.
- Prefetching. – nfsSplitSizeBits
4. NFS OutputStream – write operations from Hadoop nodes
- write buffer – nfsWriteSizeBits
- Commits all write requests only when the output stream is closed
5. Authentication
- none or UNIX – nfsAuthScheme
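
The nfsReadSizeBits, nfsWriteSizeBits, and nfsSplitSizeBits option names suggest that each value is a base-2 exponent of a byte count. Assuming that reading is correct (it is an interpretation of the names, not something this deck states), the conversion is a single shift; the function name is hypothetical.

```shell
#!/bin/sh
# Convert a "...SizeBits" exponent to bytes, assuming size = 2^bits.
bits_to_bytes() {
  echo $(( 1 << $1 ))
}

bits_to_bytes 20   # 1048576     (1 MiB)
bits_to_bytes 30   # 1073741824  (1 GiB)
```

Under that assumption a read size of 20 bits lines up with a 1048576-byte NFSv3 maximum read size on the storage side.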

Downloading and Copying NetApp FAS NFS Connector for Hadoop
1. Download NetApp FAS NFS Connector for Hadoop
- https://github.com/NetApp/NetApp-Hadoop-NFS-Connector/releases
- hadoop-nfs-connector-1.0.6.jar
- hadoop-nfs-3.0.0-SNAPSHOT.jar
2. Upload and copy the JARs to all nodes
- Important: check the hadoop classpath first, then copy the JARs into a directory that is on it
[hadoop@hadoop01 ~]$ hadoop classpath
[hadoop@hadoop01 common]$ pwd
/home/hadoop/hadoop/share/hadoop/common
[hadoop@hadoop01 common]$ scp hadoop-nfs-3.0.0-SNAPSHOT.jar hadoop-nfs-connector-1.0.6.jar hadoop@hadoop02:/home/hadoop/hadoop/share/hadoop/common
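
The scp above covers hadoop02 only, while the JARs are needed on every node. A minimal sketch that prints the copy command for each remaining node, assuming the node names and directory used in this deck; the function name is hypothetical.

```shell
#!/bin/sh
# Connector JARs and destination directory as shown in this deck.
JARS="hadoop-nfs-3.0.0-SNAPSHOT.jar hadoop-nfs-connector-1.0.6.jar"
DEST="/home/hadoop/hadoop/share/hadoop/common"

# Print one scp command per remote node (hadoop01 already has the files).
print_jar_copy_cmds() {
  for node in hadoop02 hadoop03; do
    echo "scp $JARS hadoop@$node:$DEST"
  done
}

print_jar_copy_cmds
```

Run from the directory that holds the JARs, `print_jar_copy_cmds | sh` performs the copies.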

Changing FAS Storage Options
Cluster832::> vserver nfs modify -vserver hadoop -nfs-rootonly disabled
Cluster832::> vserver nfs modify -vserver hadoop -mount-rootonly disabled
Cluster832::> set advanced
Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y
Cluster832::*> vserver nfs modify -vserver hadoop -v3-tcp-max-read-size 1048576
Cluster832::*> vserver nfs modify -vserver hadoop -v3-tcp-max-write-size 65536
Cluster832::*>
- In DOT 8.3.2, the -v3-tcp-max-read-size and -v3-tcp-max-write-size options are marked DEPRECATED
(DEPRECATED)-NFSv3 TCP Maximum Read Size (bytes): 1048576
(DEPRECATED)-NFSv3 TCP Maximum Write Size (bytes): 65536

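Creating the NFS Configuration File and Editing the Hadoop Configuration File
1. Create and upload nfs-mapping.json
- Upload it to the Hadoop configuration directory on all nodes
- /home/hadoop/hadoop/etc/hadoop
2. Edit the core-site.xml configuration file
- Edit the Hadoop configuration file on all nodes
- Add the NetApp Hadoop connector section
nfs-mapping.json: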
{
  "spaces": [
    {
      "name": "DOT832",
      "uri": "nfs://192.168.2.194:2049/",
      "options": {
        "nfsExportPath": "/hadoop",
        "nfsReadSizeBits": 20,
        "nfsWriteSizeBits": 20,
        "nfsSplitSizeBits": 30,
        "nfsAuthScheme": "AUTH_SYS",
        "nfsUsername": "root",
        "nfsGroupname": "root",
        "nfsUid": 0,
        "nfsGid": 0,
        "nfsPort": 2049,
        "nfsMountPort": -1,
        "nfsRpcbindPort": 111
      },
      "endpoints": [
        {
          "host": "nfs://192.168.2.194:2049/",
          "exportPath": "/hadoop",
          "path": "/"
        }
      ]
    }
  ]
}
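
Because nfs-mapping.json is written by hand, it is worth checking that it parses before copying it to every node; a stray trailing comma, for instance, is rejected by strict JSON parsers. A minimal sketch that uses Python's standard json.tool module, assuming a python3 or python interpreter is installed; the function name and the sample files are hypothetical.

```shell
#!/bin/sh
# Return 0 if the file is strictly valid JSON, non-zero otherwise.
# Uses whichever of python3/python is available on the PATH.
validate_json() {
  for py in python3 python; do
    if command -v "$py" >/dev/null 2>&1; then
      "$py" -m json.tool "$1" >/dev/null 2>&1
      return $?
    fi
  done
  echo "no python interpreter found" >&2
  return 2
}

# Demonstration on two throwaway files.
tmpdir=$(mktemp -d)
printf '{"endpoints": [{"path": "/"}]}\n' > "$tmpdir/good.json"
printf '{"endpoints": [{"path": "/"},]}\n' > "$tmpdir/bad.json"
validate_json "$tmpdir/good.json" && echo "good.json: valid"
validate_json "$tmpdir/bad.json" || echo "bad.json: invalid"
rm -rf "$tmpdir"
```

For the real file, `validate_json /home/hadoop/hadoop/etc/hadoop/nfs-mapping.json` would be run before distributing it.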

Data Upload Test on the NFS Volume
[hadoop@hadoop01 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@hadoop01 hadoop]$ ls
[hadoop@hadoop01 hadoop]$ hadoop fs -ls nfs://192.168.2.194:2049/
Store with ep Endpoint: host=nfs://192.168.2.194:2049/ export=/hadoop path=/ has fsId 2147888298
Found 1 items
drwxrwxrwx - 0 0 4096 2016-08-31 11:05 nfs://192.168.2.194:2049/.snapshot
[hadoop@hadoop01 hadoop]$ hadoop fs -copyFromLocal *.txt nfs://192.168.2.194:2049/
Store with ep Endpoint: host=nfs://192.168.2.194:2049/ export=/hadoop path=/ has fsId 2147888298
16/08/31 11:45:35 WARN stream.NFSBufferedOutputStream: Flushing a closed stream. Check your code.
16/08/31 11:45:35 INFO stream.NFSBufferedOutputStream: STREAMSTATSstreamStatistics:
STREAMSTATS name: class org.apache.hadoop.fs.nfs.stream.NFSBufferedInputStream/LICENSE.txt._COPYING_
STREAMSTATS streamID: 1
STREAMSTATS ====OutputStream Statistics====
... (output omitted) ...
[hadoop@hadoop01 hadoop]$ hadoop fs -ls nfs://192.168.2.194:2049/

TeraGen, TeraSort, TeraValidate Test on the NFS Volume
[hadoop@hadoop01 mapreduce]$ pwd
/home/hadoop/hadoop/share/hadoop/mapreduce
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teragen 100 nfs://192.168.2.194:2049/teragen
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar terasort nfs://192.168.2.194:2049/teragen nfs://192.168.2.194:2049/terasort
[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teravalidate nfs://192.168.2.194:2049/terasort nfs://192.168.2.194:2049/teravalidate
[hadoop@hadoop01 mapreduce]$ hadoop fs -ls nfs://192.168.2.194:2049/

Thank you.

Appendix 1. Other Linux Settings
Setting the hostname

Appendix 1. Other Linux Settings
Changing the IP address

Appendix 2. OS Environment Settings and Hadoop Configuration Files
1. /home/hadoop/.bashrc
2. /etc/hosts
3. /home/hadoop/hadoop/etc/hadoop/core-site.xml
4. /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
5. /home/hadoop/hadoop/etc/hadoop/mapred-site.xml
6. /home/hadoop/hadoop/etc/hadoop/yarn-site.xml
7. /home/hadoop/hadoop/etc/hadoop/slaves

Appendix 2. OS Environment Settings and Hadoop Configuration Files
1. /home/hadoop/.bashrc
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
2. /etc/hosts
192.168.2.191 hadoop01
192.168.2.192 hadoop02
192.168.2.193 hadoop03
3. /home/hadoop/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
</configuration>
4. /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
5. /home/hadoop/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
6. /home/hadoop/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
7. /home/hadoop/hadoop/etc/hadoop/slaves
hadoop01
hadoop02
hadoop03
