Hadoop Operations

Apache Hadoop
Operations for
Production Systems
Greg Phillips, Jake Miller
Cloudera Technology Day

© Cloudera, Inc. All rights reserved.
Your Hosts
$ whoami
Greg Phillips
• Solutions Architect
• gphillips@clouderagovt
Jake Miller
• Customer Operations Engineer
• jakem@cloudera

Agenda
• Intro
• Installation & Configuration
• Troubleshooting
• Enterprise Considerations

The Hadoop Stack: JVMs
[Diagram: four machines, each layered bottom-up as Disk, CPU, Mem; Linux; JVM]

The Hadoop Stack: Daemons
[Diagram: four machines, each layered as Disk, CPU, Mem; Linux; JVM; Hadoop Daemons]

The Hadoop Stack: Storage + Coordination
[Diagram: four machines, each layered as Disk, CPU, Mem; Linux; JVM; Hadoop Daemons. Services spanning the machines:]
• Random Access Storage: HBase
• File Storage: HDFS
• Coordination: ZooKeeper
• Security: Sentry

The Hadoop Stack: Execution
[Diagram: four machines, each layered as Disk, CPU, Mem; Linux; JVM; Hadoop Daemons. Services spanning the machines:]
• File Storage: HDFS
• Coordination: ZooKeeper
• Security: Sentry
• Resource Management: YARN
• In Mem processing: Spark
• Interactive SQL: Impala
• Batch processing: MapReduce
• Search: Solr

The Hadoop Stack: Languages + UI
[Diagram: four machines, each layered as Disk, CPU, Mem; Linux; JVM; Hadoop Daemons. Services spanning the machines:]
• File Storage: HDFS
• Coordination: ZooKeeper
• Security: Sentry
• In Mem processing: Spark
• Interactive SQL: Impala
• Batch processing: MapReduce
• Search: Solr
• Languages/APIs: Hive, Pig, Crunch, Kite, Mahout
• User interface: HUE

The Hadoop Stack: Integration
[Diagram: four machines, each layered as Disk, CPU, Mem; Linux; JVM; Hadoop Daemons. Services spanning the machines:]
• File Storage: HDFS
• Coordination: ZooKeeper
• Security: Sentry
• In Mem processing: Spark
• Interactive SQL: Impala
• Batch processing: MapReduce
• Search: Solr
• Languages/APIs: Hive, Pig, Crunch, Kite, Mahout
• User interface: HUE
• Event ingest: Flume, Kafka
• DB Import/Export: Sqoop
• Other systems: httpd, sas, custom apps, etc.

The Hadoop Stack
[Diagram: the complete stack built up on the preceding slides: machines, Linux, JVMs, and Hadoop daemons; HDFS, ZooKeeper, Sentry; Spark, Impala, MapReduce, Solr; Hive, Pig, Crunch, Kite, Mahout; HUE; Flume, Kafka, Sqoop; and other systems (httpd, sas, custom apps, etc.)]

Apache Hadoop Operations
for Production Systems:
Installation
Greg Phillips

Hardware Considerations
• As a distributed system, Hadoop is going to be deployed onto multiple
interconnected hosts
• How large will the cluster be?
• What services will be deployed on the cluster?
• Can all services effectively run together on the same hosts or is some form of
physical partitioning required?
• What role will each host play in the cluster?
• This impacts the hardware profile (CPU, Memory, Storage, etc)
• How should the hosts be networked together?

Host Roles within a Cluster
• Master Node: HDFS NameNode, YARN ResourceManager, HBase Master, Impala StateStore, ZooKeeper
• Worker Node: HDFS DataNode, YARN NodeManager, HBase RegionServer, Impalad
• Utility Node: Relational Database, Management (e.g. CM), Hive Metastore, Oozie, Impala Catalog Server
• Edge Node: Gateway Configuration, Client Tools, Hue, HiveServer2, Ingest (e.g. Flume)
For larger clusters, roles will be spread across multiple nodes of a given type.

Roles vs Cluster Size
Cluster size        Master   Worker   Utility   Edge
Very Small (≤10)    1        ≤10      1 shared host (Utility + Edge)
Small (≤20)         2        ≤20      1 shared host (Utility + Edge)
Medium (≤200)       3        ≤200     2         1+
Large (≤500)        5        ≤500     2         1+
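The sizing tiers on this slide can be written down as a small lookup. The sketch below is only an illustration of the table: the function name `role_counts` and the dictionary keys are hypothetical, and the thresholds are transcribed from the slide rather than from any Cloudera tool.

```python
# Tiers transcribed from the "Roles vs Cluster Size" slide.
# Each entry: (label, max worker hosts, master hosts, utility/edge layout).
SIZING_TABLE = [
    ("Very Small", 10, 1, "1 host shared by utility and edge roles"),
    ("Small", 20, 2, "1 host shared by utility and edge roles"),
    ("Medium", 200, 3, "2 utility hosts, 1+ edge hosts"),
    ("Large", 500, 5, "2 utility hosts, 1+ edge hosts"),
]


def role_counts(workers):
    """Return the slide's suggested host counts for a given number of workers."""
    if workers < 1:
        raise ValueError("a cluster needs at least one worker")
    for label, max_workers, masters, utility_edge in SIZING_TABLE:
        if workers <= max_workers:
            return {
                "size": label,
                "masters": masters,
                "workers": workers,
                "utility_edge": utility_edge,
            }
    # The slide stops at 500 workers, so this sketch does too.
    raise ValueError("cluster size is beyond the tiers listed on the slide")
```

For example, `role_counts(150)` falls in the Medium tier and suggests three master hosts.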

Host Hardware Configuration
• CPU
• There’s no such thing as too much CPU
• Jobs typically do not saturate their cores, so raw clock speed is not at a
premium
• Cost and Budget are the major factors here
• Memory
• You really don’t want to overcommit and swap
• Java heaps should fit into physical RAM with additional space for OS and non-
Hadoop processes
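The memory rule above is simple arithmetic, sketched here under stated assumptions: the function names and the example heap sizes are hypothetical, and JVMs also use memory beyond the configured heap, so treat the result as an optimistic estimate.

```python
def memory_headroom_mb(physical_mb, heap_sizes_mb, reserved_mb):
    """Physical RAM left after all Java heaps and an OS/non-Hadoop reserve.

    A negative result means the host is overcommitted and may swap.
    """
    return physical_mb - reserved_mb - sum(heap_sizes_mb)


def fits_in_ram(physical_mb, heap_sizes_mb, reserved_mb):
    """True when every heap fits in RAM alongside the reserve."""
    return memory_headroom_mb(physical_mb, heap_sizes_mb, reserved_mb) >= 0
```

As a usage example, a 64 GB host (65536 MB) with heaps of 4096, 1024, and 32768 MB and an 8192 MB reserve leaves 19456 MB of headroom.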

Host Configuration (cont.)
• Disk
• More spindles == More I/O capacity
• Larger drives == lower cost per TB
• More hosts with less capacity increases parallelism and decreases re-replication
costs when replacing a host
• Fewer hosts with more capacity generally means lower unit cost
• Rule of thumb: One disk per two configured YARN vcores
• Lower latency disks are generally not a good investment, except for specific
use-cases where random I/O is important
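The rule of thumb of one disk per two configured YARN vcores can be turned into a quick check. This is a sketch only; the function name is hypothetical and rounding up is an assumption the slide does not state.

```python
import math


def disks_for_vcores(vcores):
    """Disks suggested by 'one disk per two configured YARN vcores'."""
    if vcores < 0:
        raise ValueError("vcores must be non-negative")
    # Round up so that an odd vcore count still gets a disk for the remainder.
    return math.ceil(vcores / 2)
```

A host configured with 24 vcores would therefore want about 12 data disks.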

Operating System Configuration
• Turn off IPTables (or any other firewall tool)
• Turn off SELinux
• Turn down swappiness to 10 or less
• Turn off Transparent Huge Page Compaction
• Use a network time source to keep all hosts in sync (also timezones!)
• Make sure forward and reverse DNS work on each host to resolve all other hosts
consistently
• Use the Oracle JDK - OpenJDK is subtly different and may lead you to grief
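Two of the settings above can be verified by reading kernel files. The sketch below assumes the usual Linux locations; the THP path differs on some RHEL 6 kernels (`/sys/kernel/mm/redhat_transparent_hugepage/defrag`), so both paths and all function names here are assumptions, not part of the original slides.

```python
import re

# Assumed locations; adjust for your distribution.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"
THP_DEFRAG_PATH = "/sys/kernel/mm/transparent_hugepage/defrag"


def swappiness_ok(text, limit=10):
    """True when the swappiness value read from the kernel is at or below limit."""
    return int(text.strip()) <= limit


def selected_thp_mode(text):
    """Return the active option from a THP sysfs file.

    These files list all options and mark the active one in brackets,
    for example 'always madvise [never]'.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_text(path):
    """Read a small kernel settings file as text."""
    with open(path) as handle:
        return handle.read()
```

On a host, `swappiness_ok(read_text(SWAPPINESS_PATH))` and `selected_thp_mode(read_text(THP_DEFRAG_PATH)) == "never"` would both be expected to hold after following the checklist.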

Sanity Testing
• Use the basic sanity tests provided by each service
• Submit example jobs: pi, sleep, teragen/terasort
• Work with sample tables in Hive/Impala
• Use Hue to do these things if your users will
• Repeat these tests when turning on security/authentication/authorization
mechanisms
• Make sure they succeed for the expected users and fail for others
• Cloudera Manager provides some ‘canary’ tests for certain services
• HDFS create/read/write/delete
• HBase, ZooKeeper, etc.

Questions?

Configuration
Greg Phillips

Agenda
• Mechanics
• Key Configurations
• Configurations as Living Documents

What is Configuration?
• Everything to enable the cluster to function
• Individual Hadoop configuration files eg. hdfs-site.xml
• Where should processes run?
• Process configuration files, environment variables, command line arguments
• Which configuration changes require restart?
• What kind of restart?

Why use a Tool?
• Require a meta configuration language
• How do you describe where processes should run?
• Lots of configurations to manage
• Scoping of configurations
• Some configurations apply globally, others locally

Hadoop XML
• Example: /etc/hadoop/conf/hdfs-site.xml
• Configuration inheritance
• Key value pairs representation
• Many many of these files!
<configuration>
<property>
<name>dfs.client.mmap.cache.size</name>
<value>256</value>
</property>
</configuration>
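Because every Hadoop XML file uses the same name/value layout, a few lines of code can load one into a dictionary for inspection or comparison. This is a minimal sketch using Python's standard library; the function name `parse_hadoop_conf` is hypothetical and the sample text is the snippet from the slide.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<configuration>
  <property>
    <name>dfs.client.mmap.cache.size</name>
    <value>256</value>
  </property>
</configuration>
"""


def parse_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop XML config."""
    root = ET.fromstring(xml_text.strip())
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties
```

Calling `parse_hadoop_conf(SAMPLE_XML)` yields a one-entry dictionary mapping `dfs.client.mmap.cache.size` to the string `"256"`; values stay as strings because Hadoop interprets them per property.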

Editing a Configuration Demo!

If Demo Failed…
Step One: edit a config
Step Two: save

If Demo Failed…
Step Three: at top-level, note that restart is needed
Step Four: review changes
Step Five: restart

Verifying Configuration
• <namenode_host>:50070/conf

Managing Configuration Files
• Management tool will put configurations somewhere
• Eg. core-site.xml, hdfs-site.xml, dfs_hosts_allow, dfs_hosts_exclude, hbase-site.xml, hive-env.sh, hive-site.xml, hue.ini, mapred-site.xml, oozie-site.xml, yarn-site.xml, zoo.cfg (and so on)
• Often they are deployed in private locations
• Good idea for files to be immutable

Managing Configuration Files Demo!

If Demo Failed…
• e.g., /var/run/cloudera-scm-agent/process/*-NAMENODE

Key Static Configuration
• Security
• Ports
• Local Storage
• JVM
• Databases

Networking
• Does your DNS work?
• Like, forwards and backwards?
• For sure?
• Have you really checked?
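One way to "really check" is a forward lookup followed by a reverse lookup on the result. The sketch below uses Python's `socket` module; the function name is hypothetical, and the resolver arguments exist only so the logic can be exercised without a live DNS server.

```python
import socket


def dns_round_trip(hostname, forward=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare.

    Returns (ip, reverse_name, consistent). Comparison ignores case and a
    trailing dot, which is an assumption about what counts as "the same".
    """
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    consistent = (reverse_name.lower().rstrip(".")
                  == hostname.lower().rstrip("."))
    return ip, reverse_name, consistent
```

Running this for every host name in the cluster, from every host, is the kind of check the slide is asking for; any `False` in the third field is worth investigating before installing.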

HDFS Configurations of Note
• Local Data Directories
• dfs.datanode.data.dir: one per spindle; avoid RAID
• dfs.namenode.name.dir: two copies of your metadata better than one
• High Availability
• Requires more daemons
• Two failover controllers co-located with NameNodes
• Three JournalNodes

Resource Management in YARN
• We want to prefer one multi-tenant cluster over multiple clusters
• Marketing vs. Engineering cluster resources
• The “Fair Scheduler” implements the Dominant Resource Fairness Algorithm
• Configured by an “allocations.xml” file
• Possible to police resource usage through cgroups

YARN Configurations of Note
• YARN doles out resources to applications across two axes: memory and CPU
• Define per-NodeManager resources
• yarn.nodemanager.resource.cpu-vcores
• yarn.nodemanager.resource.memory-mb
• Some guardrails:
• yarn.scheduler.maximum-allocation-vcores
• yarn.scheduler.maximum-allocation-mb

YARN Configurations of Note
• Can any other job fit?
[Diagram: a NodeManager drawn as 4 × 1 GB of memory and 6 × 1 core. Job sizes: 🍅 1 GB, 1 core; 🎃 2 GB, 2 cores; 🍪 1 GB, 2 cores. Placed on the node: 🍅, 🍪 🍪, 🎃]
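The question on this slide can be worked through in code. The sketch below assumes one 🍅, one 🍪, and one 🎃 are running on a node with 4 GB and 6 cores, which is one reading of the diagram; the class and function names are hypothetical and this is not YARN's scheduler code.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    memory_gb: int
    vcores: int


def remaining(node, running):
    """Capacity left on a node after subtracting every running container."""
    return Resources(
        node.memory_gb - sum(job.memory_gb for job in running),
        node.vcores - sum(job.vcores for job in running),
    )


def fits(request, free):
    """A request fits only if it fits on both axes at once."""
    return request.memory_gb <= free.memory_gb and request.vcores <= free.vcores


NODE = Resources(memory_gb=4, vcores=6)
TOMATO = Resources(1, 1)
PUMPKIN = Resources(2, 2)
COOKIE = Resources(1, 2)

free = remaining(NODE, [TOMATO, COOKIE, PUMPKIN])
```

Under that assumption `free` is 0 GB and 1 core, so none of the three job types fits: a core is idle, but memory is exhausted. This is why both per-NodeManager settings need to reflect the real hardware.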

Configurations as Living Documents
• Configuration is sometimes workload dependent
• Leverage your metric system
[Diagram: Configs and Metrics]

Heap Sizing
• Heap sizes:
• Datanodes: linear in # of blocks
• Namenode: linear in # of blocks and # of files

Resource management
• Fannie and Freddie are sharing the cluster
• How do we configure the weights?
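As a starting point for thinking about weights, the sketch below splits a resource in proportion to queue weights. It is a deliberate simplification: a real fair scheduler also accounts for demand and for minimum and maximum shares. The function name and the example weights are hypothetical.

```python
def weighted_shares(weights, total):
    """Split `total` across queues in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one queue needs a positive weight")
    return {queue: total * weight / weight_sum
            for queue, weight in weights.items()}
```

With weights of 2 for Fannie and 1 for Freddie on 300 GB of memory, the steady-state split would be 200 GB and 100 GB when both queues are busy. Comparing such targets against observed usage in your metrics system is one way to decide whether the weights need adjusting.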

In Summary
• Bookkeeping of cluster configuration requires a tool
• Make sure key configurations are set properly
• Configuration files are living documents – use your metric system!

Troubleshooting
Jake Miller

Troubleshooting
Managing Hadoop Clusters
Troubleshooting Hadoop Systems
Debugging Hadoop Applications


Tools for lots of machines
[Diagram: four machines (Disk, CPU, Mem; Linux; JVM; Hadoop Daemons) plus other systems (httpd, sas, custom apps, etc.), annotated “Ganglia, Nagios” and “Ganglia*, Nagios*”]

Ganglia: Monitoring
• Collects and aggregates metrics from machines
• Good for what’s going on right now and describing normal perf
• Organized by physical resource (CPU, Mem, Disk)
– Good defaults
– Good for pinpointing machines
– Good for seeing overall utilization
– Uses RRDTool under the covers
• Some data scalability limitations, lossy over time.
• Dynamic for new machines, requires config for new metrics

Graphite: Monitoring
• Popular alternative to Ganglia
• Can handle the scale of metrics coming in
• Similar to Ganglia, but uses its own RRD database.
• More aimed at dynamic metrics (as opposed to statically defined metrics)
Nagios: Alerting
• Provides alerts from canaries and basic health checks for services on machines
• Organized by service (e.g. HTTP, POP, SSH, SMTP)
• De facto standard for service monitoring
• Lacks distributed system know-how
• Requires bespoke setup for service slaves and masters
• Lacks details with multi-tenant services or short-lived jobs

Tools for the Hadoop Stack
[Diagram: the full Hadoop stack on four machines, annotated “Ganglia, Nagios” and “Ganglia*, Nagios*”]

[Diagram: the full Hadoop stack, with every component emitting “Logs and Metrics” (some only “Logs”). Additional sources shown: HTrace; custom app logs; /proc, dmesg, /var/logs/; jmx, jstack, jstat, GC logs]

[Diagram: the full Hadoop stack, annotated “Ganglia, Nagios”, “Ganglia*, Nagios*”, and “Ganglia*, Nagios* ?”]

Ganglia and Nagios are not enough
• Fault tolerant distributed system masks many problems (by design!)
– Some failures are not critical – failure condition more complicated
• Lacks distributed system know-how
– Requires bespoke setup for service slaves and masters
– Lacks details with multi-tenant services or short-lived jobs
• Hadoop services are logically dependent on each other
– Need to correlate metrics across different services and machines
– Need to correlate logs from different services and machines
– What about all the logs?
– What of all these metrics do we really need?
• Some data scalability limitations, lossy over time
– What about full fidelity?

OpenTSDB
• OpenTSDB (Time Series Database)
– Scalable storage to keep all your metrics
– Efficiently stores metric data into HBase
– Keeps data at full fidelity
– Keep as much data as your HBase instance
can handle.
• Free and Open Source

Cloudera Manager
• Extracts hardware, OS, and Hadoop service metrics
specifically relevant to Hadoop and its related services.
– Operation Latencies
– # of disk seeks
– HDFS data written
– Java GC time
– Network IO
• Provides
– Cluster preflight checks
– Basic host checks
– Regular health checks
• Uses LevelDB for underlying metrics storage
• Provides distributed log search
• Monitors logs for known issues
• Point and click for useful utils (lsof, jstack, jmap)
• Free

[Diagram: the full Hadoop stack, annotated with tooling]
• Ganglia, Nagios; Ganglia*, Nagios*
• Full fidelity metrics: OpenTSDB*
• Metrics + Logs: Cloudera Manager
• Logs: Search a la Splunk or Rocana

Troubleshooting
Managing a Hadoop Cluster

The Law of Cluster Inertia
A cluster in a good state stays in a good state,
and
a cluster in a bad state stays in a bad state,
unless
acted upon by an external force.

Admins and Users as an external force
Upgrades
• Linux
• Hadoop
• Java
Misconfiguration
• Memory Mismanagement
– JVM vs Container OOME
– NN OOME
• Thread Mismanagement
– Fetch Failures
– Replicas
• Disk Mismanagement
– No File
– Too Many Files

Understanding Normal
• Establish normal
– Boring logs are good
• So that you can detect the abnormal
– Find anomalies and outliers
– Compare performance
• And then isolate root causes
– Who are the suspects
– Pull the thread by interrogating
– Correlate across different subsystems

Basic tools
• What happened?
– Logs
• What state are we in now?
– Metrics
– Thread stacks
• Is it alive?
– listings
– Canaries
• Is it OK?
– fsck / hbck
• What happens when I do this?
– Tracing
– Dumps

Diagnostics for single machines
[Diagram: a single machine (Disk, CPU, Mem; Linux). Analysis and tools via @brendangregg]

Diagnosis Tools
          HW/Kernel
Logs      /log, /var/log, dmesg
Metrics   /proc, top, iotop, sar, vmstat, netstat
Tracing   strace, tcpdump
Dumps     Core dumps
Liveness  ping
Corrupt   fsck

Diagnostics for the JVM
• Most Hadoop services run in the Java VM, which introduces new Java subsystems
– Just in time compiler
– Threads
– Garbage collection
• Dumping current threads
– jstack (any threads stuck or blocked?)
• JVM Settings:
– Enabling GC Logs: -XX:+PrintGCDateStamps -XX:+PrintGCDetails
– Enabling OOME Heap dumps: -XX:+HeapDumpOnOutOfMemoryError
• Metrics on GC
– Get GC counts and info with jstat
• Memory Usage dumps
– Create heap dump: jmap -dump:live,format=b,file=heap.bin <pid>
– View heap dump from crash or jmap with jhat

GC Logs Are Your Friends
• GC Logging
– -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-Xloggc:/var/log/hdfs/nn-hdfs.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=20M
• Great resource:
– https://stackoverflow.com/questions/895444/java-garbage-collection-log-messages

JVM Debugging Techniques: Hung Process
• jstack
– Use the same JDK
– Must be run as user of the process
– Can force it with -F but that hides deadlock info
• kill -3 <PID>
– Dumps the thread stacks to the process's stdout
– CM stores these as separate log files


JVM Debugging Techniques: Heap Analysis
• jstat -gcutil <PID> 1s 120
– Checks for current GC activity
• jmap -histo:live <PID>
– Get a histogram of the current objects within the heap
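Output from `jstat -gcutil` is a header row followed by one row per sample, so it is easy to turn into records for trending. The sketch below parses by header name rather than by position, since the columns vary between JDK versions; the function names are hypothetical and the sample text is made-up illustration data in the JDK 8 column layout, not output from a real cluster.

```python
def _to_number(token):
    """Convert a jstat field to float; non-numeric fields become None."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_gcutil(output):
    """Parse `jstat -gcutil` text into a list of {column: value} dicts."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, samples = rows[0], rows[1:]
    return [dict(zip(header, (_to_number(tok) for tok in row)))
            for row in samples]


# Hypothetical sample for illustration only.
SAMPLE_GCUTIL = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  41.20  78.45  95.10  90.30    120    1.532     3    0.410    1.942
  0.00  96.88  55.70  78.45  95.10  90.30    120    1.532     3    0.410    1.942
"""

samples = parse_gcutil(SAMPLE_GCUTIL)
```

With records like these, an old generation (`O`) that stays high while the full GC count (`FGC`) keeps climbing between samples is a common sign of heap pressure worth a closer look with `jmap -histo:live`.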

Diagnosis Tools
          HW/Kernel                                 JVM
Logs      /log, /var/log, dmesg                     Enable GC logging
Metrics   /proc, top, iotop, sar, vmstat, netstat   jstat
Tracing   strace, tcpdump                           Debugger, jdb, jstack
Dumps     Core dumps                                jmap, enable OOME dump
Liveness  ping                                      jps
Corrupt   fsck

Diagnostics for the Hadoop Stack
• Single client call can trigger many RPCs spanning many machines
• Systems are evolving quickly
• A failure on one daemon, by design, does not cause failure of the entire service
• Logs:
• Each service’s master and slaves have their own logs: /var/log/
• There are a lot of logs and they change frequently
• Metrics:
• Each daemon offers metrics, often aggregated at masters
• The hard part is figuring out which metrics matter to you
• Tracing:
• HTrace (integrated into Hadoop, HBase; currently in Apache Incubator)
• What strace did for system calls, HTrace does for RPC calls
• Liveness:
• Canaries, service/daemon web UIs

Hadoop Logs Are Your Friends
• Logs are verbose but necessary
– NameNode logs (500 node cluster):
• 10 files * 200MB/file = 2GB
• 2GB of logs span 3 hours
• 3 days of logs = ~48GB
• Retain enough logs for debugging & plan for worst case
– Adjust log retention as the cluster grows
– To properly analyze, you need to log enough
– You may only get one shot
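The arithmetic on this slide generalizes to any retention target. The sketch below reproduces it with the slide's own numbers; the function name is hypothetical, and it uses decimal units (1000 MB per GB) to match the slide's "10 files * 200MB/file = 2GB".

```python
def log_volume_gb(files, mb_per_file, hours_covered, retention_days):
    """Disk needed to retain logs for `retention_days`.

    `files * mb_per_file` is one rotation window, which covers
    `hours_covered` hours of activity at the current log rate.
    """
    window_gb = files * mb_per_file / 1000
    windows_needed = retention_days * 24 / hours_covered
    return window_gb * windows_needed
```

For the 500-node example, `log_volume_gb(10, 200, 3, 3)` gives 48.0, matching the ~48GB on the slide. Re-running the calculation as the cluster grows shows how quickly the retention window shrinks if the rotation settings stay fixed.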

INFO Logs Are Your Friends
• Just reduce the log level to save space? NO!
– INFO logging helps us understand what happened right before failure
• E.g. who logged in, what job was running, what they were writing to
• Application: YARN containers write logs locally, then migrate to HDFS
– Ensure application log space is sufficient
– It’s a distributed system, but make sure the local system is configured correctly

Hadoop Debugging Techniques: LogLevel
• Set the log level without process restarts
http://namenode.cloudera.com:50070/logLevel
• Scriptable Java servlet
http://namenode.cloudera.com:50070/logLevel?log=org&level=DEBUG
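Since the servlet is driven entirely by query parameters, scripting it is mostly a matter of building the URL. The sketch below only constructs the URL shown on the slide; the function name is hypothetical, the default port is the one from the slide, and fetching the URL (and any authentication a secured cluster needs) is left out.

```python
from urllib.parse import urlencode


def log_level_url(host, logger, level=None, port=50070):
    """Build a /logLevel URL for a daemon's web port.

    With `level` the URL requests a change; without it the URL only
    names the logger, which is assumed here to report its current level.
    """
    params = {"log": logger}
    if level is not None:
        params["level"] = level
    return f"http://{host}:{port}/logLevel?{urlencode(params)}"
```

For example, `log_level_url("namenode.cloudera.com", "org", "DEBUG")` reproduces the second URL on the slide. Remember to set the level back afterwards, since DEBUG output grows logs quickly.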

Metrics: HDFS
• HDFS is the core of the platform. What’s important?
– Is the Standby NN checkpointing?
– Are the NNs garbage collecting?
– Percentage of heap used at steady state?
• Best to checkpoint every hour

Diagnosis Tools
          HW/Kernel                                 JVM                      Hadoop Stack
Logs      /log, /var/log, dmesg                     enable GC logging        *:/var/log/hadoop|hbase|…
Metrics   /proc, top, iotop, sar, vmstat, netstat   jstat                    *:/stacks, *:/jmx
Tracing   strace, tcpdump                           debugger, jdb, jstack    htrace
Dumps     core dumps                                jmap, enable OOME dump   *:/dump
Liveness  ping                                      jps                      Web UI
Corrupt   fsck                                                               HDFS fsck, HBase hbck

Finding the right part of the stack
E-SPORE (from Eric Sammer’s Hadoop Operations)
• Environment
– What is different about the environment now from the last time everything worked?
• Stack
– The entire cluster also has shared dependency on data center infrastructure such as the network, DNS, and other services.
• Patterns
– Are the tasks from the same job? Are they all assigned to the same tasktracker? Do they all use a shared library that was changed
recently?
• Output
– Always check log output for exceptions but don’t assume the symptom correlates to the root cause.
• Resources
– Do local disks have enough? Is the machine swapping? Does the network utilization look normal? Does the CPU utilization look
normal?
• Event correlation
– It’s important to know the order in which the events led to the failure.


Example application pipeline with strict SLAs
[Diagram: timeline of three stages: ingest, process, export]

Example Application Pipeline - Ingest
[Diagram: the Hadoop stack on four machines (Coordination: ZooKeeper; Security: Sentry; File Storage: HDFS; Event ingest: Flume, Kafka; other systems: httpd, sas, custom apps, etc.). Callout: custom app feeds event data into HBase]

Example Application Pipeline - processing
[Diagram: the Hadoop stack on four machines, including Batch processing: MapReduce. Callout: MapReduce job generates a new artifact from HBase data and writes to HDFS]

Example Application Pipeline - export
[Diagram: the Hadoop stack on four machines, including Random Access Storage: HBase, File Storage: HDFS, and DB Import/Export: Sqoop. Callout: Sqoop the result into a relational database to serve the application]

Case study 1: slow jobs after Hadoop upgrade
Symptom: After an upgrade, activity on the cluster eventually began to slow down and the job queue overflowed.

Example Application Pipeline - processing
[Diagram: repeat of the processing-stage slide: the MapReduce job generates a new artifact from HBase data and writes to HDFS]

Evidence: Isolated to the processing phase (MR). In NM logs, found an innocuous but anomalous log entry about “fetch failures.” Many users had run into this MR problem using different versions of MR. Workaround provided: remove the problem node from the cluster.


Root cause: All MR versions had a common dependency on a particular version of Jetty (Jetty 6.1.26). Dev was able to reproduce and fix the bug in Jetty.

Case study 2: slow jobs after Linux upgrade
Symptom: After an upgrade, system CPU usage peaked at 30% or more of the total CPU usage.

Evidence: Used tracing tools to determine that a majority of time was inexplicably spent in virtual memory calls.
http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/


Root cause: RHEL and CentOS versions 6.2, 6.3, and 6.4, and SLES 11 SP2, have a feature called "Transparent Huge Page (THP)" compaction, which interacts poorly with Hadoop workloads.

Case study 3: slow jobs at a precise moment
Symptom: High CPU usage and a responsive but sluggish cluster, even for non-Hadoop apps (e.g. MySQL). 30 customers all hit this at the exact same time: 6/30/12 at 5pm PDT.

Evidence: Checked the kernel message buffer (run dmesg) and looked for output confirming the leap second injection. Other systems had the same problem.


Root cause: Linux OS kernel mishandled an added leap second.

Similar symptoms, different problem

Case study 1
• Symptom: After an upgrade, activity on the cluster eventually began to slow down and the job queue overflowed.
• Evidence: In TT logs, found an innocuous but anomalous log entry about “fetch failures.” Many users had run into this MR problem using different versions of MR. Workaround provided: remove the problem node from the cluster.
• Root cause: All MR versions had a common dependency on a particular version of Jetty (Jetty 6.1.26). Dev was able to reproduce and fix the bug in Jetty.

Case study 2
• Symptom: After an upgrade, system CPU usage peaked at 30% or more of the total CPU usage.
• Evidence: The perf tool showed that the majority of time was inexplicably spent in virtual memory calls.
• Root cause: RHEL and CentOS versions 6.2, 6.3, and 6.4, and SLES 11 SP2, have a feature called "Transparent Huge Page (THP)" compaction, which interacts poorly with Hadoop workloads.

Case study 3
• Symptom: High CPU usage and a responsive but sluggish cluster. 30 customers all hit this at the exact same time.
• Evidence: Checked the kernel message buffer (run dmesg) and looked for output confirming the leap second injection.
• Root cause: Linux OS kernel mishandled a leap second added on 6/30/12 at 5pm PDT.

Lessons learned
• Methodology: More crucial than the specific troubleshooting methodology used is to use one.
• Tools: More crucial than the specific tool used is the type of data analyzed and how it’s analyzed.
• Learn from failure: Capture for posterity in a knowledge base article, blog post, or conference presentation.

Enterprise Considerations
Jake Miller

• Security
• Authentication
• Confidentiality
• Uptime and availability
• Rolling upgrades
• Migration
• Backup and Disaster recovery
• Multi-tenancy
• Resource Management
• Auditing


Security
Goal: Ensure only authorized, authenticated users have access to data
• Safeguards
– Authentication
– Authorization
– Auditing
• Threats
– Data Corruption
– Encryption in transit
– Encryption at rest

Authentication – Proving Who You Are
• Authentication limits who can connect
• Kerberos - Strong Authentication
– MIT Kerberos, Active Directory
– LDAP for user management
– User obtains a ticket used to identify oneself to the daemons
– Cloudera Manager helps configure
• Without Kerberos
– Uses Linux user of the client application
– Easy to spoof

Authorization: Controlling what a user can access or do
• GRANT / REVOKE on CRUD + Admin permissions
• Each service provides different authorization mechanisms
• HDFS read / write / execute file permissions
• YARN queues can be restricted to certain users
• HBase tables can be restricted to certain users (ACLs)
• Apache Sentry unifies all authorizations to a single point
• Up and coming
• RecordService – fine grained record authorization

Audit – tracking who did what
• Track which users performed particular operations
• Audit: who did something, and when?
• Cloudera Navigator

Confidentiality - information seen by intended users
• Securing communication channels within the cluster
• TLS
• Used to secure http interfaces
• Kerberos can be used to authenticate to these interfaces with SPNEGO
• SASL
• Used to secure data node data transfers
• Hadoop / HBase RPC
• HDFS Encryption
• KeyTrustee
• Encrypted on disk


Uptime and availability
Goal: Handle failure events with minimal operational impact
• Unplanned
– Hardware failures
– Software errors
– Human error
• Planned
– Configuration Changes
– Upgrades
– Migrations

Unplanned down time?
[Diagram: the full Hadoop stack running across four machines, with one additional machine (HW, JVM) labeled Primary Job]

Fault tolerance within the system
• Core systems are designed to sustain failures and failover automatically
• HDFS – 3x data replication; HDFS HA, 2NN, QJM/FC; Disk hot swap
• HBase – Multiple HMasters, Region Servers ; Read Replicas
• ZooKeeper – ZK peer Quorum
• YARN (MR, Spark, Impala) : Standby Resource Manager
• Security to mitigate human error

Planned down time
• Wire compatibility + Extensible Data formats
• Allow for forwards and backwards compatibility
• Older clients can talk to newer servers and vice versa
• Rolling upgrade
• Upgrade a single node at a time while system runs
• CM orchestrates and manages this
• Allows API changes while guaranteeing wire compatibility between different minor versions
• CDH guarantees client-server compatibility between minor versions

Snapshots: Recovering from Human mistakes
• Snapshot the state of directory/table at a certain moment in time
• Rename it later, creating a new read write directory
• Export it to another cluster
[Diagram: timeline with periodic snapshots. A user error (rm -r) means the data is gone; restoring from the last periodic snapshot restores service with minor data loss]

External Risks


Geographically separated copies of data

Strategy: Supported Batch HDFS Backups
• DistCp
• A MapReduce based utility
1
Strategy: Custom Application-managed Replication
• Application writes to two instances of HDFS
• Sources could be Kafka instances
• Adds complexity
• Inefficient
SourceSource


Multitenancy
Goal: Serve multiple users, applications, and datasets.
• Within a Cluster
– Resource Management
– Scale
• Across clusters
– Proxies

Multiple Workloads
• Multiple Workloads, many datasets on one cluster
• Resource Sharing
• Overprovision
• Static isolation if latency sensitive
• Multiple Workloads, shared dataset on one cluster
• Replicate to isolate if latency sensitive
• Future:
• Improving IO isolation for YARN/Slider/Mesos?

Resource Management
• Policies for resource sharing
• HDFS Quotas
• Yarn FairScheduler Pools
• HBase request throttling
• Hive / Impala Access Control with Sentry

Scale Considerations
• Multi Rack Deployments
• Specify HDFS rack topology configuration
• Limits failure domain to a single rack.
• Performance resiliency tradeoff
• Heterogeneous machine deploys
• When clusters get larger, recovery is more expensive.
• NN, CM machines should be beefier.


Gateway Proxies
• Deploy your cluster without exposing all nodes to users
[Diagram: users reach the cluster machines through a DMZ hosting HUE, the application, an HBase proxy, HttpFS (HDFS), JDBC for Impala/Hive, and Search]

Enterprise considerations
• Security: Ensure only authorized, authenticated users have access to data
• Availability: Handle failure events with minimal operational impact
• Multitenancy: Serve multiple users, applications, and datasets

Conclusions

Hadoop operations takeaways
• Managing Clusters: Cloudera has seen a lot of diverse clusters and used that experience to build tools to help diagnose and understand how Hadoop operates.
• Troubleshooting: Similar symptoms can lead to different root causes. Use tools to assist with event correlation and pattern determination.
• Enterprise Considerations: Most enterprises require strategies for Security, Availability, Disaster recovery, Maintenance, and Multi-tenancy.
Image credit: Kevin Jarrett https://flic.kr/p/cDKWvL

Questions?
