SlideShare a Scribd company logo
1 of 132
Apache Hadoop
Operations for
Production Systems
Greg Phillips, Jake Miller
Cloudera Technology Day
2© Cloudera, Inc. All rights reserved.
Your Hosts
Greg
Phillips
Jake
Miller
3© Cloudera, Inc. All rights reserved.
Greg Phillips
• Solutions Architect
• gphillips@clouderagovt
Jake Miller
• Customer Operations
Engineer
• jakem@cloudera
$ whoami
4© Cloudera, Inc. All rights reserved.
Agenda
• Intro
• Installation & Configuration
• Troubleshooting
• Enterprise Considerations
5© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Disk, CPU, Mem
Machine
JVM
Linux
Disk, CPU, Mem
Machine
JVM
Linux
Disk, CPU, Mem
Machine
JVM
Linux
Disk, CPU, Mem
The Hadoop Stack: JVMs
6© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
The Hadoop Stack: Daemons
7© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
The Hadoop Stack: Storage + Coordination
Random Access Storage: HBase
File Storage: HDFS
Coordination:
ZooKeeper;
Security: Sentry
8© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
The Hadoop Stack: Execution
Random Access Storage: HBase
File Storage: HDFS
Coordination:
ZooKeeper;
Security: Sentry
In Mem processing:
Spark
Interactive SQL:
ImpalaBatch processing
MapReduce
Search: Solr
Resource Management: YARN
9© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
The Hadoop Stack: Languages + UI
Random Access Storage: HBase
File Storage: HDFS
Coordination:
ZooKeeper;
Security: Sentry
In Mem processing:
Spark
Interactive SQL:
ImpalaBatch processing
MapReduce
Search: Solr
Resource Management: YARN
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
User interface: HUE
10© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
In Mem processing:
Spark
Interactive SQL:
ImpalaBatch processing
MapReduce
Resource Management: YARN
Coordination:
ZooKeeper;
Security: Sentry
Search: Solr
User interface: HUE
Random Access Storage: HBase
File Storage: HDFS
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
The Hadoop Stack: Integration
Eventingest:
Flume,Kafka
DBImportExport:
Sqoop
Other
systems:
httpd, sas,
custom apps
etc
11© Cloudera, Inc. All rights reserved.
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
In Mem processing:
Spark
Interactive SQL:
ImpalaBatch processing
MapReduce
Resource Management: YARN
Coordination:
ZooKeeper;
Security: Sentry
Search: Solr
User interface: HUE
Random Access Storage: HBase
File Storage: HDFS
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
The Hadoop Stack
Eventingest:
Flume,Kafka
DBImportExport:
Sqoop
Other
systems:
httpd, sas,
custom apps
etc
12© Cloudera, Inc. All rights reserved.
Apache Hadoop Operations
for Production Systems:
Installation
Greg Phillips
13© Cloudera, Inc. All rights reserved.
Hardware Considerations
• As a distributed system, Hadoop is going to be deployed onto multiple
interconnected hosts
• How large will the cluster be?
• What services will be deployed on the cluster?
• Can all services effectively run together on the same hosts or is some form of
physical partitioning required?
• What role will each host play in the cluster?
• This impacts the hardware profile (CPU, Memory, Storage, etc)
• How should the hosts be networked together?
14© Cloudera, Inc. All rights reserved.
Host Roles
within a
Cluster
Master Node
HDFS NameNode
YARN ResourceManager
HBase Master
Impala StateStore
ZooKeeper
Worker Node
HDFS DataNode
YARN NodeManager
HBase RegionServer
Impalad
Utility Node
Relational Database
Management (eg: CM)
Hive Metastore
Oozie
Impala Catalog Server
Edge Node
Gateway Configuration
Client Tools
Hue
HiveServer2
Ingest (eg: Flume)
For larger clusters, roles
will be spread across
multiple nodes of a
given type
15© Cloudera, Inc. All rights reserved.
Roles vs Cluster Size
Master Worker Utility Edge
Very Small
(≤10)
1 ≤10 1 shared Host
Small (≤20) 2 ≤20 1 shared Host
Medium (≤200) 3 ≤200 2 1+
Large (≤500) 5 ≤500 2 1+
16© Cloudera, Inc. All rights reserved.
Host Hardware Configuration
• CPU
• There’s no such thing as too much CPU
• Jobs typically do not saturate their cores, so raw clock speed is not at a
premium
• Cost and Budget are the major factors here
• Memory
• You really don’t want to overcommit and swap
• Java heaps should fit into physical RAM with additional space for OS and non-
Hadoop processes
17© Cloudera, Inc. All rights reserved.
Host Configuration (cont.)
• Disk
• More spindles == More I/O capacity
• Larger drives == lower cost per TB
• More hosts with less capacity increases parallelism and decreases re-replication
costs when replacing a host
• Fewer hosts with more capacity generally means lower unit cost
• Rule of thumb: One disk per two configured YARN vcores
• Lower latency disks are generally not a good investment, except for specific
use-cases where random I/O is important
18© Cloudera, Inc. All rights reserved.
Operating System Configuration
• Turn off IPTables (or any other firewall tool)
• Turn off SELinux
• Turn down swappiness to 10 or less
• Turn off Transparent Huge Page Compaction
• Use a network time source to keep all hosts in sync (also timezones!)
• Make sure forward and reverse DNS work on each host to resolve all other hosts
consistently
• Use the Oracle JDK - OpenJDK is subtly different and may lead you to grief
19© Cloudera, Inc. All rights reserved.
Sanity Testing
• Use the basic sanity tests provided by each service
• Submit example jobs: pi, sleep, teragen/terasort
• Work with sample tables in Hive/Impala
• Use Hue to do these things if your users will
• Repeat these tests when turning on security/authentication/authorization
mechanisms
• Make sure they succeed for the expected users and fail for others
• Cloudera Manager provides some ‘canary’ tests for certain services
• HDFS create/read/write/delete
• Hbase, Zookeeper, etc
20© Cloudera, Inc. All rights reserved.
Questions?
21© Cloudera, Inc. All rights reserved.
Apache Hadoop Operations
for Production Systems:
Configuration
Greg Phillips
22© Cloudera, Inc. All rights reserved.
Agenda
• Mechanics
• Key Configurations
• Configurations as Living Documents
23© Cloudera, Inc. All rights reserved.
What is Configuration?
• Everything to enable the cluster to function
• Individual Hadoop configuration files eg. hdfs-site.xml
• Where should processes run?
• Process configuration files, environment variables, command line arguments
• Which configuration changes require restart?
• What kind of restart?
24© Cloudera, Inc. All rights reserved.
Why use a Tool?
• Require a meta configuration language
• How do you describe where processes should run?
• Lots of configurations to manage
• Scoping of configurations
• Some configurations apply globally, others locally
25© Cloudera, Inc. All rights reserved.
Hadoop XML
• Example: /etc/hadoop/conf/hdfs-site.xml
• Configuration inheritance
• Key value pairs representation
• Many many of these files!
<configuration>
<property>
<name>dfs.client.mmap.cache.size</name>
<value>256</value>
</property>
</configuration>
26© Cloudera, Inc. All rights reserved.
Editing a Configuration Demo!
27© Cloudera, Inc. All rights reserved.
If Demo Failed…
Step One: edit a config
Step Two: save
28© Cloudera, Inc. All rights reserved.
If Demo Failed…
Step Three: at top-level, note that restart is needed
Step Four: review changes
Step five: restart
29© Cloudera, Inc. All rights reserved.
Verifying Configuration
• <namenode_host>:50070/conf
30© Cloudera, Inc. All rights reserved.
Managing Configuration Files
• Management tool will put configurations somewhere
• Eg. core-site.xml, hdfs-site.xml, dfs_hosts_allow, dfs_hosts_exclude, hbase-
site.xml, hive-env.sh, hive-site.xml, hue.ini, mapred-site.xml, oozie-site.xml, yarn-
site.xml, zoo.cfg (and so on)
• Often they are deployed in private locations
• Good idea for files to be immutable
31© Cloudera, Inc. All rights reserved.
Managing Configuration Files Demo!
32© Cloudera, Inc. All rights reserved.
If Demo Failed…
• e.g., /var/run/cloudera-scm-agent/process/*-NAMENODE
33© Cloudera, Inc. All rights reserved.
Key Static Configuration
• Security
• Ports
• Local Storage
• JVM
• Databases
34© Cloudera, Inc. All rights reserved.
Networking
• Does your DNS work?
• Like, forwards and backwards?
• For sure?
• Have you really checked?
35© Cloudera, Inc. All rights reserved.
HDFS Configurations of Note
• Local Data Directories
• dfs.datanode.data.dir: one per spindle; avoid RAID
• dfs.namenode.name.dir: two copies of your metadata better than one
• High Availability
• Requires more daemons
• Two failover controllers co-located with namenodes
• Three journal nodes
36© Cloudera, Inc. All rights reserved.
Resource Management in YARN
• We want to prefer one multi tenant cluster over multiple clusters
• Marketing vs. Engineering cluster resources
• The “Fair Scheduler” implements the Dominant Resource Fairness Algorithm
• Configured by an “allocations.xml” file
• Possible to police resource usage through cgroups
37© Cloudera, Inc. All rights reserved.
Yarn Configurations of Note
• Yarn doles out resources to applications across two axes: memory and CPU
• Define per-NodeManager resources
• yarn.nodemanager.resource.cpu-vcores
• yarn.nodemanager.resource.memory-mb
• Some guardrails:
• yarn.scheduler.maximum-allocation-vcores
• yarn.scheduler.maximum-allocation-mb
38© Cloudera, Inc. All rights reserved.
Yarn Configurations of Note
• Can any other job fit?
1GB
1GB
1GB
1GB
1
core
1
core
1
core
1
core
1
core
1
core
🍅 1 GB, 1 core
🎃 2 GB, 2 cores
🍪 1 GB, 2 cores
🍅
🍪 🍪
🎃
39© Cloudera, Inc. All rights reserved.
Configurations as Living Documents
• Configuration is sometimes workload dependent
• Leverage your metric system
Configs Metrics
40© Cloudera, Inc. All rights reserved.
Heap Sizing
• Heap sizes:
• Datanodes: linear in # of blocks
• Namenode: linear in # of blocks and # of files
41© Cloudera, Inc. All rights reserved.
Resource management
• Fannie and Freddie are sharing the cluster
• How do we configure the weights?
42© Cloudera, Inc. All rights reserved.
In Summary
• Book keeping of cluster configuration requires a tool
• Make sure key configurations are set properly
• Configuration files are living documents – use your metric system!
43© Cloudera, Inc. All rights reserved.
4
Apache Hadoop Operations
for Production Systems:
Troubleshooting
Jake Miller
44© Cloudera, Inc. All rights reserved.
4
Troubleshooting
Managing Hadoop Clusters
Troubleshooting Hadoop Systems
Debugging Hadoop Applications
45© Cloudera, Inc. All rights reserved.
4
Troubleshooting
Managing Hadoop Clusters
Troubleshooting Hadoop Systems
Debugging Hadoop Applications
46© Cloudera, Inc. All rights reserved.
4
47© Cloudera, Inc. All rights reserved.
4
Other
systems:
httpd, sas,
custom apps
etc
Tools for lots of machines
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
48© Cloudera, Inc. All rights reserved.
4
Other
systems:
httpd, sas,
custom apps
etc
Tools for lots of machines
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Ganglia, Nagios
Ganglia*, Nagios*
49© Cloudera, Inc. All rights reserved.
4
Ganglia: Monitoring
• Collects and aggregates metrics from machines
• Good for what’s going on right now and describing normal perf
• Organized by physical resource (CPU, Mem, Disk)
– Good defaults
– Good for pin pointing machines
– Good for seeing overall utilization
– Uses RRDTool under the covers
• Some data scalability limitations, lossy over time.
• Dynamic for new machines, requires config for new metrics
50© Cloudera, Inc. All rights reserved.
5
Graphite: Monitoring
• Popular alternative to Ganglia
• Can handle the scale of metrics coming
in
• Similar to Ganglia, but uses its own RRD
database.
• More aimed at dynamic metrics (as
opposed to statically defined metrics)
51© Cloudera, Inc. All rights reserved.
5
Nagios: Alerting
•Provides alerts from canaries and basic health checks for services on
machines
•Organized by Service (e.g. HTTP, POP, SSH, SMTP)
•Defacto standard for service monitoring
•Lacks distributed system know how
•Requires bespoke setup for service slaves and masters
•Lacks details with multi-tenant services or short-lived jobs
52© Cloudera, Inc. All rights reserved.
5
Tools for the Hadoop Stack
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
In Mem
processing: Spark
Interactive SQL:
ImpalaBatch processing
MapReduce
Resource Management: YARN
Coordination:
zookeeper;
Security: Sentry
Search: Solr
Eventingest:
Flume,Kafka
DBImportExport:
Sqoop
User interface: HUE
Other systems:
httpd, sas,
custom apps
etc
Random Access Storage: HBase
File Storage: HDFS
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
Ganglia, Nagios
Ganglia*, Nagios*
53© Cloudera, Inc. All rights reserved.
5
Tools for the Hadoop Stack
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
In Mem
processing:
Spark
Interactive SQL:
ImpalaBatch
processing
MapReduce
Resource Management: YARN
Coordination:
zookeeper;
Security: Sentry
Search: Solr
Eventingest:
Flume,Kafka
DBImport
Export:Sqoop
User interface: HUE
Other
systems:
httpd, sas,
custom apps
etc
Random Access Storage: HBase
File Storage: HDFS
Languages/APIs: Hive, Pig, Crunch,
Kite, Mahout
Logs and Metrics
Logs and
Metrics
Logs and
Metrics
Logs and
Metrics
Logs and
Metrics
Logs and Metrics
Logs
Logs and
Metrics
Logsand
Metrics
Logs
Logs and Metrics
Logs and Metrics
HTrace
HTrace
Custom
app logs
/proc, dmesg, /var/logs/
jmx, jstack, jstat, GC logs
54© Cloudera, Inc. All rights reserved.
5
Tools for the Hadoop Stack
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
In Mem
processing:
Spark
Interactive SQL:
ImpalaBatch
processing
MapReduce
Resource Management: YARN
Coordination:
zookeeper;
Security: Sentry
Search: Solr
Eventingest:
Flume,Kafka
DBImport
Export:Sqoop
User interface: HUE
Other
systems:
httpd, sas,
custom apps
etc
Random Access Storage: HBase
File Storage: HDFS
Languages/APIs: Hive, Pig, Crunch,
Kite, Mahout
Ganglia, Nagios
Ganglia*, Nagios*
Ganglia*, Nagios* ?
55© Cloudera, Inc. All rights reserved.
5
Ganglia and Nagios are not enough
• Fault tolerant distributed system masks many problems (by design!)
– Some failures are not critical – failure condition more complicated
• Lacks distributed system know how
– Requires bespoke setup for service slaves and masters
– Lacks details with multi-tenant services or short-lived jobs
• Hadoop services are logically dependent on each other
– Need to correlate metrics across different service and machines
– Need to correlate logs from different services and machines
– What about all the logs?
– What of all these metrics do we really need?
• Some data scalability limitations, lossy over time
– What about full fidelity?
56© Cloudera, Inc. All rights reserved.
5
OpenTSDB
• OpenTSDB (Time Series Database)
– Scalable storage to keep all your metrics
– Efficiently stores metric data into HBase
– Keeps data at full fidelity
– Keep as much data as your HBase instance
can handle.
• Free and Open Source
57© Cloudera, Inc. All rights reserved.
5
Cloudera Manager
• Extracts hardware, OS, and Hadoop service metrics
specifically relevant to Hadoop and its related services.
– Operation Latencies
– # of disk seeks
– HDFS data written
– Java GC time
– Network IO
• Provides
– Cluster preflight checks
– Basic host checks
– Regular health checks
• Uses LevelDB for underlying metrics storage
• Provides distributed log search
• Monitors logs for known issues
• Point and click for useful utils (lsof, jstack, jmap)
• Free
58© Cloudera, Inc. All rights reserved.
5
Tools for the Hadoop Stack
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
In Mem
processing:
Spark
Interactive SQL:
ImpalaBatch
processing
MapReduce
Resource Management: YARN
Coordination:
zookeeper;
Security: Sentry
Search: Solr
Eventingest:
Flume,Kafka
DBImport
Export:Sqoop
User interface: HUE
Other
systems:
httpd, sas,
custom apps
etc
Random Access Storage: HBase
File Storage: HDFS
Languages/APIs: Hive, Pig, Crunch,
Kite, Mahout
Ganglia, Nagios
Ganglia*, Nagios*
Full Fidelity metrics:
Open TSDB*Metrics + Logs:
Cloudera Manager
Logs:
Search a la Splunk or
Rocana
59© Cloudera, Inc. All rights reserved.
5
Troubleshooting
Managing a Hadoop Clusters
Troubleshooting Hadoop Systems
Debugging Hadoop Applications
60© Cloudera, Inc. All rights reserved.
6
The Law of Cluster Inertia
A cluster in a good state stays in a good state,
and
a cluster in a bad state stays in a bad state,
unless
acted upon by an external force.
61© Cloudera, Inc. All rights reserved.
6
Admins and Users as an external force
Upgrades
• Linux
• Hadoop
• Java
Misconfiguration
• Memory Mismanagement
– JVM vs Container OOME
– NN OOME
• Thread Mismanagement
– Fetch Failures
– Replicas
• Disk Mismanagement
– No File
– Too Many Files
62© Cloudera, Inc. All rights reserved.
6
Understanding Normal
• Establish normal
– Boring logs are good
• So that you can detect the abnormal
– Find anomalies and outliers
– Compare performance
• And then isolate root causes
– Who are the suspects
– Pull the thread by interrogating
– Correlate across different subsystems
63© Cloudera, Inc. All rights reserved.
6
Basic tools
• What happened?
– Logs
• What state are we in now?
– Metrics
– Thread stacks
• Is it alive?
– listings
– Canaries
• Is it OK?
– fsck / hbck
• What happens when I do this?
– Tracing
– Dumps
64© Cloudera, Inc. All rights reserved.
6
Diagnostics for single machines
Machine
Linux
Disk, CPU, Mem
Analysis and Tools via @brendangregg
65© Cloudera, Inc. All rights reserved.
6
Diagnosis Tools
HW/Kernel
Logs /log, /var/log
dmesg
Metrics /proc, top, iotop,
sar, vmstat, netstat
Tracing Strace, tcpdump
Dumps Core dumps
Liveness ping
Corrupt fsck
66© Cloudera, Inc. All rights reserved.
6
Diagnostics for the JVM
• Most Hadoop services run in the Java VM, intros new java subsystems
– Just in time compiler
– Threads
– Garbage collection
• Dumping Current threads
– jstacks (any threads stuck or blocked?)
• JVM Settings:
– Enabling GC Logs: -XX:+PrintGCDateStamps -XX:+PrintGCDetails
– Enabling OOME Heap dumps: -XX:+HeapDumpOnOutOfMemoryError
• Metrics on GC
– Get gc counts and info with jstat
• Memory Usage dumps
– Create heap dump: jmap –dump:live,format=b,file=heap.bin <pid>
– View heap dump from crash or jmap with jhat
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
67© Cloudera, Inc. All rights reserved.
6
GC Logs Are Your Friends
• GC Logging
– -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-Xloggc:/var/log/hdfs/nn-hdfs.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=20M
• Great resource:
– https://stackoverflow.com/questions/895444/java-garbage-collection-log-messages
68© Cloudera, Inc. All rights reserved.
6
• jstack
– Use the same JDK
– Must be run as user of the
process
– Can force it with –F but that
abstracts away deadlock info
• kill -3 <PID>
– Dumps jstack to stdout
CM stores as separate logfiles
JVM Debugging Techniques: Hung Process
69© Cloudera, Inc. All rights reserved.
6
JVM Debugging Techniques: Hung Process
70© Cloudera, Inc. All rights reserved.
7
JVM Debugging Techniques: Heap Analysis
• jstat -gcutil <PID> 1s 120
– Checks for current GC activity
• jmap -histo:live <PID>
– Get a histogram of the current objects within the heap
71© Cloudera, Inc. All rights reserved.
7
Diagnosis Tools
HW/Kernel JVM
Logs /log, /var/log
dmesg
Enable gc logging
Metrics /proc, top, iotop, sar,
vmstat, netstat
jstat
Tracing Strace, tcpdump Debugger, jdb, jstack
Dumps Core dumps jmap
Enable OOME dump
Liveness ping Jps
Corrupt fsck
72© Cloudera, Inc. All rights reserved.
7
Diagnostics for the Hadoop Stack
• Single client call can trigger many RPCs spanning many machines
• Systems are evolving quickly
• A failure on one daemon, by design, does not cause failure of the entire service
• Logs:
• Each service’s master and slaves have their own logs: /var/log/
• There are lot of logs and they change frequently
• Metrics:
• Each daemon offers metrics, often aggregated at masters
• The hard part is figuring out which metrics matter to you
• Tracing:
• HTrace (integrated into Hadoop, HBase; currently in Apache Incubator)
• What strace did for system calls, HTrace does for RPC calls
• Liveness:
• Canaries, service/daemon web UIs
73© Cloudera, Inc. All rights reserved.
7
Hadoop Logs Are Your Friends
• Logs are verbose but necessary
– NameNode logs (500 node cluster):
• 10 files * 200MB/file = 2GB
• 2GB of logs span 3 hours
• 3 days of logs = ~48GB
• Retain enough logs for debugging & plan for worst case
– Adjust log retention as the cluster grows
– To properly analyze, you need to log enough
– You may only get one shot
74© Cloudera, Inc. All rights reserved.
7
• Just reduce the log level to save space? NO!
– INFO logging helps us understand what happened right before
failure
• E.g. who logged in, what job was running, what they were writing
to
• Application: Yarn containers write logs locally, then migrate to HDFS
– Ensure application log space is sufficient
– It’s a distributed system but make sure local system is configured
correctly
INFO Logs Are Your Friends
75© Cloudera, Inc. All rights reserved.
7
Hadoop Debugging Techniques: LogLevel
• Set the log level without process restarts
http://namenode.cloudera.com:50070/logLevel
• Scriptable Java servlet
http://namenode.cloudera.com:50070/logLevel?log=org&level=DEBUG
76© Cloudera, Inc. All rights reserved.
7
Metrics: HDFS
• HDFS is the core of the platform.
What’s important?
– Is the Standby NN
checkpointing?
– Are the NNs garbage
collecting?
– Percentage of heap used at
steady state?
Best to checkpoint every hour
77© Cloudera, Inc. All rights reserved.
7
Diagnosis Tools
HW/Kernel JVM Hadoop Stack
Logs /log, /var/log
dmesg
enable gc logging *:/var/log/hadoop|hbase|…
Metrics /proc, top, iotop, sar,
vmstat, netstat
jstat *:/stacks, *:/jmx,
Tracing strace, tcpdump debugger, jdb, jstack htrace
Dumps core dumps jmap
enable OOME dump
*:/dump
Liveness ping Jps Web UI
Corrupt fsck HDFS fsck, HBase hbck
78© Cloudera, Inc. All rights reserved.
7
Finding the right part of the stack
E-SPORE (from Eric Sammer’s Hadoop Operations)
• Environment
– What is different about the environment now from the last time everything worked?
• Stack
– The entire cluster also has shared dependency on data center infrastructure such as the network, DNS, and other services.
• Patterns
– Are the tasks from the same job? Are they all assigned to the same tasktracker? Do they all use a shared library that was changed
recently?
• Output
– Always check log output for exceptions but don’t assume the symptom correlates to the root cause.
• Resources
– Do local disks have enough? Is the machine swapping? Does the network utilization look normal? Does the CPU utilization look
normal?
• Event correlation
– It’s important to know the order in which the events led to the failure.
79© Cloudera, Inc. All rights reserved.
7
Troubleshooting
Managing Hadoop Clusters
Troubleshooting Hadoop Systems
Debugging Hadoop Applications
80© Cloudera, Inc. All rights reserved.
8
Example application pipeline with strict SLAs
ingest process export
ingest process export
ingest process export
time
81© Cloudera, Inc. All rights reserved.
8
Example Application Pipeline - Ingest
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: SentryFile Storage: HDFS
Custom app feeds
event data into HBase
Random Access Storage: HBase
Other
systems:
httpd, sas,
custom apps
etc
Eventingest:
Flume,Kafka
82© Cloudera, Inc. All rights reserved.
8
Example Application Pipeline - processing
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop
Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:
Flume,Kafka
Other
systems:
httpd, sas,
custom apps
etc
Batch
processing
MapReduce
Resource Management: YARN
Languages/APIs: Hive, Pig, Crunch,
Kite, Mahout
Random Access Storage: HBase
Map Reduce Job
generates new artifact
from HBase data and
writes to HDFS
File Storage: HDFS
83© Cloudera, Inc. All rights reserved.
8
Batch processing
MapReduce
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
Example Application Pipeline - export
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:
Flume,Kafka
Random Access Storage: HBase 9
Resource Management: YARN
DBImportExport:
Sqoop
File Storage: HDFS Other systems:
httpd, sas,
custom apps
etc
Sqoop result into
relational database
to serve application
84© Cloudera, Inc. All rights reserved.
8
Case study 1: slow jobs after Hadoop upgrade
Symptom:
After an upgrade, activity
on the cluster eventually
began to slow down and
the job queue
overflowed.
85© Cloudera, Inc. All rights reserved.
8
Example Application Pipeline - processing
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:
Flume,Kafka
Other systems:
httpd, sas,
custom apps
etc
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
Random Access Storage: HBase
File Storage: HDFS
Batch processing
MapReduce
Resource Management: YARN
Map Reduce Job
generates new artifact
from HBase data and
writes to hdfs
86© Cloudera, Inc. All rights reserved.
8
Case study 1: slow jobs after Hadoop upgrade
Evidence:
Isolated to Processing phase (MR).
In NM Logs, found an innocuous but
anomalous log entry about “fetch
failures.”
Many users had run in to this MR
problem using different versions of
MR.
Workaround provided: remove the
problem node from the cluster.
87© Cloudera, Inc. All rights reserved.
8
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
88© Cloudera, Inc. All rights reserved.
8
Case study 1: slow jobs after Hadoop upgrade
Root cause:
All MR versions had a
common dependency on
a particular version of
Jetty (Jetty 6.1.26).
Dev was able to
reproduce and fix the bug
in Jetty.
89© Cloudera, Inc. All rights reserved.
8
Case study 2: slow jobs after Linux upgrade
Symptom:
After an upgrade, system
CPU usage peaked at 30%
or more of the total CPU
usage.
90© Cloudera, Inc. All rights reserved.
9
Case study 2: slow jobs after Linux upgrade
Evidence:
Used tracing tools to
isolate a majority of time
was inexplicably spent in
virtual memory calls.
http://structureddata.org/2012/06/18/linux-6-
transparent-huge-pages-and-hadoop-workloads/
91© Cloudera, Inc. All rights reserved.
9
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
92© Cloudera, Inc. All rights reserved.
9
Case study 2: slow jobs after Linux upgrade
Root cause:
Running RHEL or CentOS
versions 6.2, 6.3, and 6.4 or
SLES 11 SP2 has a feature called
"Transparent Huge Page (THP)"
compaction which interacts
poorly with Hadoop workloads.
93© Cloudera, Inc. All rights reserved.
9
Case study 3: slow jobs at a precise moment
Symptom:
High CPU usage and responsive
but sluggish cluster - even non-
Hadoop apps (e.g. MySQL).
30 customers all hit this at the
exact same time: 6/30/12 at
5pm PDT.
94© Cloudera, Inc. All rights reserved.
9
Case study 3: slow jobs at a precise moment
Evidence:
Checked the kernel
message buffer (run
dmesg) and look for
output confirming the
leap second injection.
Other systems had same
problem.
95© Cloudera, Inc. All rights reserved.
9
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
96© Cloudera, Inc. All rights reserved.
9
Case study 3: slow jobs at a precise moment
Root cause:
Linux OS kernel
mishandled an
added leap
second.
97© Cloudera, Inc. All rights reserved.
9
Similar symptoms, different problem
98© Cloudera, Inc. All rights reserved.
9
Case study 1: slow jobs after Hadoop upgrade
• After an upgrade,
activity on the cluster
eventually began to
slow down and the job
queue overflowed.
In TT Logs, found an innocuous but
anomalous log entry about “fetch
failures.”
Many users had run in to this MR
problem using different versions of MR.
Workaround provided: remove the
problem node from the cluster.
All MR versions had a common
dependency on a particular
version of Jetty (Jetty 6.1.26) .
Dev was able to reproduce and
fix the bug in Jetty.
Symptom Evidence Root Cause
99© Cloudera, Inc. All rights reserved.
9
Case study 2: slow jobs after Linux upgrade
• After an upgrade,
system CPU usage
peaked at 30% or more
of the total CPU usage.
Perf tool which proved that
majority of time was inexplicably
spent in virtual memory calls.
Running RHEL or CentOS versions 6.2,
6.3, and 6.4 or SLES 11 SP2 has a feature
called "Transparent Huge Page (THP)"
compaction which interacts poorly with
Hadoop workloads.
Symptom Evidence Root Cause
100© Cloudera, Inc. All rights reserved.
1
Case study 3: slow jobs at a precise moment
• High CPU usage and responsive
but sluggish cluster.
• 30 customers all hit this at the
exact same time.
Checked the kernel message
buffer (run dmesg) and look for
output confirming the leap
second injection.
Linux OS kernel mishandled a leap second
added on 6/30/12 at 5pm PDT.
Symptom Evidence Root Cause
101© Cloudera, Inc. All rights reserved.
1
Lessons learned
More crucial than the
specific troubleshooting
methodology used is to
use one.
More crucial than the
specific tool used is the
type of data analyzed
and how it’s analyzed.
Capture for posterity in
a knowledge base
article, blog post, or
conference
presentation.
Methodology Tools Learn from failure
Apache Hadoop Operations
for Production Systems:
Enterprise Considerations
Jake Miller
103© Cloudera, Inc. All rights reserved.
1
Enterprise Considerations
• Security
• Authentication
• Confidentiality
• Uptime and availability
• Rolling upgrades
• Migration
• Backup and Disaster recovery
• Multi-tenancy
• Resource Management
• Auditing
104© Cloudera, Inc. All rights reserved.
1
Enterprise Considerations
• Security
• Authentication
• Confidentiality
• Uptime and availability
• Rolling upgrades
• Migration
• Backup and Disaster recovery
• Multi-tenancy
• Resource Management
• Auditing
105© Cloudera, Inc. All rights reserved.
1
Security
• Safeguards
–Authentication
–Authorization
–Auditing
•Threats
• Data Corruption
• Encryption in transit
• Encryption at rest
Goal: Ensure only authorized, authenticated
users have access to data
106© Cloudera, Inc. All rights reserved.
1
–Authentication limit who can connect
• Kerberos - Strong Authentication
–MIT Kerberos, Active Directory
–LDAP for user management
–User obtains ticket used to identify oneself to the daemons
–Cloudera Manager helps configure
• Without Kerberos
–Uses Linux user of the client application
–Easy to spoof
Authentication – Proving Who You are
107© Cloudera, Inc. All rights reserved.
1
Authorization: Controlling user can access or do
• GRANT REVOKE on CRUD + Admin Permissions
• Each service provides different authorization mechanisms
• HDFS read / write / excute file permisisons
• YARN queues can be restricted to certain users
• HBase tables can be restricted to certain users (ACLs)
• Apache Sentry unifies all authorizations to a single point
• Up and coming
• RecordService – fine grained record authorization
108© Cloudera, Inc. All rights reserved.
1
Audit – tracking who did what
• Track which users performed particular operations
• Audit – who and when was something done?
• Cloudera Navigator
109© Cloudera, Inc. All rights reserved.
1
Confidentiality - information seen by intended users
• Securing communication channels within the cluster
• TLS
• Used to secure http interfaces
• Kerberos can be used to authenticate to these interfaces with SPNEGO
• SASL
• Used to secure data node data transfers
• Hadoop / Base RPC
• HDFS Encryption
• KeyTrustee
• Encrypted on disk
110© Cloudera, Inc. All rights reserved.
1
Enterprise Considerations
• Security
• Authentication
• Confidentiality
• Uptime and availability
• Rolling upgrades
• Migration
• Backup and Disaster recovery
• Multi-tenancy
• Resource Management
• Auditing
111© Cloudera, Inc. All rights reserved.
1
Uptime and availability
• Unplanned
–Hardware failures
–Software errors
–Human error
•Planned
• Configuration Changes
• Upgrades
• Migrations
Goal: Handle failure events
with minimal operational impact
112© Cloudera, Inc. All rights reserved.
1
Batch processing
MapReduce
Languages/APIs: Hive, Pig, Crunch, Kite, Mahout
Unplanned down time?
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:Flume,
Kafka
Random Access Storage: HBase
Resource Management: YARN
DBImportExport:
Sqoop
File Storage: HDFS
Other systems:
httpd, sas,
custom apps
etcMachine
JVM
HW
Primary
Job In Mem processing:
Spark
Interactive SQL:
Impala
Search: Solr
User interface: HUE
113© Cloudera, Inc. All rights reserved.
1
Fault tolerance within the system
• Core systems are designed to sustain failures and failover automatically
• HDFS – 3x data replication; HDFS HA, 2NN, QJM/FC; Disk hot swap
• HBase – Multiple HMasters, Region Servers ; Read Replicas
• ZooKeeper – ZK peer Quorum
• YARN (MR, Spark, Impala) : Standby Resource Manager
• Security to mitigate human error
114© Cloudera, Inc. All rights reserved.
1
Planned down time
• Wire compatibility + Extensible Data formats
• Allow for forwards and backwards compatibility
• Older clients can talk to newer servers and visa versa
• Rolling upgrade
• Upgrade a single node at a time while system runs
• CM Orchestrates and manages this
• Allows API and changes while guaranteeing wire compatibility between different
minor versions
• CDH guarantees Client-server compatibility between Minor Versions
115© Cloudera, Inc. All rights reserved.
1
Snapshots: Recovering from Human mistakes
• Snapshot the state of directory/table at a certain moment in time
• Rename it later, creating a new read write directory
• Export it to another cluster
time
User Error:
rm -r
Service is restored,
minor data loss
Periodic
snapshot
Data is gone!
restore
Periodic
snapshot
116© Cloudera, Inc. All rights reserved.
1
External Risks
117© Cloudera, Inc. All rights reserved.
1
Batch processing
MapReduce
Languages/APIs: Hive, Pig, Crunch, Kite, Mahout
Unplanned down time?
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:Flume,
Kafka
Random Access Storage: HBase
Resource Management: YARN
DBImportExport:
Sqoop
File Storage: HDFS
Other systems:
httpd, sas,
custom apps
etc
In Mem processing:
Spark
Interactive SQL:
Impala
Search: Solr
User interface: HUE
118© Cloudera, Inc. All rights reserved.
1
Batch processing
MapReduce
Languages/APIs: Hive, Pig, Crunch, Kite, Mahout
Unplanned down time?
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:Flume,
Kafka
Random Access Storage: HBase
Resource Management: YARN
DBImportExport:
Sqoop
File Storage: HDFS
Other systems:
httpd, sas,
custom apps
etc
In Mem processing:
Spark
Interactive SQL:
Impala
Search: Solr
User interface: HUE
119© Cloudera, Inc. All rights reserved.
1
Geographically separated copies of data
120© Cloudera, Inc. All rights reserved.
1
Strategy: Supported Batch HDFS Backups
• Dist CP
• A MapReduce based utility
Distcp
Source
121© Cloudera, Inc. All rights reserved.
1
Strategy: Custom Application-managed Replication
• Application writes to two instances of HDFS
• Sources could be Kafka instances
• Adds complexity
• Inefficient
SourceSource
122© Cloudera, Inc. All rights reserved.
1
Enterprise Considerations
• Security
• Authentication
• Confidentiality
• Uptime and availability
• Rolling upgrades
• Migration
• Backup and Disaster recovery
• Multi-tenancy
• Resource Management
• Auditing
123© Cloudera, Inc. All rights reserved.
1
Multitenancy
• Within a Cluster
–Resource Management
–Scale
•Across clusters
• Proxies
Goal: Serve multiple users, applications,
and datasets.
124© Cloudera, Inc. All rights reserved.
1
Multiple Workloads
• Multiple Workloads, many datasets on one cluster
• Resource Sharing
• Overprovision
• Static isolation if latency sensitive
• Multiple Workloads, shared dataset on one cluster
• Replicate to isolate if latency sensitive
• Future:
• Improving IO isolation for YARN/Slider/Mesos?
125© Cloudera, Inc. All rights reserved.
1
Resource Management
• Policies for resource sharing
• HDFS Quotas
• Yarn FairScheduler Pools
• HBase request throttling
• Hive / Impala Access Control with Sentry
126© Cloudera, Inc. All rights reserved.
1
Scale Considerations
• Multi Rack Deployments
• Specify HDFS rack topology configuration
• Limits failure domain to a single rack.
• Performance resiliency tradeoff
• Heterogeneous machine deploys
• When clusters get larger, recovery is more expensive.
• NN, CM machines should be beefier.
127© Cloudera, Inc. All rights reserved.
1
Gateway Proxies
Batch processing
MapReduce
Languages/APIs: Hive, Pig, Crunch, Kite,
Mahout
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Machine
JVM
Linux
Hadoop Daemons
Disk, CPU, Mem
Coordination:
zookeeper;
Security: Sentry
Eventingest:Flume,
Kafka
Random Access Storage: HBase
Resource Management: YARN
DBImportExport:
Sqoop
File Storage: HDFS
Other systems:
httpd, sas,
custom apps
etc
In Mem processing:
Spark
Interactive SQL:
Impala
Search: Solr
User interface: HUE
128© Cloudera, Inc. All rights reserved.
1
Gateway Proxies
• Deploy your cluster without exposing to all nodes to users
Machi
ne
Machi
ne
Machi
ne
Machi
ne
HUE
DMZ
Application
HBase
Proxy
HttpFS
(HDFS)
JDBC:
Impala/
Hive
UsersUsers
Search
Machi
ne
Machi
ne
Machi
ne
Machi
ne
DMZ HBase
Proxy
HttpFS
(HDFS)
JDBC:
Impala/
Hive
Search
129© Cloudera, Inc. All rights reserved.
1
Enterprise considerations
• Ensure only authorized,
authenticated
• users have access to
data
Handle failure events
with minimal operational
impact
Serve multiple users,
applications,
and datasets.
Security
Availability Multitenancy
130© Cloudera, Inc. All rights reserved.
1
Conclusions
131© Cloudera, Inc. All rights reserved.
1
Hadoop operations takeaways
• Cloudera has seen a lot of
diverse clusters and used that
experience to build tools to help
diagnose and understand how
Hadoop operates.
Similar symptoms can lead to
different root causes. Use tools to
assist with event correlation and
pattern determination.
Most enterprises require
strategies for Security,
Availability, Disaster recovery,
Maintenance, and Multi-tenancy.
Managing Clusters
Troubleshooting Enterprise Considerations
Kevin Jarrett https://flic.kr/p/cDKWvL
132© Cloudera, Inc. All rights reserved.
1
Questions?

More Related Content

What's hot

Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloudSteve Loughran
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudCloudera, Inc.
 
The hadoop ecosystem table
The hadoop ecosystem tableThe hadoop ecosystem table
The hadoop ecosystem tableMohamed Magdy
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Cloudera, Inc.
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? DataWorks Summit
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessCloudera, Inc.
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedInDataWorks Summit
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionCloudera, Inc.
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSpark Summit
 
Data Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentData Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentDataWorks Summit
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 

What's hot (20)

Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing Hadoop
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
 
The hadoop ecosystem table
The hadoop ecosystem tableThe hadoop ecosystem table
The hadoop ecosystem table
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud?
 
Kudu Cloudera Meetup Paris
Kudu Cloudera Meetup ParisKudu Cloudera Meetup Paris
Kudu Cloudera Meetup Paris
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedIn
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
 
Data Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentData Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake Environment
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 

Viewers also liked

Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Cloudera, Inc.
 
Collection of Small Tips on Further Stabilizing your Hadoop Cluster
Collection of Small Tips on Further Stabilizing your Hadoop ClusterCollection of Small Tips on Further Stabilizing your Hadoop Cluster
Collection of Small Tips on Further Stabilizing your Hadoop ClusterDataWorks Summit
 
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector Yahoo Developer Network
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...Yahoo Developer Network
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieYahoo Developer Network
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Databricks
 

Viewers also liked (6)

Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
 
Collection of Small Tips on Further Stabilizing your Hadoop Cluster
Collection of Small Tips on Further Stabilizing your Hadoop ClusterCollection of Small Tips on Further Stabilizing your Hadoop Cluster
Collection of Small Tips on Further Stabilizing your Hadoop Cluster
 
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 

Similar to Hadoop Operations

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationAlex Moundalexis
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopfann wu
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuiteEDB
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoopjdcryans
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
Deployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS LinuxDeployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS LinuxWO Community
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataMike Percy
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark OperationsCloudera, Inc.
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersOwen O'Malley
 
Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014Shay Hassidim
 
Building an Apache Hadoop data application
Building an Apache Hadoop data applicationBuilding an Apache Hadoop data application
Building an Apache Hadoop data applicationtomwhite
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 

Similar to Hadoop Operations (20)

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
Deployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS LinuxDeployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS Linux
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop Clusters
 
Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014
 
Building an Apache Hadoop data application
Building an Apache Hadoop data applicationBuilding an Apache Hadoop data application
Building an Apache Hadoop data application
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durbanmasabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 

Recently uploaded (20)

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 

Hadoop Operations

  • 1. Apache Hadoop Operations for Production Systems Greg Phillips, Jake Miller Cloudera Technology Day
  • 2. 2© Cloudera, Inc. All rights reserved. Your Hosts Greg Phillips Jake Miller
  • 3. 3© Cloudera, Inc. All rights reserved. Greg Phillips • Solutions Architect • gphillips@clouderagovt Jake Miller • Customer Operations Engineer • jakem@cloudera $ whoami
  • 4. 4© Cloudera, Inc. All rights reserved. Agenda • Intro • Installation & Configuration • Troubleshooting • Enterprise Considerations
  • 5. 5© Cloudera, Inc. All rights reserved. Machine JVM Linux Disk, CPU, Mem Machine JVM Linux Disk, CPU, Mem Machine JVM Linux Disk, CPU, Mem Machine JVM Linux Disk, CPU, Mem The Hadoop Stack: JVMs
  • 6. 6© Cloudera, Inc. All rights reserved. Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem The Hadoop Stack: Daemons
  • 7. 7© Cloudera, Inc. All rights reserved. Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem The Hadoop Stack: Storage + Coordination Random Access Storage: HBase File Storage: HDFS Coordination: ZooKeeper; Security: Sentry
  • 8. 8© Cloudera, Inc. All rights reserved. Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem The Hadoop Stack: Execution Random Access Storage: HBase File Storage: HDFS Coordination: ZooKeeper; Security: Sentry In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Search: Solr Resource Management: YARN
  • 9. 9© Cloudera, Inc. All rights reserved. Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem The Hadoop Stack: Languages + UI Random Access Storage: HBase File Storage: HDFS Coordination: ZooKeeper; Security: Sentry In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Search: Solr Resource Management: YARN Languages/APIs: Hive, Pig, Crunch, Kite, Mahout User interface: HUE
  • 10. 10© Cloudera, Inc. All rights reserved. Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Resource Management: YARN Coordination: ZooKeeper; Security: Sentry Search: Solr User interface: HUE Random Access Storage: HBase File Storage: HDFS Languages/APIs: Hive, Pig, Crunch, Kite, Mahout The Hadoop Stack: Integration Eventingest: Flume,Kafka DBImportExport: Sqoop Other systems: httpd, sas, custom apps etc
  • 11. 11© Cloudera, Inc. All rights reserved. Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Resource Management: YARN Coordination: ZooKeeper; Security: Sentry Search: Solr User interface: HUE Random Access Storage: HBase File Storage: HDFS Languages/APIs: Hive, Pig, Crunch, Kite, Mahout The Hadoop Stack Eventingest: Flume,Kafka DBImportExport: Sqoop Other systems: httpd, sas, custom apps etc
  • 12. 12© Cloudera, Inc. All rights reserved. Apache Hadoop Operations for Production Systems: Installation Greg Phillips
  • 13. 13© Cloudera, Inc. All rights reserved. Hardware Considerations • As a distributed system, Hadoop is going to be deployed onto multiple interconnected hosts • How large will the cluster be? • What services will be deployed on the cluster? • Can all services effectively run together on the same hosts or is some form of physical partitioning required? • What role will each host play in the cluster? • This impacts the hardware profile (CPU, Memory, Storage, etc) • How should the hosts be networked together?
  • 14. 14© Cloudera, Inc. All rights reserved. Host Roles within a Cluster Master Node HDFS NameNode YARN ResourceManager HBase Master Impala StateStore ZooKeeper Worker Node HDFS DataNode YARN NodeManager HBase RegionServer Impalad Utility Node Relational Database Management (eg: CM) Hive Metastore Oozie Impala Catalog Server Edge Node Gateway Configuration Client Tools Hue HiveServer2 Ingest (eg: Flume) For larger clusters, roles will be spread across multiple nodes of a given type
  • 15. 15© Cloudera, Inc. All rights reserved. Roles vs Cluster Size Master Worker Utility Edge Very Small (≤10) 1 ≤10 1 shared Host Small (≤20) 2 ≤20 1 shared Host Medium (≤200) 3 ≤200 2 1+ Large (≤500) 5 ≤500 2 1+
  • 16. 16© Cloudera, Inc. All rights reserved. Host Hardware Configuration • CPU • There’s no such thing as too much CPU • Jobs typically do not saturate their cores, so raw clock speed is not at a premium • Cost and Budget are the major factors here • Memory • You really don’t want to overcommit and swap • Java heaps should fit into physical RAM with additional space for OS and non- Hadoop processes
  • 17. 17© Cloudera, Inc. All rights reserved. Host Configuration (cont.) • Disk • More spindles == More I/O capacity • Larger drives == lower cost per TB • More hosts with less capacity increases parallelism and decreases re-replication costs when replacing a host • Fewer hosts with more capacity generally means lower unit cost • Rule of thumb: One disk per two configured YARN vcores • Lower latency disks are generally not a good investment, except for specific use-cases where random I/O is important
  • 18. 18© Cloudera, Inc. All rights reserved. Operating System Configuration • Turn off IPTables (or any other firewall tool) • Turn off SELinux • Turn down swappiness to 10 or less • Turn off Transparent Huge Page Compaction • Use a network time source to keep all hosts in sync (also timezones!) • Make sure forward and reverse DNS work on each host to resolve all other hosts consistently • Use the Oracle JDK - OpenJDK is subtly different and may lead you to grief
  • 19. 19© Cloudera, Inc. All rights reserved. Sanity Testing • Use the basic sanity tests provided by each service • Submit example jobs: pi, sleep, teragen/terasort • Work with sample tables in Hive/Impala • Use Hue to do these things if your users will • Repeat these tests when turning on security/authentication/authorization mechanisms • Make sure they succeed for the expected users and fail for others • Cloudera Manager provides some ‘canary’ tests for certain services • HDFS create/read/write/delete • Hbase, Zookeeper, etc
  • 20. 20© Cloudera, Inc. All rights reserved. Questions?
  • 21. 21© Cloudera, Inc. All rights reserved. Apache Hadoop Operations for Production Systems: Configuration Greg Phillips
  • 22. 22© Cloudera, Inc. All rights reserved. Agenda • Mechanics • Key Configurations • Configurations as Living Documents
  • 23. 23© Cloudera, Inc. All rights reserved. What is Configuration? • Everything to enable the cluster to function • Individual Hadoop configuration files eg. hdfs-site.xml • Where should processes run? • Process configuration files, environment variables, command line arguments • Which configuration changes require restart? • What kind of restart?
  • 24. 24© Cloudera, Inc. All rights reserved. Why use a Tool? • Require a meta configuration language • How do you describe where processes should run? • Lots of configurations to manage • Scoping of configurations • Some configurations apply globally, others locally
  • 25. 25© Cloudera, Inc. All rights reserved. Hadoop XML • Example: /etc/hadoop/conf/hdfs-site.xml • Configuration inheritance • Key value pairs representation • Many many of these files! <configuration> <property> <name>dfs.client.mmap.cache.size</name> <value>256</value> </property> </configuration>
  • 26. 26© Cloudera, Inc. All rights reserved. Editing a Configuration Demo!
  • 27. 27© Cloudera, Inc. All rights reserved. If Demo Failed… Step One: edit a config Step Two: save
  • 28. 28© Cloudera, Inc. All rights reserved. If Demo Failed… Step Three: at top-level, note that restart is needed Step Four: review changes Step five: restart
  • 29. 29© Cloudera, Inc. All rights reserved. Verifying Configuration • <namenode_host>:50070/conf
  • 30. 30© Cloudera, Inc. All rights reserved. Managing Configuration Files • Management tool will put configurations somewhere • Eg. core-site.xml, hdfs-site.xml, dfs_hosts_allow, dfs_hosts_exclude, hbase- site.xml, hive-env.sh, hive-site.xml, hue.ini, mapred-site.xml, oozie-site.xml, yarn- site.xml, zoo.cfg (and so on) • Often they are deployed in private locations • Good idea for files to be immutable
  • 31. 31© Cloudera, Inc. All rights reserved. Managing Configuration Files Demo!
  • 32. 32© Cloudera, Inc. All rights reserved. If Demo Failed… • e.g., /var/run/cloudera-scm-agent/process/*-NAMENODE
  • 33. 33© Cloudera, Inc. All rights reserved. Key Static Configuration • Security • Ports • Local Storage • JVM • Databases
  • 34. 34© Cloudera, Inc. All rights reserved. Networking • Does your DNS work? • Like, forwards and backwards? • For sure? • Have you really checked?
  • 35. 35© Cloudera, Inc. All rights reserved. HDFS Configurations of Note • Local Data Directories • dfs.datanode.data.dir: one per spindle; avoid RAID • dfs.namenode.name.dir: two copies of your metadata better than one • High Availability • Requires more daemons • Two failover controllers co-located with namenodes • Three journal nodes
  • 36. 36© Cloudera, Inc. All rights reserved. Resource Management in YARN • We want to prefer one multi tenant cluster over multiple clusters • Marketing vs. Engineering cluster resources • The “Fair Scheduler” implements the Dominant Resource Fairness Algorithm • Configured by an “allocations.xml” file • Possible to police resource usage through cgroups
  • 37. 37© Cloudera, Inc. All rights reserved. Yarn Configurations of Note • Yarn doles out resources to applications across two axes: memory and CPU • Define per-NodeManager resources • yarn.nodemanager.resource.cpu-vcores • yarn.nodemanager.resource.memory-mb • Some guardrails: • yarn.scheduler.maximum-allocation-vcores • yarn.scheduler.maximum-allocation-mb
  • 38. 38© Cloudera, Inc. All rights reserved. Yarn Configurations of Note • Can any other job fit? 1GB 1GB 1GB 1GB 1 core 1 core 1 core 1 core 1 core 1 core 🍅 1 GB, 1 core 🎃 2 GB, 2 cores 🍪 1 GB, 2 cores 🍅 🍪 🍪 🎃
  • 39. 39© Cloudera, Inc. All rights reserved. Configurations as Living Documents • Configuration is sometimes workload dependent • Leverage your metric system Configs Metrics
  • 40. 40© Cloudera, Inc. All rights reserved. Heap Sizing • Heap sizes: • Datanodes: linear in # of blocks • Namenode: linear in # of blocks and # of files
  • 41. 41© Cloudera, Inc. All rights reserved. Resource management • Fannie and Freddie are sharing the cluster • How do we configure the weights?
  • 42. 42© Cloudera, Inc. All rights reserved. In Summary • Book keeping of cluster configuration requires a tool • Make sure key configurations are set properly • Configuration files are living documents – use your metric system!
  • 43. 43© Cloudera, Inc. All rights reserved. 4 Apache Hadoop Operations for Production Systems: Troubleshooting Jake Miller
  • 44. 44© Cloudera, Inc. All rights reserved. 4 Troubleshooting Managing Hadoop Clusters Troubleshooting Hadoop Systems Debugging Hadoop Applications
  • 45. 45© Cloudera, Inc. All rights reserved. 4 Troubleshooting Managing Hadoop Clusters Troubleshooting Hadoop Systems Debugging Hadoop Applications
  • 46. 46© Cloudera, Inc. All rights reserved. 4
  • 47. 47© Cloudera, Inc. All rights reserved. 4 Other systems: httpd, sas, custom apps etc Tools for lots of machines Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem
  • 48. 48© Cloudera, Inc. All rights reserved. 4 Other systems: httpd, sas, custom apps etc Tools for lots of machines Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Ganglia, Nagios Ganglia*, Nagios*
  • 49. 49© Cloudera, Inc. All rights reserved. 4 Ganglia: Monitoring • Collects and aggregates metrics from machines • Good for what’s going on right now and describing normal perf • Organized by physical resource (CPU, Mem, Disk) – Good defaults – Good for pin pointing machines – Good for seeing overall utilization – Uses RRDTool under the covers • Some data scalability limitations, lossy over time. • Dynamic for new machines, requires config for new metrics
  • 50. 50© Cloudera, Inc. All rights reserved. 5 Graphite: Monitoring • Popular alternative to Ganglia • Can handle the scale of metrics coming in • Similar to Ganglia, but uses its own RRD database. • More aimed at dynamic metrics (as opposed to statically defined metrics)
  • 51. 51© Cloudera, Inc. All rights reserved. 5 Nagios: Alerting •Provides alerts from canaries and basic health checks for services on machines •Organized by Service (e.g. HTTP, POP, SSH, SMTP) •Defacto standard for service monitoring •Lacks distributed system know how •Requires bespoke setup for service slaves and masters •Lacks details with multi-tenant services or short-lived jobs
  • 52. 52© Cloudera, Inc. All rights reserved. 5 Tools for the Hadoop Stack Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Resource Management: YARN Coordination: zookeeper; Security: Sentry Search: Solr Eventingest: Flume,Kafka DBImportExport: Sqoop User interface: HUE Other systems: httpd, sas, custom apps etc Random Access Storage: HBase File Storage: HDFS Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Ganglia, Nagios Ganglia*, Nagios*
  • 53. 53© Cloudera, Inc. All rights reserved. 5 Tools for the Hadoop Stack Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Resource Management: YARN Coordination: zookeeper; Security: Sentry Search: Solr Eventingest: Flume,Kafka DBImport Export:Sqoop User interface: HUE Other systems: httpd, sas, custom apps etc Random Access Storage: HBase File Storage: HDFS Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Logs and Metrics Logs and Metrics Logs and Metrics Logs and Metrics Logs and Metrics Logs and Metrics Logs Logs and Metrics Logsand Metrics Logs Logs and Metrics Logs and Metrics HTrace HTrace Custom app logs /proc, dmesg, /var/logs/ jmx, jstack, jstat, GC logs
  • 54. 54© Cloudera, Inc. All rights reserved. 5 Tools for the Hadoop Stack Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Resource Management: YARN Coordination: zookeeper; Security: Sentry Search: Solr Eventingest: Flume,Kafka DBImport Export:Sqoop User interface: HUE Other systems: httpd, sas, custom apps etc Random Access Storage: HBase File Storage: HDFS Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Ganglia, Nagios Ganglia*, Nagios* Ganglia*, Nagios* ?
  • 55. 55© Cloudera, Inc. All rights reserved. 5 Ganglia and Nagios are not enough • Fault tolerant distributed system masks many problems (by design!) – Some failures are not critical – failure condition more complicated • Lacks distributed system know how – Requires bespoke setup for service slaves and masters – Lacks details with multi-tenant services or short-lived jobs • Hadoop services are logically dependent on each other – Need to correlate metrics across different service and machines – Need to correlate logs from different services and machines – What about all the logs? – What of all these metrics do we really need? • Some data scalability limitations, lossy over time – What about full fidelity?
  • 56. 56© Cloudera, Inc. All rights reserved. 5 OpenTSDB • OpenTSDB (Time Series Database) – Scalable storage to keep all your metrics – Efficiently stores metric data into HBase – Keeps data at full fidelity – Keep as much data as your HBase instance can handle. • Free and Open Source
  • 57. 57© Cloudera, Inc. All rights reserved. 5 Cloudera Manager • Extracts hardware, OS, and Hadoop service metrics specifically relevant to Hadoop and its related services. – Operation Latencies – # of disk seeks – HDFS data written – Java GC time – Network IO • Provides – Cluster preflight checks – Basic host checks – Regular health checks • Uses LevelDB for underlying metrics storage • Provides distributed log search • Monitors logs for known issues • Point and click for useful utils (lsof, jstack, jmap) • Free
  • 58. 58© Cloudera, Inc. All rights reserved. 5 Tools for the Hadoop Stack Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem In Mem processing: Spark Interactive SQL: ImpalaBatch processing MapReduce Resource Management: YARN Coordination: zookeeper; Security: Sentry Search: Solr Eventingest: Flume,Kafka DBImport Export:Sqoop User interface: HUE Other systems: httpd, sas, custom apps etc Random Access Storage: HBase File Storage: HDFS Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Ganglia, Nagios Ganglia*, Nagios* Full Fidelity metrics: Open TSDB*Metrics + Logs: Cloudera Manager Logs: Search a la Splunk or Rocana
  • 59. 59© Cloudera, Inc. All rights reserved. 5 Troubleshooting Managing a Hadoop Clusters Troubleshooting Hadoop Systems Debugging Hadoop Applications
  • 60. 60© Cloudera, Inc. All rights reserved. 6 The Law of Cluster Inertia A cluster in a good state stays in a good state, and a cluster in a bad state stays in a bad state, unless acted upon by an external force.
  • 61. 61© Cloudera, Inc. All rights reserved. 6 Admins and Users as an external force Upgrades • Linux • Hadoop • Java Misconfiguration • Memory Mismanagement – JVM vs Container OOME – NN OOME • Thread Mismanagement – Fetch Failures – Replicas • Disk Mismanagement – No File – Too Many Files
  • 62. 62© Cloudera, Inc. All rights reserved. 6 Understanding Normal • Establish normal – Boring logs are good • So that you can detect the abnormal – Find anomalies and outliers – Compare performance • And then isolate root causes – Who are the suspects – Pull the thread by interrogating – Correlate across different subsystems
  • 63. 63© Cloudera, Inc. All rights reserved. 6 Basic tools • What happened? – Logs • What state are we in now? – Metrics – Thread stacks • Is it alive? – listings – Canaries • Is it OK? – fsck / hbck • What happens when I do this? – Tracing – Dumps
  • 64. 64© Cloudera, Inc. All rights reserved. 6 Diagnostics for single machines Machine Linux Disk, CPU, Mem Analysis and Tools via @brendangregg
  • 65. 65© Cloudera, Inc. All rights reserved. 6 Diagnosis Tools HW/Kernel Logs /log, /var/log dmesg Metrics /proc, top, iotop, sar, vmstat, netstat Tracing Strace, tcpdump Dumps Core dumps Liveness ping Corrupt fsck
  • 66. 66© Cloudera, Inc. All rights reserved. 6 Diagnostics for the JVM • Most Hadoop services run in the Java VM, intros new java subsystems – Just in time compiler – Threads – Garbage collection • Dumping Current threads – jstacks (any threads stuck or blocked?) • JVM Settings: – Enabling GC Logs: -XX:+PrintGCDateStamps -XX:+PrintGCDetails – Enabling OOME Heap dumps: -XX:+HeapDumpOnOutOfMemoryError • Metrics on GC – Get gc counts and info with jstat • Memory Usage dumps – Create heap dump: jmap –dump:live,format=b,file=heap.bin <pid> – View heap dump from crash or jmap with jhat Machine JVM Linux Hadoop Daemons Disk, CPU, Mem
  • 67. 67© Cloudera, Inc. All rights reserved. 6 GC Logs Are Your Friends • GC Logging – -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/hdfs/nn-hdfs.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M • Great resource: – https://stackoverflow.com/questions/895444/java-garbage-collection-log-messages
  • 68. 68© Cloudera, Inc. All rights reserved. 6 • jstack – Use the same JDK – Must be run as user of the process – Can force it with –F but that abstracts away deadlock info • kill -3 <PID> – Dumps jstack to stdout CM stores as separate logfiles JVM Debugging Techniques: Hung Process
  • 69. 69© Cloudera, Inc. All rights reserved. 6 JVM Debugging Techniques: Hung Process
  • 70. 70© Cloudera, Inc. All rights reserved. 7 JVM Debugging Techniques: Heap Analysis • jstat -gcutil <PID> 1s 120 – Checks for current GC activity • jmap -histo:live <PID> – Get a histogram of the current objects within the heap
  • 71. 71© Cloudera, Inc. All rights reserved. 7 Diagnosis Tools HW/Kernel JVM Logs /log, /var/log dmesg Enable gc logging Metrics /proc, top, iotop, sar, vmstat, netstat jstat Tracing Strace, tcpdump Debugger, jdb, jstack Dumps Core dumps jmap Enable OOME dump Liveness ping Jps Corrupt fsck
  • 72. 72© Cloudera, Inc. All rights reserved. 7 Diagnostics for the Hadoop Stack • Single client call can trigger many RPCs spanning many machines • Systems are evolving quickly • A failure on one daemon, by design, does not cause failure of the entire service • Logs: • Each service’s master and slaves have their own logs: /var/log/ • There are lot of logs and they change frequently • Metrics: • Each daemon offers metrics, often aggregated at masters • The hard part is figuring out which metrics matter to you • Tracing: • HTrace (integrated into Hadoop, HBase; currently in Apache Incubator) • What strace did for system calls, HTrace does for RPC calls • Liveness: • Canaries, service/daemon web UIs
  • 73. 73© Cloudera, Inc. All rights reserved. 7 Hadoop Logs Are Your Friends • Logs are verbose but necessary – NameNode logs (500 node cluster): • 10 files * 200MB/file = 2GB • 2GB of logs span 3 hours • 3 days of logs = ~48GB • Retain enough logs for debugging & plan for worst case – Adjust log retention as the cluster grows – To properly analyze, you need to log enough – You may only get one shot
  • 74. 74© Cloudera, Inc. All rights reserved. 7 • Just reduce the log level to save space? NO! – INFO logging helps us understand what happened right before failure • E.g. who logged in, what job was running, what they were writing to • Application: Yarn containers write logs locally, then migrate to HDFS – Ensure application log space is sufficient – It’s a distributed system but make sure local system is configured correctly INFO Logs Are Your Friends
  • 75. 75© Cloudera, Inc. All rights reserved. 7 Hadoop Debugging Techniques: LogLevel • Set the log level without process restarts http://namenode.cloudera.com:50070/logLevel • Scriptable Java servlet http://namenode.cloudera.com:50070/logLevel?log=org&level=DEBUG
  • 76. 76© Cloudera, Inc. All rights reserved. 7 Metrics: HDFS • HDFS is the core of the platform. What’s important? – Is the Standby NN checkpointing? – Are the NNs garbage collecting? – Percentage of heap used at steady state? Best to checkpoint every hour
  • 77. 77© Cloudera, Inc. All rights reserved. 7 Diagnosis Tools HW/Kernel JVM Hadoop Stack Logs /log, /var/log dmesg enable gc logging *:/var/log/hadoop|hbase|… Metrics /proc, top, iotop, sar, vmstat, netstat jstat *:/stacks, *:/jmx, Tracing strace, tcpdump debugger, jdb, jstack htrace Dumps core dumps jmap enable OOME dump *:/dump Liveness ping Jps Web UI Corrupt fsck HDFS fsck, HBase hbck
  • 78. 78© Cloudera, Inc. All rights reserved. 7 Finding the right part of the stack E-SPORE (from Eric Sammer’s Hadoop Operations) • Environment – What is different about the environment now from the last time everything worked? • Stack – The entire cluster also has shared dependency on data center infrastructure such as the network, DNS, and other services. • Patterns – Are the tasks from the same job? Are they all assigned to the same tasktracker? Do they all use a shared library that was changed recently? • Output – Always check log output for exceptions but don’t assume the symptom correlates to the root cause. • Resources – Do local disks have enough? Is the machine swapping? Does the network utilization look normal? Does the CPU utilization look normal? • Event correlation – It’s important to know the order in which the events led to the failure.
  • 79. 79© Cloudera, Inc. All rights reserved. 7 Troubleshooting Managing Hadoop Clusters Troubleshooting Hadoop Systems Debugging Hadoop Applications
  • 80. 80© Cloudera, Inc. All rights reserved. 8 Example application pipeline with strict SLAs ingest process export ingest process export ingest process export time
  • 81. 81© Cloudera, Inc. All rights reserved. 8 Example Application Pipeline - Ingest Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: SentryFile Storage: HDFS Custom app feeds event data into HBase Random Access Storage: HBase Other systems: httpd, sas, custom apps etc Eventingest: Flume,Kafka
  • 82. 82© Cloudera, Inc. All rights reserved. 8 Example Application Pipeline - processing Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest: Flume,Kafka Other systems: httpd, sas, custom apps etc Batch processing MapReduce Resource Management: YARN Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Random Access Storage: HBase Map Reduce Job generates new artifact from HBase data and writes to HDFS File Storage: HDFS
  • 83. 83© Cloudera, Inc. All rights reserved. 8 Batch processing MapReduce Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Example Application Pipeline - export Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest: Flume,Kafka Random Access Storage: HBase 9 Resource Management: YARN DBImportExport: Sqoop File Storage: HDFS Other systems: httpd, sas, custom apps etc Sqoop result into relational database to serve application
  • 84. 84© Cloudera, Inc. All rights reserved. 8 Case study 1: slow jobs after Hadoop upgrade Symptom: After an upgrade, activity on the cluster eventually began to slow down and the job queue overflowed.
  • 85. 85© Cloudera, Inc. All rights reserved. 8 Example Application Pipeline - processing Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest: Flume,Kafka Other systems: httpd, sas, custom apps etc Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Random Access Storage: HBase File Storage: HDFS Batch processing MapReduce Resource Management: YARN Map Reduce Job generates new artifact from HBase data and writes to hdfs
  • 86. 86© Cloudera, Inc. All rights reserved. 8 Case study 1: slow jobs after Hadoop upgrade Evidence: Isolated to Processing phase (MR). In NM Logs, found an innocuous but anomalous log entry about “fetch failures.” Many users had run in to this MR problem using different versions of MR. Workaround provided: remove the problem node from the cluster.
  • 87. 87© Cloudera, Inc. All rights reserved. 8 Machine JVM Linux Hadoop Daemons Disk, CPU, Mem
  • 88. 88© Cloudera, Inc. All rights reserved. 8 Case study 1: slow jobs after Hadoop upgrade Root cause: All MR versions had a common dependency on a particular version of Jetty (Jetty 6.1.26). Dev was able to reproduce and fix the bug in Jetty.
  • 89. 89© Cloudera, Inc. All rights reserved. 8 Case study 2: slow jobs after Linux upgrade Symptom: After an upgrade, system CPU usage peaked at 30% or more of the total CPU usage.
  • 90. 90© Cloudera, Inc. All rights reserved. 9 Case study 2: slow jobs after Linux upgrade Evidence: Used tracing tools to isolate a majority of time was inexplicably spent in virtual memory calls. http://structureddata.org/2012/06/18/linux-6- transparent-huge-pages-and-hadoop-workloads/
  • 91. 91© Cloudera, Inc. All rights reserved. 9 Machine JVM Linux Hadoop Daemons Disk, CPU, Mem
  • 92. 92© Cloudera, Inc. All rights reserved. 9 Case study 2: slow jobs after Linux upgrade Root cause: Running RHEL or CentOS versions 6.2, 6.3, and 6.4 or SLES 11 SP2 has a feature called "Transparent Huge Page (THP)" compaction which interacts poorly with Hadoop workloads.
  • 93. 93© Cloudera, Inc. All rights reserved. 9 Case study 3: slow jobs at a precise moment Symptom: High CPU usage and responsive but sluggish cluster - even non- Hadoop apps (e.g. MySQL). 30 customers all hit this at the exact same time: 6/30/12 at 5pm PDT.
  • 94. 94© Cloudera, Inc. All rights reserved. 9 Case study 3: slow jobs at a precise moment Evidence: Checked the kernel message buffer (run dmesg) and look for output confirming the leap second injection. Other systems had same problem.
  • 95. 95© Cloudera, Inc. All rights reserved. 9 Machine JVM Linux Hadoop Daemons Disk, CPU, Mem
  • 96. 96© Cloudera, Inc. All rights reserved. 9 Case study 3: slow jobs at a precise moment Root cause: Linux OS kernel mishandled an added leap second.
  • 97. 97© Cloudera, Inc. All rights reserved. 9 Similar symptoms, different problem
  • 98. 98© Cloudera, Inc. All rights reserved. 9 Case study 1: slow jobs after Hadoop upgrade • After an upgrade, activity on the cluster eventually began to slow down and the job queue overflowed. In TT Logs, found an innocuous but anomalous log entry about “fetch failures.” Many users had run in to this MR problem using different versions of MR. Workaround provided: remove the problem node from the cluster. All MR versions had a common dependency on a particular version of Jetty (Jetty 6.1.26) . Dev was able to reproduce and fix the bug in Jetty. Symptom Evidence Root Cause
  • 99. 99© Cloudera, Inc. All rights reserved. 9 Case study 2: slow jobs after Linux upgrade • After an upgrade, system CPU usage peaked at 30% or more of the total CPU usage. Perf tool which proved that majority of time was inexplicably spent in virtual memory calls. Running RHEL or CentOS versions 6.2, 6.3, and 6.4 or SLES 11 SP2 has a feature called "Transparent Huge Page (THP)" compaction which interacts poorly with Hadoop workloads. Symptom Evidence Root Cause
  • 100. 100© Cloudera, Inc. All rights reserved. 1 Case study 3: slow jobs at a precise moment • High CPU usage and responsive but sluggish cluster. • 30 customers all hit this at the exact same time. Checked the kernel message buffer (run dmesg) and look for output confirming the leap second injection. Linux OS kernel mishandled a leap second added on 6/30/12 at 5pm PDT. Symptom Evidence Root Cause
  • 101. 101© Cloudera, Inc. All rights reserved. 1 Lessons learned More crucial than the specific troubleshooting methodology used is to use one. More crucial than the specific tool used is the type of data analyzed and how it’s analyzed. Capture for posterity in a knowledge base article, blog post, or conference presentation. Methodology Tools Learn from failure
  • 102. Apache Hadoop Operations for Production Systems: Enterprise Considerations Jake Miller
  • 103. 103© Cloudera, Inc. All rights reserved. 1 Enterprise Considerations • Security • Authentication • Confidentiality • Uptime and availability • Rolling upgrades • Migration • Backup and Disaster recovery • Multi-tenancy • Resource Management • Auditing
  • 104. 104© Cloudera, Inc. All rights reserved. 1 Enterprise Considerations • Security • Authentication • Confidentiality • Uptime and availability • Rolling upgrades • Migration • Backup and Disaster recovery • Multi-tenancy • Resource Management • Auditing
  • 105. 105© Cloudera, Inc. All rights reserved. 1 Security • Safeguards –Authentication –Authorization –Auditing •Threats • Data Corruption • Encryption in transit • Encryption at rest Goal: Ensure only authorized, authenticated users have access to data
  • 106. 106© Cloudera, Inc. All rights reserved. 1 –Authentication limit who can connect • Kerberos - Strong Authentication –MIT Kerberos, Active Directory –LDAP for user management –User obtains ticket used to identify oneself to the daemons –Cloudera Manager helps configure • Without Kerberos –Uses Linux user of the client application –Easy to spoof Authentication – Proving Who You are
  • 107. 107© Cloudera, Inc. All rights reserved. 1 Authorization: Controlling user can access or do • GRANT REVOKE on CRUD + Admin Permissions • Each service provides different authorization mechanisms • HDFS read / write / excute file permisisons • YARN queues can be restricted to certain users • HBase tables can be restricted to certain users (ACLs) • Apache Sentry unifies all authorizations to a single point • Up and coming • RecordService – fine grained record authorization
  • 108. 108© Cloudera, Inc. All rights reserved. 1 Audit – tracking who did what • Track which users performed particular operations • Audit – who and when was something done? • Cloudera Navigator
  • 109. 109© Cloudera, Inc. All rights reserved. 1 Confidentiality - information seen by intended users • Securing communication channels within the cluster • TLS • Used to secure http interfaces • Kerberos can be used to authenticate to these interfaces with SPNEGO • SASL • Used to secure data node data transfers • Hadoop / Base RPC • HDFS Encryption • KeyTrustee • Encrypted on disk
  • 110. 110© Cloudera, Inc. All rights reserved. 1 Enterprise Considerations • Security • Authentication • Confidentiality • Uptime and availability • Rolling upgrades • Migration • Backup and Disaster recovery • Multi-tenancy • Resource Management • Auditing
  • 111. 111© Cloudera, Inc. All rights reserved. 1 Uptime and availability • Unplanned –Hardware failures –Software errors –Human error •Planned • Configuration Changes • Upgrades • Migrations Goal: Handle failure events with minimal operational impact
  • 112. 112© Cloudera, Inc. All rights reserved. 1 Batch processing MapReduce Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Unplanned down time? Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest:Flume, Kafka Random Access Storage: HBase Resource Management: YARN DBImportExport: Sqoop File Storage: HDFS Other systems: httpd, sas, custom apps etcMachine JVM HW Primary Job In Mem processing: Spark Interactive SQL: Impala Search: Solr User interface: HUE
  • 113. 113© Cloudera, Inc. All rights reserved. 1 Fault tolerance within the system • Core systems are designed to sustain failures and failover automatically • HDFS – 3x data replication; HDFS HA, 2NN, QJM/FC; Disk hot swap • HBase – Multiple HMasters, Region Servers ; Read Replicas • ZooKeeper – ZK peer Quorum • YARN (MR, Spark, Impala) : Standby Resource Manager • Security to mitigate human error
  • 114. 114© Cloudera, Inc. All rights reserved. 1 Planned down time • Wire compatibility + Extensible Data formats • Allow for forwards and backwards compatibility • Older clients can talk to newer servers and visa versa • Rolling upgrade • Upgrade a single node at a time while system runs • CM Orchestrates and manages this • Allows API and changes while guaranteeing wire compatibility between different minor versions • CDH guarantees Client-server compatibility between Minor Versions
  • 115. 115© Cloudera, Inc. All rights reserved. 1 Snapshots: Recovering from Human mistakes • Snapshot the state of directory/table at a certain moment in time • Rename it later, creating a new read write directory • Export it to another cluster time User Error: rm -r Service is restored, minor data loss Periodic snapshot Data is gone! restore Periodic snapshot
  • 116. 116© Cloudera, Inc. All rights reserved. 1 External Risks
  • 117. 117© Cloudera, Inc. All rights reserved. 1 Batch processing MapReduce Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Unplanned down time? Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest:Flume, Kafka Random Access Storage: HBase Resource Management: YARN DBImportExport: Sqoop File Storage: HDFS Other systems: httpd, sas, custom apps etc In Mem processing: Spark Interactive SQL: Impala Search: Solr User interface: HUE
  • 118. 118© Cloudera, Inc. All rights reserved. 1 Batch processing MapReduce Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Unplanned down time? Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest:Flume, Kafka Random Access Storage: HBase Resource Management: YARN DBImportExport: Sqoop File Storage: HDFS Other systems: httpd, sas, custom apps etc In Mem processing: Spark Interactive SQL: Impala Search: Solr User interface: HUE
  • 119. 119© Cloudera, Inc. All rights reserved. 1 Geographically separated copies of data
  • 120. 120© Cloudera, Inc. All rights reserved. 1 Strategy: Supported Batch HDFS Backups • Dist CP • A MapReduce based utility Distcp Source
  • 121. 121© Cloudera, Inc. All rights reserved. 1 Strategy: Custom Application-managed Replication • Application writes to two instances of HDFS • Sources could be Kafka instances • Adds complexity • Inefficient SourceSource
  • 122. 122© Cloudera, Inc. All rights reserved. 1 Enterprise Considerations • Security • Authentication • Confidentiality • Uptime and availability • Rolling upgrades • Migration • Backup and Disaster recovery • Multi-tenancy • Resource Management • Auditing
  • 123. 123© Cloudera, Inc. All rights reserved. 1 Multitenancy • Within a Cluster –Resource Management –Scale •Across clusters • Proxies Goal: Serve multiple users, applications, and datasets.
  • 124. 124© Cloudera, Inc. All rights reserved. 1 Multiple Workloads • Multiple Workloads, many datasets on one cluster • Resource Sharing • Overprovision • Static isolation if latency sensitive • Multiple Workloads, shared dataset on one cluster • Replicate to isolate if latency sensitive • Future: • Improving IO isolation for YARN/Slider/Mesos?
  • 125. 125© Cloudera, Inc. All rights reserved. 1 Resource Management • Policies for resource sharing • HDFS Quotas • Yarn FairScheduler Pools • HBase request throttling • Hive / Impala Access Control with Sentry
  • 126. 126© Cloudera, Inc. All rights reserved. 1 Scale Considerations • Multi Rack Deployments • Specify HDFS rack topology configuration • Limits failure domain to a single rack. • Performance resiliency tradeoff • Heterogeneous machine deploys • When clusters get larger, recovery is more expensive. • NN, CM machines should be beefier.
  • 127. 127© Cloudera, Inc. All rights reserved. 1 Gateway Proxies Batch processing MapReduce Languages/APIs: Hive, Pig, Crunch, Kite, Mahout Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Machine JVM Linux Hadoop Daemons Disk, CPU, Mem Coordination: zookeeper; Security: Sentry Eventingest:Flume, Kafka Random Access Storage: HBase Resource Management: YARN DBImportExport: Sqoop File Storage: HDFS Other systems: httpd, sas, custom apps etc In Mem processing: Spark Interactive SQL: Impala Search: Solr User interface: HUE
  • 128. 128© Cloudera, Inc. All rights reserved. 1 Gateway Proxies • Deploy your cluster without exposing to all nodes to users Machi ne Machi ne Machi ne Machi ne HUE DMZ Application HBase Proxy HttpFS (HDFS) JDBC: Impala/ Hive UsersUsers Search Machi ne Machi ne Machi ne Machi ne DMZ HBase Proxy HttpFS (HDFS) JDBC: Impala/ Hive Search
  • 129. 129© Cloudera, Inc. All rights reserved. 1 Enterprise considerations • Ensure only authorized, authenticated • users have access to data Handle failure events with minimal operational impact Serve multiple users, applications, and datasets. Security Availability Multitenancy
  • 130. 130© Cloudera, Inc. All rights reserved. 1 Conclusions
  • 131. 131© Cloudera, Inc. All rights reserved. 1 Hadoop operations takeaways • Cloudera has seen a lot of diverse clusters and used that experience to build tools to help diagnose and understand how Hadoop operates. Similar symptoms can lead to different root causes. Use tools to assist with event correlation and pattern determination. Most enterprises require strategies for Security, Availability, Disaster recovery, Maintenance, and Multi-tenancy. Managing Clusters Troubleshooting Enterprise Considerations Kevin Jarrett https://flic.kr/p/cDKWvL
  • 132. 132© Cloudera, Inc. All rights reserved. 1 Questions?