Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
- 1. Cloudera Manager – APIs & Extensibility
Bala Venkatrao, Products@Cloudera
December 2013
- 2. Cloudera Manager
End-to-End Administration for CDH
1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools
©2013 Cloudera, Inc. All Rights Reserved.
- 3. Integrating with your IT Mgmt tools
Various options for integrating Cloudera Manager into your existing Datacenter Operations/Tools:
• Cloudera Manager API
  • Introduced in CM4 (June 2012)
  • Installation & deployment
  • Monitoring
• SNMP Alerts
  • Introduced in CM4.5 (Feb 2013)
• Monitoring ‘tsquery’ (Feb 2013)
• User-defined triggers/alarms (new for C5!)
• Service extensibility (new for C5!)
And more…
[Diagram: Cloudera Manager (Hadoop Operations) integrating with Datacenter Operations tools: Installation/Deployment Tools (e.g. Chef, Puppet etc.) and Monitoring and Alerting Tools (e.g. Orion, Tivoli, BMC, Nagios, SNMP etc.)]
- 4. Cloudera Manager (CM) API
• API access was a feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).
• The CM API is an HTTP REST API, using JSON serialization. The API is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI.
• Docs & Examples
  http://cloudera.github.io/cm_api/
  https://github.com/cloudera/cm_api
• Java/Python clients
  http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
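To make the "HTTP REST API, using JSON serialization" point concrete, here is a minimal Python sketch of a raw call using only the standard library. The host name, the credentials, the default web UI port 7180 and the API version "v1" are assumptions for illustration and do not come from the slides; the official cm_api clients wrap these details for you.

```python
import base64
import json
import urllib.request


# Assumptions (not from the slides): host, credentials, port 7180, version "v1".
def build_request(host, path, user, password, port=7180, version="v1"):
    """Build a GET request with HTTP basic auth for a CM API resource."""
    url = f"http://{host}:{port}/api/{version}/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def cluster_names(body):
    """Extract cluster names from a JSON list response of the form {"items": [...]}."""
    return [item["name"] for item in json.loads(body)["items"]]


def list_clusters(host, user, password):
    """Fetch the cluster list from a live CM server (needs network access)."""
    request = build_request(host, "clusters", user, password)
    with urllib.request.urlopen(request) as response:
        return cluster_names(response.read())
```

Because the API uses the same users as the web UI, the credentials passed here would be an ordinary CM login, for example `list_clusters("cm.example.com", "admin", "admin")` against a hypothetical server.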
- 5. Examples of integration with CM API
• Installation & Deployment
  • Chef/Puppet
  • Dell Crowbar
    • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-withdell-crowbar-and-cloudera-manager/
  • StackIQ
    • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-withCloudera
  • WANdisco – non-stop NN setup
  • Several other customers/partners leveraging the APIs as part of their install & deployment process
• Monitoring & Alerting
  • Oracle Enterprise Manager (via Big Data Appliance)
  • Nagios
    • https://github.com/cloudera/cm_api/tree/master/nagios
    • https://github.com/harisekhon/nagiosplugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  • SNMP alerts integration with IBM Netcool
Develop & Contribute your plug-ins using Cloudera Manager API
- 6. Cloudera Manager – Monitoring via ‘tsquery’
• Introduced as part of CM4.5 release (Feb 2013)
• Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters
• The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store
• Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service?
  select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
• Retrieved time-series data can be plotted via various options: line, bar, scatter, heat maps, table list etc.
• Extending this concept to create user-defined triggers/alarms (new for C5!).
• More details
  http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-ManagerDiagnostics-Guide/cm5dg_chart_time_series_data.html
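A tsquery statement is plain text, so it can be assembled and sent from a script as well as typed into the chart builder. The Python sketch below rebuilds the DataNode disk IO statement and URL-encodes it for the API's time-series resource. The host name, port 7180, the "v4" API version and the helper names are assumptions for illustration; check the API docs for the version your CM release serves.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tsquery(metrics, filters):
    """Assemble 'select <metrics> where <k=v and ...>' from Python values."""
    select = ", ".join(metrics)
    where = " and ".join(f"{key}={value}" for key, value in filters.items())
    return f"select {select} where {where}"


# Assumptions (not from the slides): host, port 7180 and API version "v4".
def timeseries_url(host, tsquery, port=7180, version="v4"):
    """Return the GET URL that submits a tsquery to the timeseries resource."""
    return f"http://{host}:{port}/api/{version}/timeseries?" + urlencode({"query": tsquery})


query = build_tsquery(
    ["bytes_read", "bytes_written"],
    {"roleType": "DATANODE", "serviceName": "hdfs1"},
)
url = timeseries_url("cm.example.com", query)

# The statement survives URL encoding unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```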
- 7. Examples of Cloudera Manager ‘tsquery’
Example 1: How do I track the aggregate Cluster Disk IO?
  select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID
Example 2: How do I compare CPU usage across hosts?
  select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
  dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100
Create & Contribute your ‘tsqueries’!
https://github.com/cloudera/cm_charting_scrapbook
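The CPU example repeats one expression per metric, which makes it a natural candidate for generating from a list. This is only a sketch: the helper name is hypothetical, and the metric names are the ones used in the CPU example.

```python
# Metric names taken from the CPU usage example; the helper itself is hypothetical.
CPU_METRICS = (
    "total_cpu_user",
    "total_cpu_system",
    "total_cpu_nice",
    "total_cpu_iowait",
    "total_cpu_irq",
    "total_cpu_soft_irq",
)


def cpu_percent_tsquery(metrics=CPU_METRICS):
    """Build one 'rate per core, as a percentage' expression per CPU metric."""
    expressions = [f"dt0({metric}) / getHostFact(numCores, 1) * 100" for metric in metrics]
    return "select " + ",\n       ".join(expressions)


cpu_query = cpu_percent_tsquery()
```

Adding or dropping a CPU state then means editing the tuple rather than the statement.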
- 8. Cloudera as an Application Platform
ISV’s view of a Database
[Diagram: Core Database with Workload Mgmt, Drivers (JDBC/ODBC), Security Mgmt, Data Access APIs and Systems Mgmt]
ISV’s view of an OS
[Diagram: Core OS kernel with Package Mgmt, Process/Resource Mgmt, Security Mgmt, Data Access APIs and Systems Mgmt]
- 9. Cloudera as an Application Platform
ISV’s view of Cloudera
[Diagram: CDH with Package Mgmt, Workload/Process Mgmt, Security Mgmt, Data Access APIs, Drivers (JDBC/ODBC) and Systems Mgmt]
- 10. Cloudera Platform Features
Features | Description | Examples
Package Mgmt | Ability to easily package and distribute binaries/jars via “Parcels” | Informatica, Syncsort, LZO libraries
Workload/Process Mgmt | Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop cluster; isolation of cluster resources | SAS, 0xData, Accumulo, Spark
Security Mgmt | Support for Kerberos Mgmt; role-based access control for Tables/Views in Hive/Impala via Sentry |
Data Access APIs | HDFS API, HBase API, Search API, Spark API; Kite (formerly Cloudera Development Kit) | Causata, Basis Tech, CounterTack, Amdocs
Drivers | ODBC/JDBC drivers for Hive/Impala | Zoomdata, Tableau, Microstrategy, Qlikview
Systems Mgmt | End-to-End management of an application via Cloudera Manager (CM) | StackIQ, Dell Crowbar, Oracle OEM

Systems Mgmt via Cloudera Manager:
Manage
- Deploy and upgrade (rolling) services and pkgs
- Manage configurations
Monitor
- Proactive health checks
- Track resource utilization
- Custom metrics charts
Diagnose
- Distributed log collection and searching
- Tag and track key events
Integrate
- Access CM via API
* Support for YARN planned as part of CM5.x in FY14
- 11. Example – Deployment via Parcels
The platform for Big Data + The ETL app for Hadoop
• Smarter Deployment & Administration: Seamless integration with Cloudera Manager for one-click deployment and easier administration
• Smarter Architecture: No code generation. ETL engine runs natively within Hadoop MapReduce, via plugin included in CDH 4.2
• Smarter Monitoring: Comprehensive logging capabilities + activity monitoring through Cloudera Manager
- 12. How it works
1. Download Syncsort DMX-h “Parcel” file to your custom repository
   File contains everything you need to properly deploy Syncsort DMX-h ETL Edition on Cloudera
2. Distribute & activate DMX-h parcel on your Cloudera cluster
A. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
B. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
C. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.
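The download, distribute and activate steps can also be driven through the CM API rather than the UI. The Python sketch below only builds the three command URLs in order. The endpoint paths follow my understanding of the cm_api client's parcel commands and should be treated as assumptions, as should the host, port 7180, API version "v3", and the product name and version, which are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumed parcel command names, in the order the workflow runs them.
PARCEL_COMMANDS = ("startDownload", "startDistribution", "activate")


# Assumptions (not from the slides): host, port, API version and URL layout.
def parcel_command_urls(host, cluster, product, version, port=7180, api="v3"):
    """Return the POST URLs for download, distribution and activation, in order."""
    base = (
        f"http://{host}:{port}/api/{api}/clusters/{quote(cluster)}"
        f"/parcels/products/{quote(product)}/versions/{quote(version)}"
    )
    return [f"{base}/commands/{command}" for command in PARCEL_COMMANDS]


# "DMX-h" and "1.0" are hypothetical product and version identifiers.
urls = parcel_command_urls("cm.example.com", "Cluster 1", "DMX-h", "1.0")
```

A real client would POST to each URL in turn and wait for the parcel to reach the next stage before issuing the following command.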
- 13. Syncsort DMX-h + Cloudera Manager
[Diagram: Cloudera Manager with a CDH Cluster + ISV software. Labels: Syncsort DMX-h, API, Installation, Management, Monitoring, Integration, Support, CDH Nodes]
DMX-h on every CDH node
- 14. Get a 360° View of Your Cluster, Including DMX-h Logs
View service health & performance
Get host-level snapshots
Monitor & diagnose workloads
Gather, view & search Hadoop & DMX-h logs
…And more!!
Build and Distribute your own Parcels via Cloudera Manager and share it with the community!
- 15. Service Extensibility
• Introduced in C5
• Still in Beta!
• Single management console for CDH, non-CDH services and ISV applications
• Similar look and feel as existing services
• Easy to write (Java-free!)
• Flexible
• Independent release cycle
- 16. So.. How does it work?
• A JSON file that describes your service
• Set of control scripts
• Packaged as a JAR file
• As promised, Java-free
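Since a JAR file is a zip archive, the packaging step needs no Java tooling. Here is a minimal Python sketch of bundling a descriptor and a control script; the entry names inside the archive (`descriptor/service.json`, `scripts/control.sh`) and the file contents are hypothetical, because the slides do not specify the required layout.

```python
import json
import os
import tempfile
import zipfile


def package_service(jar_path, descriptor, scripts):
    """Write the descriptor and control scripts into a JAR (zip) archive."""
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        # Hypothetical layout: one JSON descriptor plus a scripts/ directory.
        jar.writestr("descriptor/service.json", json.dumps(descriptor, indent=2))
        for name, body in scripts.items():
            jar.writestr(f"scripts/{name}", body)
    # Re-open the archive to report what was actually stored.
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(jar.namelist())


descriptor = {"name": "spark", "roles": [{"name": "master"}]}
scripts = {"control.sh": "#!/bin/bash\necho \"control script placeholder\"\n"}
jar_path = os.path.join(tempfile.mkdtemp(), "spark-service.jar")
entries = package_service(jar_path, descriptor, scripts)
```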
- 22. The Code
Service descriptor (JSON):
name : "spark",
roles : [{
  name : "master",
  startRunner : {
    program : "scripts/control.sh",
    args : [ "start_master",
             "./params.properties"]
  },
  parameters : [{
    name : "master_port",
    type : "port",
    default : 7077
  }],
  configWriter : {
    generators : [{
      filename : "params.properties"
    }]
  }
}]

Control script (scripts/control.sh):
#!/bin/bash
# First argument is the command passed in by startRunner.args
CMD=$1
MASTER_PORT=<read in from ./params.properties>

case $CMD in
  (start_master)
    # Replace this shell with the Spark master process
    exec $SPARK_HOME/scripts/spark-start.sh master
    ;;
  (*)
    echo "$timestamp Don't understand [$CMD]"
    ;;
esac
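The control script leaves the "read in from ./params.properties" step open. As one possible illustration, here is a Python sketch of pulling `master_port` out of such a file, assuming the config writer emits simple `key=value` lines; that file format and the sample contents are assumptions, not something the slides confirm.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            properties[key.strip()] = value.strip()
    return properties


# Hypothetical file contents; 7077 is the default from the descriptor above.
sample = "# generated config (assumed format)\nmaster_port=7077\n"
master_port = int(parse_properties(sample)["master_port"])
```

A shell control script would do the equivalent with `grep` or by sourcing the file, depending on how the values are quoted.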
- 23. Next Steps
• Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
• Working with select ISVs (SAS, 0xData etc.) as part of Beta to further fine-tune this feature
Develop & Contribute your Cloudera Manager service extensibility plug-ins!
- 24. Service Extensibility
Vision of CM Extensibility
[Diagram: Vertical Extension and Horizontal Extension around CM and CDH. Labels include: API; Ops Apps (Capacity Mgr, Security, SLA Mgr, Cost Optimizer); ISVs (0xData, SAS, Syncsort, Informatica, Revolution); Accumulo, Spark, Giraph; SNMP API (Oracle OEM, Nagios, Dell, Chef/Puppet)]
- 25. Q&A
• If you are interested in learning more, participating in Beta, contributing plug-ins or Apps, contact: bala@cloudera.com
- 26. Appendix/Resources
Systems Management
• Cloudera Manager API
  • http://cloudera.github.io/cm_api/
  • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
Package Management
• Docs on Parcels
  • http://training.cloudera.com/elearning/Parcels/
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-ManagerIntroduction/cmi_primer.html
  • http://blog.cloudera.com/blog/2013/05/faq-understanding-the-parcel-binary-distribution-format/
  • http://blog.cloudera.com/blog/2013/07/one-engineers-experience-with-parcel/
Data Access APIs
• http://blog.cloudera.com/blog/2013/05/cloudera-development-kit-cdk/
• https://github.com/cloudera/cdk
Workload/Resource Management
• Cloudera Manager 5 documentation
  • http://cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-ManagingClusters/cm5mc_managing_resources.html
• http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together/
Security Management
• http://blog.cloudera.com/blog/2013/07/with-sentry-cloudera-fills-hadoops-enterprise-security-gap/