2. It's not about Harry Potter and the Prisoner of Azkaban
Ashley Stewart
EDSC 320 Final Project
3. What is Azkaban?
Azkaban is a batch workflow job scheduler.
It was created at LinkedIn to run Hadoop jobs.
Azkaban resolves job ordering through job dependencies.
It provides an easy-to-use web user interface to maintain and track your
workflows.
4. Why Azkaban?
Easy-to-use web UI
Retrying of failed jobs
Simple web and HTTP workflow uploads
Workflow as a DAG (directed acyclic graph) made up of individual steps
Runs a series of MapReduce, Pig, Java, and script actions as a single
workflow job
Regular scheduling of workflow jobs
Failure detection
SLA alerting and auto-killing
Email alerts on failures and successes
6. AzkabanWebServer
The web server uses the db for the following reasons:
Project Management - the projects, their permissions, and uploaded files.
Executing Flow State - keeps track of executing flows and which Executor is
running them.
Previous Flows/Jobs - searches through previous executions of jobs and log
files.
Scheduler - keeps the state of the scheduled jobs.
- Azkaban uses *.job key-value property files to define individual tasks in a
workflow, and the dependencies property to define the dependency
chain of the jobs.
- These job files and associated code can be archived into a *.zip and
uploaded to the web server through the Azkaban UI or via curl.
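As a sketch of the upload path described above, the job files can be zipped and pushed to the web server's /manager endpoint with curl. The host, project name, archive name, and session.id value below are placeholders for your own deployment, and assume a session has already been obtained through the AJAX API:

```shell
# Package the .job files into an archive.
zip -r myflow.zip foo.job bar.job

# Upload the archive to an existing project on the Azkaban web server.
curl -k -i -X POST \
  --form 'session.id=e7a29776-5783-49d7-afa0-b0e688096b5e' \
  --form 'ajax=upload' \
  --form 'file=@myflow.zip;type=application/zip' \
  --form 'project=myflow' \
  https://localhost:8443/manager
```

The -k flag skips TLS certificate verification, which is common against a locally run Azkaban instance with a self-signed certificate.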
7. AzkabanExecutorServer
The executor server uses the db for the following reasons:
Access the Project - retrieves project files from the db.
Executing Flows/Jobs - retrieves and updates data for flows and jobs that are
executing.
Logs - stores the output logs for jobs and flows in the db.
9. Creating Flows
A job is a process you want to run in Azkaban.
Jobs can be set up to be dependent on other jobs. The graph created by a set
of jobs and their dependencies is what makes up a flow.
Creating Jobs:
Creating a job is very easy: we create a properties file with a .job extension.
This job file defines the type of job to be run, its dependencies, and any
parameters needed to set up your job correctly.
# foo.job
type=command
command=echo "Hello World"
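To show how the dependencies property chains jobs into a flow, here is a hypothetical second job file that runs only after foo succeeds (the job names are illustrative):

```properties
# foo.job
type=command
command=echo "Hello World"

# bar.job - runs only after foo completes successfully
type=command
dependencies=foo
command=echo "Goodbye World"
```

When both files are zipped and uploaded, Azkaban builds the DAG from the dependencies entries; the flow takes the name of its terminal job, here bar.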
13. AJAX API
Azkaban exposes several AJAX calls accessible through curl or other
HTTP clients.
This API authenticates a user and returns a session.id in the response.
Once a session.id has been returned, it can be used for any API request
with the proper permissions until the session expires.
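The authentication step above can be sketched as a single curl call. The host and the azkaban/azkaban credentials are placeholder defaults for a local test deployment:

```shell
# Authenticate against the Azkaban web server; on success the JSON
# response carries the session.id used by all subsequent API calls.
curl -k -X POST \
  --data "action=login&username=azkaban&password=azkaban" \
  https://localhost:8443
```

A successful response is a small JSON object containing a "status" of success and the "session.id" value.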
14. API Calls
With this session.id, we can:
– Create a Project
– Delete a Project
– Upload a Project Zip
– Fetch Flows of a Project
– Fetch Jobs of a Flow
– Fetch Executions of a Flow
– Fetch Running Executions of a Flow
– Cancel a Flow Execution
– Schedule a Flow
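As one example of the calls listed above, fetching the flows of a project is a GET request to the /manager endpoint; the session.id and project name below are placeholders:

```shell
# Fetch the flows of a project using a previously obtained session.id.
curl -k --get \
  --data "session.id=e7a29776-5783-49d7-afa0-b0e688096b5e" \
  --data "ajax=fetchprojectflows" \
  --data "project=myflow" \
  https://localhost:8443/manager
```

The other calls follow the same pattern, varying the ajax parameter (and the endpoint for execution-related calls) as described in the Azkaban API documentation.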
23. Job Summary
The Job Summary tab contains a summary of the information in the job logs.
This includes:
Job Type - the jobtype of the job
Command Summary - the command that launched the job process, with
fields such as the classpath and memory settings also shown separately
Pig/Hive Job Summary - custom stats specific to Pig and Hive jobs
Map Reduce Jobs - a list of job ids of MapReduce jobs that were
launched, linked to their job tracker pages