Simplifying BigData Jobs
Arpit Tak
BigData Developer, Vizury
http://in.linkedin.com/in/arpittak/
It's not about Harry Potter and the Prisoner of Azkaban
What is Azkaban?

Azkaban is a batch workflow job scheduler

It was created at LinkedIn to run Hadoop jobs

Azkaban resolves the execution order of jobs through their dependencies

It provides an easy-to-use web user interface to maintain and track your
workflows
Why Azkaban?

Easy to use web UI

Retrying of failed jobs

Simple workflow uploads through the web UI or over HTTP

Workflow as a DAG (directed acyclic graph) made up of individual steps

Runs a series of MapReduce, Pig, Java, and script actions as a single
workflow job

Allows regular scheduling of workflow jobs

Failure detection

SLA alerting and auto killing

Email alerts on failures and successes
Azkaban Overview
AzkabanWebServer
The web server uses the database for the following purposes:
Project Management - Stores the projects, their permissions, and the uploaded files.
Executing Flow State - Keeps track of executing flows and which Executor is
running them.
Previous Flows/Jobs - Searches through previous executions of jobs and their log
files.
Scheduler - Keeps the state of the scheduled jobs.
- Azkaban uses *.job key-value property files to define the individual tasks in a
workflow, and the dependencies property to define the dependency
chain of the jobs.
- These job files and their associated code can be archived into a *.zip and
uploaded to the web server, either through the Azkaban UI or with curl.
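The packaging step described above can be sketched in Python. This is a minimal illustration, not part of Azkaban itself: the job names (foo, bar) and the helper build_project_zip are hypothetical, and the archive is built in memory rather than uploaded.

```python
import io
import zipfile

# Hypothetical two-job workflow: "bar" declares a dependency on "foo".
JOB_FILES = {
    "foo.job": 'type=command\ncommand=echo "Hello World"\n',
    "bar.job": 'type=command\ndependencies=foo\ncommand=echo "bar runs after foo"\n',
}


def build_project_zip(job_files):
    """Return the bytes of a zip archive containing the given .job files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, body in job_files.items():
            archive.writestr(name, body)
    return buffer.getvalue()


project_zip = build_project_zip(JOB_FILES)
with zipfile.ZipFile(io.BytesIO(project_zip)) as archive:
    # List what would be uploaded through the UI or with curl.
    print(sorted(archive.namelist()))
```

Writing project_zip to a file gives an archive that could then be uploaded as a project.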
AzkabanExecutorServer
The executor server uses the database for the following purposes:
Access the project - Retrieves project files from the database.
Executing Flows/Jobs - Retrieves and updates data for the flows and jobs that are
executing.
Logs - Stores the output logs for jobs and flows in the database.
Creating Flows
A job is a process you want to run in Azkaban.
Jobs can be set up to depend on other jobs. The graph created by a set
of jobs and their dependencies is what makes up a flow.
Creating Jobs
Creating a job is easy: we create a properties file with a .job extension.
This job file defines the type of job to be run, its dependencies, and any
parameters needed to set up the job correctly.
# foo.job
type=command
command=echo "Hello World"
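How dependencies turn a set of job files into an ordered flow can be sketched as follows. This is an illustration of the idea only, not Azkaban's own scheduler: the jobs bar and baz, and the helpers parse_job and execution_order, are hypothetical, and the sketch assumes the dependencies property holds a comma-separated list of job names. It uses graphlib from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def parse_job(text):
    """Parse key=value lines from a .job file, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def execution_order(jobs):
    """Return job names in an order that respects each job's dependencies."""
    graph = {}
    for name, text in jobs.items():
        deps = parse_job(text).get("dependencies", "")
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical flow: foo has no dependencies, bar needs foo, baz needs both.
jobs = {
    "foo": '# foo.job\ntype=command\ncommand=echo "Hello World"',
    "bar": "# bar.job\ntype=command\ndependencies=foo\ncommand=echo bar",
    "baz": "# baz.job\ntype=command\ndependencies=foo,bar\ncommand=echo baz",
}
print(execution_order(jobs))  # ['foo', 'bar', 'baz']
```

A cycle among the dependencies would make TopologicalSorter raise graphlib.CycleError, which matches the requirement that a flow be a directed acyclic graph.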
Creating Flows
Embedded Flows
AJAX API

Azkaban exposes a set of AJAX calls that are accessible through curl or any
other HTTP client.

The login call authenticates a user and returns a session.id in the response.

Once a session.id has been returned, it can be used for any API request the
user has permission to make, until the session expires.
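The login exchange can be sketched in Python without contacting a server. The form fields (action=login, username, password), the host https://localhost:8443, the credentials, and the sample response body are assumptions based on the Azkaban documentation rather than on these slides, so check them against your installation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(base_url, username, password):
    """Build (but do not send) a POST request that asks for a session."""
    form = urlencode(
        {"action": "login", "username": username, "password": password}
    )
    return Request(base_url, data=form.encode("utf-8"), method="POST")


def extract_session_id(response_body):
    """Pull session.id out of a JSON login response; raise if it is missing."""
    payload = json.loads(response_body)
    if "session.id" not in payload:
        raise ValueError(f"login failed: {payload}")
    return payload["session.id"]


# Assumed host and credentials, for illustration only.
request = build_login_request("https://localhost:8443", "azkaban", "azkaban")
print(request.get_method(), request.full_url)

# Assumed shape of a successful response; a real one comes from the server.
sample = '{"status": "success", "session.id": "example-session-id"}'
print(extract_session_id(sample))
```

Passing the request to urllib.request.urlopen would send it; the returned body can then be handed to extract_session_id.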
API Calls
With this session.id, we can:
– Create a Project
– Delete a Project
– Upload a Project Zip
– Fetch Flows of a Project
– Fetch Jobs of a Flow
– Fetch Executions of a Flow
– Fetch Running Executions of a Flow
– Cancel a Flow Execution
– Schedule a Flow
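Each of these calls carries the session.id as a request parameter. A minimal sketch of building such a request URL is shown below; the manager path, the fetchprojectflows action name, and the project parameter are assumed from the Azkaban documentation, not stated in these slides, and build_ajax_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_ajax_url(base_url, path, action, session_id, **params):
    """Compose a GET URL for an AJAX call that carries the session id."""
    query = {"ajax": action, "session.id": session_id, **params}
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}?{urlencode(query)}"


# Assumed host, session id, and project name, for illustration only.
url = build_ajax_url(
    "https://localhost:8443",
    "manager",
    "fetchprojectflows",
    "example-session-id",
    project="demo",
)
print(url)
```

The same helper covers other calls in the list by swapping the path, the action name, and the extra parameters.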
Notification Options
Failure Options
Concurrent Options
Flow Parameters
Job List
Job History Page
Schedule Flow
Job Page
Job Summary
The Job Summary tab contains a summary of the information in the job logs.
This includes:

Job Type - the jobtype of the job

Command Summary - the command that launched the job
process, with fields such as the classpath and memory settings
shown separately as well

Pig/Hive Job Summary - custom stats specific to Pig and Hive
jobs

Map Reduce Jobs - a list of job ids of Map-Reduce jobs that were
launched, linked to their job tracker pages
Job Logs
Any Questions?

More Related Content

What's hot

E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyRikin Tanna
 
Apache Zeppelin & Cluster
Apache Zeppelin & ClusterApache Zeppelin & Cluster
Apache Zeppelin & ClusterJongyoul Lee
 
AWS Serverless solution for developers
AWS Serverless solution for developersAWS Serverless solution for developers
AWS Serverless solution for developersMichael Haberman
 
Spring Batch Performance Tuning
Spring Batch Performance TuningSpring Batch Performance Tuning
Spring Batch Performance TuningGunnar Hillert
 
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...Lightbend
 
Qui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in ActionQui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in ActionGlobalLogic Ukraine
 
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Thoughtworks
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks
 
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0Legacy Typesafe (now Lightbend)
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh LalwaniDatabricks
 
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudPakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudLightbend
 
Revitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive StreamsRevitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive StreamsLightbend
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowDatabricks
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)Simba Khadder
 
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreTypesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreLegacy Typesafe (now Lightbend)
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsGareth Rogers
 
Kapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud CustodianKapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud CustodianServerlessConf
 
Quick and Easy Development with Node.js and Couchbase Server
Quick and Easy Development with Node.js and Couchbase ServerQuick and Easy Development with Node.js and Couchbase Server
Quick and Easy Development with Node.js and Couchbase ServerNic Raboy
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
 

What's hot (20)

E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/Livy
 
Apache Zeppelin & Cluster
Apache Zeppelin & ClusterApache Zeppelin & Cluster
Apache Zeppelin & Cluster
 
AWS Serverless solution for developers
AWS Serverless solution for developersAWS Serverless solution for developers
AWS Serverless solution for developers
 
Spring Batch Performance Tuning
Spring Batch Performance TuningSpring Batch Performance Tuning
Spring Batch Performance Tuning
 
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...
 
Qui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in ActionQui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in Action
 
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu Kasinathan
 
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh Lalwani
 
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudPakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
 
Revitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive StreamsRevitalizing Enterprise Integration with Reactive Streams
Revitalizing Enterprise Integration with Reactive Streams
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development Workflow
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production Systems
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)
 
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreTypesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech Analystics
 
Kapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud CustodianKapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud Custodian
 
Quick and Easy Development with Node.js and Couchbase Server
Quick and Easy Development with Node.js and Couchbase ServerQuick and Easy Development with Node.js and Couchbase Server
Quick and Easy Development with Node.js and Couchbase Server
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
 

Viewers also liked

Hadoop at LinkedIn
Hadoop at LinkedInHadoop at LinkedIn
Hadoop at LinkedInKeith Dsouza
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...Barcoding, Inc.
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureVenu Anuganti
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Spark Summit
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsPrabhu Thukkaram
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 

Viewers also liked (18)

Meetup talk
Meetup talkMeetup talk
Meetup talk
 
Hadoop at LinkedIn
Hadoop at LinkedInHadoop at LinkedIn
Hadoop at LinkedIn
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
Mobile, Wearables, Big Data and A Strategy to Move Forward (with NTT Data Ent...
 
Protecting Enterprise Data In Apache Hadoop
Protecting Enterprise Data In Apache HadoopProtecting Enterprise Data In Apache Hadoop
Protecting Enterprise Data In Apache Hadoop
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream Analytics
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 

Similar to Lspe

Writing & Using Web Services
Writing & Using Web ServicesWriting & Using Web Services
Writing & Using Web ServicesRajarshi Guha
 
JavaFX Enterprise (JavaOne 2014)
JavaFX Enterprise (JavaOne 2014)JavaFX Enterprise (JavaOne 2014)
JavaFX Enterprise (JavaOne 2014)Hendrik Ebbers
 
Building API in the cloud using Azure Functions
Building API in the cloud using Azure FunctionsBuilding API in the cloud using Azure Functions
Building API in the cloud using Azure FunctionsAleksandar Bozinovski
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
Developing Java Web Applications
Developing Java Web ApplicationsDeveloping Java Web Applications
Developing Java Web Applicationshchen1
 
Building Scalable Applications with Laravel
Building Scalable Applications with LaravelBuilding Scalable Applications with Laravel
Building Scalable Applications with LaravelMuhammad Shakeel
 
Advance Java Topics (J2EE)
Advance Java Topics (J2EE)Advance Java Topics (J2EE)
Advance Java Topics (J2EE)slire
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionLecole Cole
 
Code First with Serverless Azure Functions
Code First with Serverless Azure FunctionsCode First with Serverless Azure Functions
Code First with Serverless Azure FunctionsJeremy Likness
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless Editionecobold
 
Launching Services in Amazon Web Services
Launching Services in Amazon Web ServicesLaunching Services in Amazon Web Services
Launching Services in Amazon Web ServicesJames Armes
 
Ppt for Online music store
Ppt for Online music storePpt for Online music store
Ppt for Online music storeADEEBANADEEM
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...Amazon Web Services
 
Web app job and functions - TUGAIT 2017
Web app job and functions  - TUGAIT 2017Web app job and functions  - TUGAIT 2017
Web app job and functions - TUGAIT 2017Steef-Jan Wiggers
 

Similar to Lspe (20)

AJppt.pptx
AJppt.pptxAJppt.pptx
AJppt.pptx
 
Writing & Using Web Services
Writing & Using Web ServicesWriting & Using Web Services
Writing & Using Web Services
 
JavaFX Enterprise (JavaOne 2014)
JavaFX Enterprise (JavaOne 2014)JavaFX Enterprise (JavaOne 2014)
JavaFX Enterprise (JavaOne 2014)
 
Building API in the cloud using Azure Functions
Building API in the cloud using Azure FunctionsBuilding API in the cloud using Azure Functions
Building API in the cloud using Azure Functions
 
Airavata_Architecture_xsede16
Airavata_Architecture_xsede16Airavata_Architecture_xsede16
Airavata_Architecture_xsede16
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Developing Java Web Applications
Developing Java Web ApplicationsDeveloping Java Web Applications
Developing Java Web Applications
 
Building Scalable Applications with Laravel
Building Scalable Applications with LaravelBuilding Scalable Applications with Laravel
Building Scalable Applications with Laravel
 
Advance Java Topics (J2EE)
Advance Java Topics (J2EE)Advance Java Topics (J2EE)
Advance Java Topics (J2EE)
 
JDBC.ppt
JDBC.pptJDBC.ppt
JDBC.ppt
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless Edition
 
Code First with Serverless Azure Functions
Code First with Serverless Azure FunctionsCode First with Serverless Azure Functions
Code First with Serverless Azure Functions
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless Edition
 
Launching Services in Amazon Web Services
Launching Services in Amazon Web ServicesLaunching Services in Amazon Web Services
Launching Services in Amazon Web Services
 
Ppt for Online music store
Ppt for Online music storePpt for Online music store
Ppt for Online music store
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
 
Web app job and functions - TUGAIT 2017
Web app job and functions  - TUGAIT 2017Web app job and functions  - TUGAIT 2017
Web app job and functions - TUGAIT 2017
 
Java workflow engines
Java workflow enginesJava workflow engines
Java workflow engines
 

Recently uploaded

Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 

Recently uploaded (20)

Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 

Lspe

  • 1. Simplifying BigData Jobs Arpit Tak BigData Developer, Vizury http://in.linkedin.com/in/arpittak/
  • 2. It's not about Ashley Stewart EDSC 320 Final Project http://www.kids-birthday-party-guide.com/harry-potter-party.html And the Prisoner of Azkaban
  • 3. What is Azkaban?
      – Azkaban is a batch workflow job scheduler.
      – It was created at LinkedIn to run Hadoop jobs.
      – Azkaban resolves the ordering of jobs through job dependencies.
      – It provides an easy-to-use web user interface to maintain and track your workflows.
  • 4. Why Azkaban?
      – Easy-to-use web UI
      – Retrying of failed jobs
      – Simple web and HTTP workflow uploads
      – Workflow as a DAG (directed acyclic graph) made up of individual steps
      – Runs a series of MapReduce, Pig, Java, and script actions as a single workflow job
      – Regular scheduling of workflow jobs
      – Failure detection
      – SLA alerting and auto killing
      – Email alerts on failures and successes
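To make the DAG idea concrete, here is a minimal sketch of how a run order can be derived from job dependencies. It is an illustration only, not Azkaban's own scheduler code: the job names are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flow: each job name maps to the jobs it depends on.
dependencies = {
    "extract": [],
    "clean": ["extract"],
    "aggregate": ["clean"],
    "report": ["aggregate", "clean"],
}


def execution_order(deps):
    """Return one valid run order in which every job follows its dependencies.

    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
    try:
        execution_order({"a": ["b"], "b": ["a"]})
    except CycleError:
        print("a cycle is not a valid flow")
```

A cyclic definition is rejected because no job in the cycle could ever start, which is why a flow has to be acyclic.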
  • 6. AzkabanWebServer
    The web server uses the db for the following reasons:
      – Project Management: the projects, their permissions, and uploaded files.
      – Executing Flow State: keeps track of executing flows and which Executor is running them.
      – Previous Flows/Jobs: searches through previous executions of jobs and their log files.
      – Scheduler: keeps the state of the scheduled jobs.
    Azkaban uses *.job key-value property files to define individual tasks in a workflow, and the dependencies property to define the dependency chain of the jobs.
    These job files and associated code can be archived into a *.zip and uploaded to the web server through the Azkaban UI or through curl.
  • 7. AzkabanExecutorServer
    The executor server uses the db for the following reasons:
      – Access the project: retrieves project files from the db.
      – Executing Flows/Jobs: retrieves and updates data for flows and jobs that are executing.
      – Logs: stores the output logs for jobs and flows in the db.
  • 9. Creating Flows
    A job is a process you want to run in Azkaban. Jobs can be set up to be dependent on other jobs. The graph created by a set of jobs and their dependencies is what makes up a flow.
    Creating Jobs: Creating a job is very easy. We create a properties file with a .job extension. This job file defines the type of job to be run, the dependencies, and any parameters needed to set up your job correctly.
        # foo.job
        type=command
        command=echo "Hello World"
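As a sketch of how a second job could depend on foo.job, the snippet below writes two job files and packs them into a zip ready for upload. The bar.job file, the directory name, and the helper name build_project_zip are made up for illustration. The dependencies property lists the names of upstream jobs without the .job extension.

```python
import tempfile
import zipfile
from pathlib import Path

# foo.job comes from the slide; bar.job is a hypothetical downstream job.
JOB_FILES = {
    "foo.job": 'type=command\ncommand=echo "Hello World"\n',
    "bar.job": 'type=command\ndependencies=foo\ncommand=echo "bar runs after foo"\n',
}


def build_project_zip(project_dir: Path) -> Path:
    """Write the job files into project_dir and archive them next to it."""
    project_dir.mkdir(parents=True, exist_ok=True)
    for name, body in JOB_FILES.items():
        (project_dir / name).write_text(body)

    archive = project_dir.parent / f"{project_dir.name}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for job_file in sorted(project_dir.glob("*.job")):
            # Store each job at the root of the archive.
            zf.write(job_file, arcname=job_file.name)
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_project_zip(Path(tmp) / "demo_project")
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

Together the two files form a flow of two jobs, with bar as the end node that pulls in foo.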
  • 13. AJAX API
      – Azkaban exposes some AJAX calls that are accessible through curl or other HTTP clients.
      – The login call authenticates a user and returns a session.id in the response.
      – Until the session expires, this session.id can be used for any API request the user has permission to make.
  • 14. API Calls
    With this session.id, we can:
      – Create a Project
      – Delete a Project
      – Upload a Project Zip
      – Fetch Flows of a Project
      – Fetch Jobs of a Flow
      – Fetch Executions of a Flow
      – Fetch Running Executions of a Flow
      – Cancel a Flow Execution
      – Schedule a Flow
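The login-then-call pattern above might look roughly like this in Python. This is a hedged sketch, not a reference client: the base URL, credentials, and project name are placeholders, and the request shapes (action=login posted to the server root, ajax=fetchprojectflows on /manager, an "error" key on failure) follow the Azkaban AJAX API documentation as I recall it, so check them against your Azkaban version. The code only builds requests and parses a response; it sends nothing.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://localhost:8443"  # assumption: placeholder web server address


def login_request(username: str, password: str) -> Request:
    """Build the POST request that asks Azkaban for a session.id."""
    body = urlencode(
        {"action": "login", "username": username, "password": password}
    ).encode()
    return Request(BASE_URL, data=body, method="POST")


def session_id_from(response_text: str) -> str:
    """Extract session.id from the JSON login response."""
    payload = json.loads(response_text)
    if "error" in payload:  # assumption: failures carry an "error" key
        raise RuntimeError(payload["error"])
    return payload["session.id"]


def fetch_flows_request(session_id: str, project: str) -> Request:
    """Build the GET request for 'Fetch Flows of a Project'."""
    query = urlencode(
        {"session.id": session_id, "ajax": "fetchprojectflows", "project": project}
    )
    return Request(f"{BASE_URL}/manager?{query}", method="GET")


if __name__ == "__main__":
    # Hypothetical response body, shaped like a successful login reply.
    sample = '{"status": "success", "session.id": "example-session"}'
    sid = session_id_from(sample)
    print(fetch_flows_request(sid, "demo_project").full_url)
```

To actually send a request, pass it to urllib.request.urlopen. Every call after login reuses the same session.id until the session expires.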
  • 23. Job Summary
    The Job Summary tab contains a summary of the information in the job logs. This includes:
      – Job Type: the jobtype of the job
      – Command Summary: the command that launched the job process, with fields such as the classpath and memory settings shown separately
      – Pig/Hive Job Summary: custom stats specific to Pig and Hive jobs
      – Map Reduce Jobs: a list of job ids of the Map-Reduce jobs that were launched, linked to their job tracker pages