3. Azkaban features
• Simple Job Management Tool
– Define job dependencies
– Retry
– Scheduling
– Web UI
• See dependencies, execution times, and logs
• Stores logs in the DB as BLOBs
– SPOF (single point of failure)
– Cannot register holidays
– Cannot be triggered by file creation events
• Notification by mail only
– Besides HTTP Job Callback
• No binary distribution
– Need to build from source
• Development is not very active
• The mailing list doesn’t work very well
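Job dependencies and retries are declared in Azkaban's plain-text `.job` files (the classic flow format): each job has a `type` and a `command`, and may list `dependencies` and `retries`/`retry.backoff` (milliseconds). The following is a minimal sketch, not Azkaban code, of how such files imply an execution order. The job names and commands are hypothetical, the parser only handles simple `key=value` lines, and the topological sort uses Python's `graphlib` (3.9+).

```python
from graphlib import TopologicalSorter


def parse_job_file(text: str) -> dict[str, str]:
    """Parse a minimal properties-style .job file into a dict.

    Handles only `key=value` lines and `#` comments; full Java
    .properties syntax (`:` separators, continuations) is not supported.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def execution_order(job_files: dict[str, str]) -> list[str]:
    """Return job names ordered so every job comes after its dependencies."""
    graph = {}
    for name, text in job_files.items():
        deps = parse_job_file(text).get("dependencies", "")
        graph[name] = [d.strip() for d in deps.split(",") if d.strip()]
    # TopologicalSorter takes node -> predecessors and yields predecessors first
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-job flow: extract -> aggregate -> report
jobs = {
    "extract": "type=command\ncommand=python extract.py\nretries=3\nretry.backoff=60000\n",
    "aggregate": "type=command\ncommand=python aggregate.py\ndependencies=extract\n",
    "report": "type=command\ncommand=python report.py\ndependencies=aggregate\n",
}
print(execution_order(jobs))  # ['extract', 'aggregate', 'report']
```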
15. My use case
• Use Azkaban to manage Hadoop jobs
– Write batches in Python
• Use the Azkaban API
– I created a client: https://github.com/wyukawa/eboshi
– Commit scheduling information to GHE (GitHub Enterprise)
• Writing job files is painful
– I created a generation tool: https://github.com/wyukawa/ayd
– Generate one flow from one YAML file
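Driving Azkaban from Python through its API and keeping schedules in GHE could look roughly like the sketch below. It is not eboshi's implementation: the `SCHEDULES` list, project and flow names, and times are hypothetical, and the form fields follow the login call and the period-based `scheduleFlow` call in Azkaban's AJAX API documentation, so check them against your Azkaban version. Keeping `SCHEDULES` in a version-controlled file lets schedule changes go through normal code review.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical schedule definitions, e.g. loaded from a file kept in GHE
SCHEDULES = [
    {"project": "sales", "flow": "daily_summary", "time": "03,00,am,UTC", "period": "1d"},
    {"project": "sales", "flow": "weekly_summary", "time": "04,00,am,UTC", "period": "1w"},
]


def login_params(username: str, password: str) -> dict:
    """Form fields for the AJAX login; the JSON reply carries a session.id."""
    return {"action": "login", "username": username, "password": password}


def schedule_params(session_id: str, project_id: int, entry: dict, start_date: str) -> dict:
    """Form fields for the period-based scheduleFlow AJAX call."""
    return {
        "ajax": "scheduleFlow",
        "session.id": session_id,
        "projectName": entry["project"],
        "projectId": project_id,
        "flow": entry["flow"],
        "scheduleTime": entry["time"],  # "hh,mm,am|pm,timezone"
        "scheduleDate": start_date,     # "MM/dd/yyyy"
        "is_recurring": "on",
        "period": entry["period"],      # e.g. "1d", "1w"
    }


def post(url: str, params: dict) -> bytes:
    """POST form-encoded params (needs a live server, so not called here)."""
    with urlopen(Request(url, data=urlencode(params).encode())) as resp:
        return resp.read()


# Usage against a live server (hypothetical BASE URL and project id):
#   session = json.loads(post(BASE, login_params("azkaban", "secret")))["session.id"]
#   for entry in SCHEDULES:
#       post(BASE + "/schedule", schedule_params(session, 1, entry, "01/01/2017"))
```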
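A YAML-to-job-file generator in the spirit of ayd might work as follows. This is a hypothetical sketch, not ayd's actual schema or code: the spec is a dict shaped like parsed YAML (with PyYAML it would come from `yaml.safe_load`). Because a classic Azkaban flow is named after its final job, the sketch adds a `type=noop` job named after the flow that depends on every job nothing else depends on.

```python
# Hypothetical YAML input (not ayd's real schema), shown already parsed:
#   flow: daily_summary
#   jobs:
#     - name: extract
#       command: python extract.py
#     - name: aggregate
#       command: python aggregate.py
#       dependencies: [extract]
#     - name: export
#       command: python export.py
#       dependencies: [extract]
SPEC = {
    "flow": "daily_summary",
    "jobs": [
        {"name": "extract", "command": "python extract.py"},
        {"name": "aggregate", "command": "python aggregate.py", "dependencies": ["extract"]},
        {"name": "export", "command": "python export.py", "dependencies": ["extract"]},
    ],
}


def render_flow(spec: dict) -> dict[str, str]:
    """Map each job in the spec to a `<name>.job` file body."""
    files = {}
    depended_on = set()
    for job in spec["jobs"]:
        deps = job.get("dependencies", [])
        depended_on.update(deps)
        lines = ["type=command", f"command={job['command']}"]
        if deps:
            lines.append("dependencies=" + ",".join(deps))
        files[f"{job['name']}.job"] = "\n".join(lines) + "\n"
    # Terminal noop job: the flow takes its name from this job
    leaves = [job["name"] for job in spec["jobs"] if job["name"] not in depended_on]
    files[f"{spec['flow']}.job"] = "type=noop\ndependencies=" + ",".join(leaves) + "\n"
    return files


for filename, body in render_flow(SPEC).items():
    print(f"--- {filename}\n{body}")
```

The generated files would then be zipped and uploaded as an Azkaban project.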
20. My usage situation
• More than 120 Azkaban flows
• Mostly daily batches, plus a few hourly, weekly, and monthly ones
• Most flows are related to Hive
• Azkaban runs on the batch server
• I prepare template Azkaban flows to reaggregate past data
– Pass the job name and date as parameters
– Set “Run Concurrently”
• I don’t use SLA yet, but I may in the future
– https://github.com/azkaban/azkaban/pull/911
• I don’t use HTTP Job Callback
– Use HipChat from the Python ETL instead
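Re-running a template flow for a range of past dates maps onto the `executeFlow` AJAX call: `flowOverride[...]` sets flow parameters, and `concurrentOption=ignore` is, to my understanding, the API value behind the Web UI's “Run Concurrently” option. The sketch below only builds the request URLs; the parameter names `job_name` and `target_date`, the project and flow names, the host, and the session id are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def reaggregate_urls(base_url: str, session_id: str, project: str, flow: str,
                     job_name: str, start: date, end: date) -> list[str]:
    """Build one executeFlow URL per day in [start, end] for a template flow."""
    urls = []
    day = start
    while day <= end:
        params = {
            "ajax": "executeFlow",
            "session.id": session_id,
            "project": project,
            "flow": flow,
            # "ignore" lets executions of the same flow run concurrently
            "concurrentOption": "ignore",
            # Hypothetical flow parameters consumed by the template flow
            "flowOverride[job_name]": job_name,
            "flowOverride[target_date]": day.isoformat(),
        }
        urls.append(f"{base_url}/executor?{urlencode(params)}")
        day += timedelta(days=1)
    return urls


urls = reaggregate_urls("https://azkaban.example.com:8443", "SESSION_ID",
                        "sales", "reaggregate_template", "daily_summary",
                        date(2017, 1, 1), date(2017, 1, 3))
print(len(urls))  # 3
```

Running concurrently is what lets several past dates be reprocessed at once instead of waiting for each execution to finish.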
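Instead of HTTP Job Callback, notifications can be sent from the Python ETL itself. A minimal sketch of that pattern, assuming a decorator around each ETL step and a `send` function that would post to a HipChat room; the payload fields (`message`, `color`, `notify`, `message_format`) are assumed to match HipChat's v2 room-notification API, and `sent.append` is a stand-in for the real HTTP call.

```python
import functools
import traceback


def hipchat_payload(text: str, ok: bool) -> dict:
    """Body for a room notification (field names assumed from HipChat API v2)."""
    return {
        "message": text,
        "color": "green" if ok else "red",
        "notify": not ok,  # only ping the room on failure
        "message_format": "text",
    }


def notify_result(send):
    """Decorator: report an ETL step's outcome through send(payload)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                send(hipchat_payload(f"{func.__name__} failed:\n{traceback.format_exc()}", ok=False))
                # Re-raise so the process exits non-zero and the Azkaban job fails
                raise
            send(hipchat_payload(f"{func.__name__} succeeded", ok=True))
            return result
        return wrapper
    return decorator


sent = []  # stand-in for a function that POSTs the payload to HipChat


@notify_result(sent.append)
def aggregate_sales():
    return 42


aggregate_sales()
print(sent[-1]["message"])  # aggregate_sales succeeded
```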
21. My impressions
• Simple
• Easy to use
• Web UI is convenient
• API is useful
• There is no reason to replace Azkaban
• I hope development becomes more active