4. * Companies grow to have a complex network of
processes that have intricate dependencies
* Analytics & batch processing are mission
critical
* Tons of time is spent writing jobs, monitoring
and troubleshooting issues
The Problem Statement
5. * Data lineage is opaque
* Learning curve gets steeper as the ecosystem
grows
* Code / logic is duplicated (not OO)
* Open ended framework = heterogenous methods
* Ownership and accountability falls behind
* Maintenance time grows exponentially
* Operational metadata (job duration, landing times,
logs, …) is scattered if existent
* Loose guidelines leads to bad practices
* Signal to noise ratio on alerts gets out of control
Common Symptoms
6. * Airbnb was outgrowing its workflow
methodology
* High hopes:
* Dynamic pipeline generation
* Programmatic environment
* A platform people love working with
* We took it as an opportunity to build and share
Genesis
26. In the works / next steps
* Build a community!
* class YarnExecutor(BaseExecutor):
* Sharing our services / frameworks
* Hive stats collection
* Anomaly detection
* Experimentation framework