Scheduling big data workloads on serverless infrastructure
Ilai Malka from Nielsen at AWS Community Day TLV, December 2019 (https://awscommunitydaytelaviv2019.splashthat.com/):
Scheduling big data workloads is challenging. It's extra challenging when running on Serverless infrastructure.
At Nielsen Marketing Cloud, we've built a system that uploads 250 billion events per day to partner ad platforms, running on Serverless infrastructure (AWS Lambda and OpenFaaS).
Creating a 'scheduler' for this system required:
1. Rate-limiting to prevent flooding partner platforms.
2. High utilization to keep costs low.
3. Careful bottleneck management to keep the system humming.
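The abstract does not say how the rate-limiting requirement was implemented. One common technique is a token bucket, sketched below purely as an illustration: the class name `TokenBucket`, its parameters, and the injectable `clock` are all hypothetical and not taken from the talk.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter (an illustration, not the speaker's code)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested deterministically
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def try_acquire(self, n=1):
        """Return True and consume n tokens if available, else return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A worker would call `try_acquire()` before each upload to a partner platform and back off or requeue the batch when it returns `False`. In a fleet of serverless functions the bucket state would have to live in a shared store rather than in process memory, which this sketch does not attempt.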
• Serverless is the next revolution
• Serverless offers built-in scalability and a shorter time to market
• Cost scales linearly with the compute you consume
• This creates an incentive to optimize: optimization = cost savings
• The cost formula is not straightforward -> use a tool to find the right memory setting
• Costs can get out of control -> add alerts
• Hybrid solution = scalability + low costs
• The rest of the world doesn’t use serverless, so we need to avoid flooding them
• Find the right bottleneck and solve it
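To illustrate why the memory setting matters for cost, here is a rough sketch of a per-invocation cost model for a function billed by GB-seconds plus a per-request fee, which is how AWS Lambda charges. The default prices and the measured durations are assumptions for illustration only (check current pricing for your region), and the helper names `invocation_cost` and `cheapest_setting` are hypothetical. The notes do not name the tool the speaker used.

```python
def invocation_cost(memory_mb, duration_ms,
                    price_per_gb_second=0.0000166667,  # assumed price, verify for your region
                    price_per_request=0.0000002):      # assumed price, verify for your region
    """Estimate the cost of a single invocation in dollars."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price_per_gb_second + price_per_request


def cheapest_setting(measurements, **prices):
    """Pick the memory size (MB) with the lowest estimated cost.

    `measurements` maps memory size in MB to an observed average duration in ms.
    """
    return min(measurements,
               key=lambda mb: invocation_cost(mb, measurements[mb], **prices))


# Hypothetical measurements: more memory usually means more CPU and a shorter run.
measurements = {128: 4000, 512: 800, 1024: 450}
best = cheapest_setting(measurements)
```

With these made-up numbers the middle setting wins: the smallest setting runs so long that it costs more in total, while the largest does not speed things up enough to pay for itself. That non-monotonic behaviour is why measuring beats guessing.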