
Scheduling big data workloads on serverless infrastructure

Ilai Malka from Nielsen at AWS Community Day TLV, December 2019 (https://awscommunitydaytelaviv2019.splashthat.com/):
Scheduling big data workloads is challenging. It's extra challenging when running on Serverless infrastructure.
At Nielsen Marketing Cloud, we've built a system that uploads 250 billion events per day to partner ad platforms, running on Serverless infrastructure (AWS Lambda and OpenFaaS).

Creating a 'scheduler' for this system required:
1. Rate-limiting to prevent flooding partner platforms.
2. High utilization to keep costs low.
3. Careful bottleneck management to keep the system humming.
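
The rate-limiting requirement can be illustrated with a token bucket. This is a minimal sketch of one common technique, not the scheduler described in the talk; the class name `TokenBucket` and its parameters are hypothetical. It keeps state in process memory, whereas a serverless deployment would presumably need to hold that state somewhere shared between function instances.

```python
import time


class TokenBucket:
    """Allow `rate` units per second on average, with bursts up to `capacity`.

    Hypothetical illustration only; not taken from the Nielsen system.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # units refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable clock, useful for testing
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def try_acquire(self, amount=1):
        """Consume `amount` units if available; return True on success."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

A caller would check `try_acquire(size)` before sending a batch to a partner platform and defer the batch when it returns `False`, so the partner is never sent more than its agreed rate.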

https://www.linkedin.com/in/ilai-malka-93b06172/
https://twitter.com/IlaiMalka

#Nielsen #NielsenMarketingCloud #AWSCommunityDay #Serverless


  1. Scheduling big data workloads on Serverless infrastructure
  2. What You’ll Hear About
      • Our data pipeline
      • Why Serverless?
      • Problems we had to solve: Cost, Rate Limiting
  3. About me: Ilai Malka, Big Data Developer
      • My post about Serverless: https://medium.com/nmc-techblog/going-serverless-c334ae242ca6
      • NMC Tech Blog: https://medium.com/nmc-techblog
  4. About Nielsen Marketing Cloud (NMC): Segmentation, Upload To Networks, Run Campaigns. 10 Billion Profiles, 140 Ad Networks, 999+ Campaigns
  5. About Nielsen Marketing Cloud (NMC): Segmentation, Upload To Networks, Run Campaigns. 10 Billion Profiles, 140 Ad Networks, 9999 Campaigns, 250 Billion Events
  6. The Challenges: 250 Billion Events/Day, Scale up/down, Cost Effectiveness
  7. Serverless (AWS Lambda)
  8. 250 Billion Events/Day; 0.3 - 1 TB/hour; Quickly Scales Up and Down; Top Day Ever: 750 Billion Events/Day
  9. ~$1000 per day; ~$4 per billion events. Total cost: $350k/year (Lambda 74%, S3 13%, DB 11%, Other 2%)
  10. Cost Tips
  11. Warning: Cost can skyrocket. Incentive to optimize. Cost is linear to computation power.
  12. How does Lambda pricing work? $$$$$ == memory * duration (axes: memory, duration)
  13. Our next plan: a Hybrid solution
  14. Rate Limiting Tips
  15. Ad Networks: Ad networks are not totally scalable
  16. Bottleneck Management
  17. Problems: 1. Can’t support sub-minute windows 2. Manager is a bottleneck
  18. 11:00:00 - 11:00:59: 250Mb, 50MB, 100Mb
  19. 11:00:00 - 11:00:59, 11:01:00 - 11:01:59; 1-minute window; 50MB, 250Mb, 250Mb
  20. 43% utilization, 28% utilization; 11:00:00 - 11:00:59, 11:01:00 - 11:01:59
  21. Key takeaways
      • Serverless is the next revolution
      • Serverless has a built-in scalability feature + shorter time to market
      • Cost is linear to computation power
      • Incentive to optimize: optimizing = cost savings
      • Cost formula is not straightforward -> find the right memory setting with a tool
      • Costs can get out of control -> add alerts
      • Hybrid solution = scalability + low costs
      • The rest of the world doesn’t use serverless, so we need to avoid flooding them
      • Find the right bottleneck and solve it
  22. Questions?
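
The pricing relation from the slides (cost proportional to memory times duration) and the per-event figures can be checked with a short calculation. This is a rough sketch under stated assumptions: the function names are hypothetical, the default price per GB-second is an assumed illustrative value that should be checked against current AWS pricing, and per-request charges are ignored.

```python
def lambda_compute_cost(memory_mb, duration_ms, invocations,
                        price_per_gb_second=0.0000166667):
    """Estimate Lambda compute cost as memory * duration * price.

    `price_per_gb_second` is an assumed value, not taken from the slides.
    Per-request fees are deliberately left out of this sketch.
    """
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


def cost_per_billion_events(daily_cost_usd, events_per_day):
    """Dollars spent per billion events processed."""
    return daily_cost_usd / (events_per_day / 1e9)


# Doubling memory while halving duration leaves the estimate unchanged,
# which is why the right memory setting has to be measured, not guessed.
small = lambda_compute_cost(memory_mb=512, duration_ms=2000, invocations=1_000_000)
large = lambda_compute_cost(memory_mb=1024, duration_ms=1000, invocations=1_000_000)

# Figures from the slides: ~$1000 per day at 250 billion events per day.
per_billion = cost_per_billion_events(1000, 250e9)
```

With the slide figures, `per_billion` comes out at 4.0, matching the quoted ~$4 per billion events, and $1000 per day over a year is $365k, in the same range as the quoted $350k/year total.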
