Magic Scaling Sprinkles
Experience operating a high-volume, low-latency ad buying platform at Tapad
@tobym
@TapadEng
Who am I?
Toby Matejovsky
First engineer hired at Tapad 3+ years
ago
Scala developer
@tobym
What are we talking about?
One of the key components that allows
Tapad’s realtime ad-buying platform to hit
350,000 TPS.
Outline
• What Tapad does
• Why Aerospike is a good fit
• Operational experience
• What’s next
What Tapad Does
(Real-time bidding)
The Tapad Difference.
A Unified View.
Ad exchange
Want to show an ad to device
123?
Tapad
Sure, show this ad for $2 CPM
No thanks
Great, you won. Ad was displayed!
How about to device XYZ?
95% response time: ~30 ms
Why Aerospike?
Fast
Safe
Scale out
Expiration/eviction
Super fast key-value store
350,000 reads per second
on 7 nodes
99% of reads are under 1 millisecond
Safe
Replication factor
XDR (cross datacenter replication)
SSD-backed
Scale out
Linear scalability, just add a node*
*will revisit this during the next section
Expiration and eviction
Old data expires automatically
Oldest data is evicted if the database is running
out of space
This is desired behavior in ad-tech world
Operational experience with Aerospike at Tapad
Configuration
Migrations
Eviction
Usage
Tapad’s Aerospike Configuration
100% keys in memory
100% data in SSD storage
Replication factor 2
512-byte block size
Need lots of free space in memory and storage for defrag (high-water mark)
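The block-size point can be checked with some rough arithmetic. The sketch below uses the figures from the speaker's notes (3.3 billion objects of about 200 bytes, 512-byte blocks, replication factor 2) and assumes each record is rounded up to a whole number of blocks, that the object count excludes replicas, and that per-record overhead is ignored. Those three assumptions are illustrative and not confirmed by the slides.

```python
import math

def stored_bytes(record_bytes, block_bytes):
    """Bytes occupied when a record is rounded up to whole blocks (assumed model)."""
    return math.ceil(record_bytes / block_bytes) * block_bytes

def wasted_fraction(record_bytes, block_bytes):
    """Share of the allocated space that holds no record data."""
    used = stored_bytes(record_bytes, block_bytes)
    return (used - record_bytes) / used

OBJECTS = 3_300_000_000   # from the speaker's notes
RECORD_BYTES = 200        # "about 200 bytes" per object
REPLICATION = 2           # replication factor from the configuration slide

for block in (512, 128):
    per_record = stored_bytes(RECORD_BYTES, block)
    total_tb = OBJECTS * per_record * REPLICATION / 1e12
    print(f"block={block}B stored={per_record}B "
          f"waste={wasted_fraction(RECORD_BYTES, block):.0%} total={total_tb:.2f} TB")
```

Under these assumptions a 200-byte record occupies 512 bytes with 512-byte blocks but only 256 bytes with 128-byte blocks, which is why a smaller minimum block size is attractive for this workload.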
Migrations and partitions
New node requires data migration, means degraded
performance
Network partition may trigger some data migration
Eviction
Awesome feature, not intuitive if objects’ TTLs are not
nicely distributed
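Why skewed TTLs make eviction look random can be shown with a toy model. The sketch below follows the description in the speaker's notes (records are split into buckets by TTL, and eviction picks randomly from the lowest bucket). The bucket count, the equal-width bucketing, and the function names are assumptions made for illustration; this is not Aerospike's actual implementation.

```python
import random
from collections import defaultdict

def bucket_index(ttl, max_ttl, n_buckets):
    """Map a TTL onto one of n_buckets equal-width buckets (integer arithmetic)."""
    return min(ttl * n_buckets // max_ttl, n_buckets - 1)

def evict(ttls, count, n_buckets=100, rng=None):
    """Return `count` keys to evict: lowest TTL bucket first, random within a bucket.

    `ttls` maps key -> remaining TTL in seconds (positive integers).
    """
    if not ttls:
        return []
    rng = rng or random.Random()
    max_ttl = max(ttls.values())
    buckets = defaultdict(list)
    for key, ttl in ttls.items():
        buckets[bucket_index(ttl, max_ttl, n_buckets)].append(key)
    victims = []
    for idx in sorted(buckets):
        keys = buckets[idx]
        rng.shuffle(keys)  # no ordering by TTL inside a bucket
        victims.extend(keys[: count - len(victims)])
        if len(victims) >= count:
            break
    return victims

# Evenly spread TTLs: the lowest bucket holds only the soonest-to-expire records.
even = {f"k{t}": t for t in range(1, 101)}
# Skewed TTLs: one very long TTL stretches the range, so every other record
# lands in bucket 0 and is evicted at random regardless of its own TTL.
skewed = dict(even, long_lived=1_000_000)

print(sorted(even[k] for k in evict(even, 5, n_buckets=10)))
print(sorted(skewed[k] for k in evict(skewed, 5, n_buckets=10)))
```

In the evenly spread case the victims all come from the shortest TTLs. In the skewed case a record with a TTL of 99 is as likely to be evicted as one with a TTL of 1, which matches the "evicted randomly" symptom in the notes.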
Usage
Blocking and non-blocking clients available
LZ4-compressed protobuf
Hot key error
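The hot key condition is described in the notes as three outstanding reads on the same key. A minimal sketch of that rule is shown below as a per-key counter of in-flight reads. The class and exception names are hypothetical, and the sketch assumes a read is rejected once three reads on the key are already outstanding. It models the rule only and is not the Aerospike client or server API.

```python
from collections import Counter

class HotKeyError(Exception):
    """Raised when too many reads are already outstanding on one key."""

class InFlightTracker:
    def __init__(self, limit=3):
        self.limit = limit          # assumed threshold, taken from the notes
        self.in_flight = Counter()  # key -> number of outstanding reads

    def begin(self, key):
        """Register a read; reject it if the key is already at the limit."""
        if self.in_flight[key] >= self.limit:
            raise HotKeyError(key)
        self.in_flight[key] += 1

    def end(self, key):
        """Mark one read on the key as finished."""
        self.in_flight[key] -= 1
        if self.in_flight[key] <= 0:
            del self.in_flight[key]

tracker = InFlightTracker()
for _ in range(3):
    tracker.begin("device:123")
try:
    tracker.begin("device:123")   # fourth concurrent read on the same key
except HotKeyError as err:
    print("hot key:", err)
tracker.end("device:123")
tracker.begin("device:123")       # accepted again once a read completes
```

Because hot spots in the data cannot be avoided, callers have to treat this error as a normal condition and decide whether to retry or skip the read.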
What’s next?
Smaller minimum block size
Replace Redis (UDFs)
Multiple keys to reference the same record
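The cost of following aliases can be sketched with a plain dictionary standing in for the key-value store. The record layout (an `alias_of` field pointing at another key) and the key names are hypothetical; the slides do not describe how Tapad stores aliases. The read and request rates are the ones quoted in the speaker's notes.

```python
def lookup(store, key, max_hops=5):
    """Resolve a key, following alias records; return (record, reads_used)."""
    reads = 0
    for _ in range(max_hops):
        reads += 1                       # each hop is one read against the store
        record = store.get(key)
        if record is None:
            return None, reads
        if "alias_of" in record:
            key = record["alias_of"]     # follow the alias with another read
            continue
        return record, reads
    raise RuntimeError("alias chain too long")

store = {
    "cookie:abc": {"alias_of": "device:123"},   # alias record
    "device:123": {"segments": ["sports"]},     # canonical record
}

print(lookup(store, "device:123"))   # direct hit: 1 read
print(lookup(store, "cookie:abc"))   # via alias: 2 reads

# Figures from the notes: about 350k reads serve about 250k bid requests.
print(350_000 / 250_000)             # average reads per bid request
```

If several keys could reference the same record directly, the alias lookup above would cost one read instead of two, which is the saving the last bullet is after.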
Thank You
@tobym
@TapadEng
Toby Matejovsky, Director of Engineering
toby@tapad.com
@tobym

Aerospike at Tapad

Editor's Notes

  1. Didn’t realize this had animation, which is why this is out of order. But, it actually shows the asynchronous nature of the system pretty well!
  2. This number is bid requests per second. A given request may trigger multiple reads because aliases have to be followed.
  3. Marketers understand delivery and response for a single channel. However, marketers don’t have data about consumer exposure and marketing impact across devices.
  4. Marketers understand delivery and response for a single channel. However, marketers don’t have data about consumer exposure and marketing impact across devices.
  5. Replication factor 2: if a node goes down, the data is still there. XDR: our west coast bidders read from the west coast Aerospike cluster, and vice versa. Changes propagate in milliseconds.
  6. Read and write performance scales linearly. Much easier with a key-value store than a relational database, but hey! it fits our data model perfectly.
  7. This is desired behavior because new information is more actionable than old information. A month-old record with no recent activity may never be accessed again (cleared cookies). Chuck it, make room for new data. Expiring old data also makes our ETL process faster because the old data is gone, so we don’t need to export it. Evicting older data makes space for good data when we enter a very high-write situation. It lets the system continue operating.
  8. Config: what Tapad’s deployment looks like. We don’t have as many disks as the setups from Aerospike’s benchmarks because we don’t actually need that much storage. Migrations: what happens when you add a node to the cluster? Eviction: it’s a great feature for us, but it was counter-intuitive; we had an interesting scenario with this (more later). Usage: as a software developer, what is it like using Aerospike?
  9. Multiple terabytes of data in 3.3 billion objects. Each object is pretty small, about 200 bytes. More on this later in relation to block size, which is 512 bytes.
  10. Recommended high-water mark is 50% memory and 50% storage. Aerospike needs this to defragment the data. We set it aggressively to 70% on memory and storage to save money on hardware. It’s possible to send updates at a rate faster than the defragger can keep up with, leading to out-of-disk-space issues even when you have 40% of the disk free. Oops. The solution was to set the high-water mark lower (to 50%), which evicted a bunch of older data and got us back in business.
  11. Most of our records are above 128 bytes and below 256 bytes. Block size is 512 bytes which means we are often wasting 50% of the record’s storage space. Block size can be set to 128 bytes in versions 2.7 and 3.1. Looking forward to deploying this.
  12. Adding a new node could mean a day (24 hours) of degraded performance. This is tunable based on speed of migration. Faster migration means the cluster is slower because of the many writes; the alternative is a slow migration. New nodes should be homogeneous; a new node with more storage than the other nodes cannot make use of it until all nodes have as much storage.
  13. Go to next slide for the picture of the buckets.
  14. Eviction takes place by splitting records into buckets based on TTL, and evicting randomly from the lowest-TTL bucket (soonest to expire). If records are not evenly distributed, chaos ensues: it appears that data is being evicted randomly. The solution was to change the long TTL to a short one, and refresh those devices with a regularly scheduled job.
  15. hot key = 3 outstanding reads on the same key; can’t get away from hot spots in data. Error is sent from the client.
  16. Multiple keys pointing to same record could save us a lot of extra reads (250k bid requests vs 350k reads - a lot of that is following id aliases)