@IndeedEng March: Wednesday, March 27th
Video available: http://www.youtube.com/watch?v=MeRHetCMiHg
The goal of Indeed's aggregation engine is to find and retrieve every job in the world, as quickly and accurately as possible. As we described in our previous tech talk, we strive to build products that are simple, fast, comprehensive, and relevant. The world's most comprehensive job search site is fueled by the more than 35 million job postings we process every day, which we deliver to jobseekers within minutes of discovery.
Our original aggregation architecture was implemented using standard patterns. Our growth demanded levels of scalability, performance, and resilience that this architecture simply could not provide. In a case study of scaling for the web, we will discuss how we tackled this problem. We will cover the issues we saw with our original architecture, how we analyzed our options to guide a solution, how we used RabbitMQ as a key component in the new architecture, and benchmarks to evaluate how successful we were.
Speaker Ketan Gangatirkar is the development manager responsible for Indeed's continuous deployment infrastructure as well as its aggregation system.
Speaker Cameron Davison is a software engineer on the aggregation team at Indeed and a graduate of UT Austin. He re-architected Indeed's aggregation pipeline using RabbitMQ to sustain high write volumes, and continues to improve products in the aggregation system to make it run more efficiently.
32. N connections
[Diagram: many job sites feed engines in Datacenter A and Datacenter B; every engine holds its own connection to the MySQL instance in the primary datacenter]
33. N concurrent writers
[Diagram: the same topology; every engine in Datacenter A and Datacenter B writes directly and concurrently to the primary datacenter's MySQL]
34. High latency
[Diagram: the same topology; each write from an engine in Datacenter A or B to the primary datacenter's MySQL crosses the wide-area network and pays its latency]
35. Limitation: failure points
[Diagram: engines in Datacenter A and Datacenter B each write across the network to MySQL in the primary datacenter; the X marks show that every cross-datacenter link and the lone database are points of failure]
40. Does this do what we need?
● Lots of workers...
● Sending lots of results...
● Over a long distance...
● That need to get processed fast...
● Reliably?
44. Network failure fix: Disks solve that too
[Diagram: each engine in the remote datacenter buffers jobs to its local disk; when the network link (X) to the Job Write Service and MySQL in the primary datacenter fails, engines keep writing to disk and replay later]
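The disk-buffering idea on this slide can be sketched as a simple append-only journal: the engine appends each job to a local file while the network is down, then replays the file in order once the link recovers. A minimal sketch under assumed details (the `JobJournal` name and the JSON-lines format are illustrative, not Indeed's actual implementation):

```python
# Hypothetical local disk buffer for an engine; not Indeed's actual code.
# Jobs are appended to a JSON-lines journal while the remote write service
# is unreachable, then replayed in arrival order once it comes back.
import json
import os

class JobJournal:
    def __init__(self, path):
        self.path = path

    def append(self, job):
        # Append one job per line; flush and fsync so a buffered job
        # survives a process crash or power loss.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(job) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self, send):
        # Re-send every buffered job via the given callable, then
        # truncate the journal. Returns the number of jobs replayed.
        if not os.path.exists(self.path):
            return 0
        sent = 0
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                send(json.loads(line))
                sent += 1
        os.remove(self.path)
        return sent
```

The key property is that `append` succeeds even when the network is down, which is exactly what the slide's X depicts: local disk stands in for the unreachable primary datacenter.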
45. Write Service Failure
[Diagram: engines in the remote datacenter all write through a single Job Write Service in front of MySQL in the primary datacenter; the X marks the write service failing, which blocks all writes]
46. Write Service Failure fix: Disks solve that too
[Diagram: engines in the remote datacenter buffer to local disk, so while the Job Write Service (X) in the primary datacenter is down, jobs accumulate on the engines' disks until it recovers]
47. Write Service Failure fix: Redundancy
[Diagram: engines in the remote datacenter fail over among multiple Job Write Service instances in the primary datacenter; one instance (X) is down, but the remaining instances keep writing to MySQL]
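The redundancy on this slide amounts to client-side failover: an engine tries each Job Write Service endpoint in turn and uses the first one that accepts the job. A minimal sketch, with endpoints modeled as plain callables rather than real network clients (the function name and error handling are assumptions for illustration):

```python
# Hypothetical client-side failover across redundant write services.
# Each endpoint is modeled as a callable that either accepts a job or
# raises ConnectionError; real code would wrap network clients instead.
def write_with_failover(job, endpoints):
    for endpoint in endpoints:
        try:
            endpoint(job)
            return True  # the first healthy endpoint wins
        except ConnectionError:
            continue     # this service is down; try the next one
    # Every write service is down: per the previous slides, the caller
    # should fall back to buffering the job on local disk.
    return False
```

Note how this composes with disk buffering: failover handles one dead service instance, and the disk buffer handles the case where they are all unreachable.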
49. Database Failure fix: Buffer to disk
[Diagram: the Job Write Service buffers to its own local disk in the primary datacenter when MySQL (X) fails, and drains the buffer once the database recovers]
50. Our new architecture
[Diagram: engines in the remote datacenter each buffer to a local disk and send to redundant Job Write Services in the primary datacenter, each of which buffers to its own disk before writing to MySQL]
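Putting a Job Write Service between N engines and MySQL also lets writes be batched, which is one plausible reading of the "efficient use of the database" goal in the recap: the service drains its buffer and hands the database a few large inserts instead of many single-row ones. A sketch of that batching step, with the database modeled as a simple callable (the function and its parameters are hypothetical, not the actual service):

```python
# Hypothetical batching layer of a write service: drain buffered jobs
# and hand them to the database in fixed-size groups, so the database
# sees a few large writes instead of many tiny concurrent ones.
def drain_in_batches(buffered_jobs, insert_batch, batch_size=100):
    batches = 0
    for start in range(0, len(buffered_jobs), batch_size):
        # insert_batch stands in for e.g. a multi-row INSERT into MySQL.
        insert_batch(buffered_jobs[start:start + batch_size])
        batches += 1
    return batches
```

With this shape, only the write services hold database connections, which is exactly the fix for the "N connections / N concurrent writers" problems from the earlier slides.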
51. We could build this...
[Diagram: the same store-and-forward architecture, repeated]
52. ... maybe someone already has
[Diagram: the same store-and-forward architecture, repeated]
55. Aggregation Requirements
● Durable
● Multi-Data Center (latency)
● 38 million jobs a day
● 2KB average job size
○ 76 GB a day
● Target peaks of 1000 jobs / second
● Programming language agnostic
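The numbers on this slide are easy to sanity-check: 38 million jobs a day at 2KB each is 76 GB a day, the average rate works out to roughly 440 jobs per second, and the 1000 jobs/second peak target leaves a bit over 2x headroom over that average:

```python
# Back-of-the-envelope check of the requirements above
# (decimal units, as the slide's 38M * 2KB = 76 GB arithmetic implies).
jobs_per_day = 38_000_000
avg_job_bytes = 2_000                            # 2KB average job size

daily_gb = jobs_per_day * avg_job_bytes / 1e9    # 76.0 GB a day
avg_jobs_per_sec = jobs_per_day / 86_400         # ~440 jobs/second on average
peak_headroom = 1000 / avg_jobs_per_sec          # target peak is ~2.3x the average
```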
146. Recap: The jobs must flow
● Durability
● High throughput
● Low latency
● Partition-tolerance
● Efficient use of the database
● Minimal points of failure