@IndeedEng March: Wednesday, March 27th
Video available: http://www.youtube.com/watch?v=MeRHetCMiHg
The goal of Indeed's aggregation engine is to find and retrieve every job in the world, as quickly and accurately as possible. As we described in our previous tech talk, we strive to build products that are simple, fast, comprehensive, and relevant. The world's most comprehensive job search site is fueled by the more than 35 million job postings we process every day, which we deliver to jobseekers within minutes of discovery.
Our original aggregation architecture was implemented using standard patterns. Our growth demanded levels of scalability, performance, and resilience that this architecture simply could not provide. In a case study of scaling for the web, we will discuss how we tackled this problem. We will cover the issues we saw with our original architecture, how we analyzed our options to guide a solution, how we used RabbitMQ as a key component in the new architecture, and benchmarks to evaluate how successful we were.
Speaker Ketan Gangatirkar is the development manager responsible for Indeed's continuous deployment infrastructure as well as its aggregation system.
Speaker Cameron Davison is a software engineer on the aggregation team at Indeed and a graduate of UT Austin. He re-architected Indeed's aggregation pipeline using RabbitMQ to sustain high write volumes, and continues to improve products in the aggregation system to make it run more efficiently.
32. N connections
[Diagram: many job sites feed engines in Datacenter A and Datacenter B; every engine holds its own connection to the MySQL instance in the primary datacenter]
33. N concurrent writers
[Diagram: the same topology; every engine in Datacenter A and Datacenter B writes directly and concurrently to the primary datacenter's MySQL]
34. High latency
[Diagram: the same topology; each write from an engine in Datacenter A or B to the primary datacenter's MySQL crosses the wide-area network and pays its latency]
35. Limitation: failure points
[Diagram: engines in Datacenter A and Datacenter B each write across the network to MySQL in the primary datacenter; the X marks show that every cross-datacenter link and the lone database are points of failure]
40. Does this do what we need?
● Lots of workers...
● Sending lots of results...
● Over a long distance...
● That need to get processed fast...
● Reliably?
44. Network failure fix: Disks solve that too
[Diagram: each engine in the remote datacenter buffers jobs to its local disk; when the network link (X) to the Job Write Service and MySQL in the primary datacenter fails, engines keep writing to disk and replay later]
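The disk-buffering idea on this slide can be sketched as a simple append-only journal: the engine appends each job to a local file while the network is down, then replays the file in order once the link recovers. A minimal sketch under assumed details (the `JobJournal` name and the JSON-lines format are illustrative, not Indeed's actual implementation):

```python
# Hypothetical local disk buffer for an engine; not Indeed's actual code.
# Jobs are appended to a JSON-lines journal while the remote write service
# is unreachable, then replayed in arrival order once it comes back.
import json
import os

class JobJournal:
    def __init__(self, path):
        self.path = path

    def append(self, job):
        # Append one job per line; flush and fsync so a buffered job
        # survives a process crash or power loss.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(job) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self, send):
        # Re-send every buffered job via the given callable, then
        # truncate the journal. Returns the number of jobs replayed.
        if not os.path.exists(self.path):
            return 0
        sent = 0
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                send(json.loads(line))
                sent += 1
        os.remove(self.path)
        return sent
```

The key property is that `append` succeeds even when the network is down, which is exactly what the slide's X depicts: local disk stands in for the unreachable primary datacenter.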
45. Write Service Failure
[Diagram: engines in the remote datacenter all write through a single Job Write Service in front of MySQL in the primary datacenter; the X marks the write service failing, which blocks all writes]
46. Write Service Failure fix: Disks solve that too
[Diagram: engines in the remote datacenter buffer to local disk, so while the Job Write Service (X) in the primary datacenter is down, jobs accumulate on the engines' disks until it recovers]
47. Write Service Failure fix: Redundancy
[Diagram: engines in the remote datacenter fail over among multiple Job Write Service instances in the primary datacenter; one instance (X) is down, but the remaining instances keep writing to MySQL]
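The redundancy on this slide amounts to client-side failover: an engine tries each Job Write Service endpoint in turn and uses the first one that accepts the job. A minimal sketch, with endpoints modeled as plain callables rather than real network clients (the function name and error handling are assumptions for illustration):

```python
# Hypothetical client-side failover across redundant write services.
# Each endpoint is modeled as a callable that either accepts a job or
# raises ConnectionError; real code would wrap network clients instead.
def write_with_failover(job, endpoints):
    for endpoint in endpoints:
        try:
            endpoint(job)
            return True  # the first healthy endpoint wins
        except ConnectionError:
            continue     # this service is down; try the next one
    # Every write service is down: per the previous slides, the caller
    # should fall back to buffering the job on local disk.
    return False
```

Note how this composes with disk buffering: failover handles one dead service instance, and the disk buffer handles the case where they are all unreachable.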
49. Database Failure fix: Buffer to disk
[Diagram: the Job Write Service buffers to its own local disk in the primary datacenter when MySQL (X) fails, and drains the buffer once the database recovers]
50. Our new architecture
[Diagram: engines in the remote datacenter each buffer to a local disk and send to redundant Job Write Services in the primary datacenter, each of which buffers to its own disk before writing to MySQL]
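Putting a Job Write Service between N engines and MySQL also lets writes be batched, which is one plausible reading of the "efficient use of the database" goal in the recap: the service drains its buffer and hands the database a few large inserts instead of many single-row ones. A sketch of that batching step, with the database modeled as a simple callable (the function and its parameters are hypothetical, not the actual service):

```python
# Hypothetical batching layer of a write service: drain buffered jobs
# and hand them to the database in fixed-size groups, so the database
# sees a few large writes instead of many tiny concurrent ones.
def drain_in_batches(buffered_jobs, insert_batch, batch_size=100):
    batches = 0
    for start in range(0, len(buffered_jobs), batch_size):
        # insert_batch stands in for e.g. a multi-row INSERT into MySQL.
        insert_batch(buffered_jobs[start:start + batch_size])
        batches += 1
    return batches
```

With this shape, only the write services hold database connections, which is exactly the fix for the "N connections / N concurrent writers" problems from the earlier slides.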
51. We could build this...
[Diagram: the same store-and-forward architecture, repeated]
52. ... maybe someone already has
[Diagram: the same store-and-forward architecture, repeated]
55. Aggregation Requirements
● Durable
● Multi-Data Center (latency)
● 38 million jobs a day
● 2KB average job size
○ 76 GB a day
● Target peaks of 1000 jobs / second
● Programming language agnostic
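The numbers on this slide are easy to sanity-check: 38 million jobs a day at 2KB each is 76 GB a day, the average rate works out to roughly 440 jobs per second, and the 1000 jobs/second peak target leaves a bit over 2x headroom over that average:

```python
# Back-of-the-envelope check of the requirements above
# (decimal units, as the slide's 38M * 2KB = 76 GB arithmetic implies).
jobs_per_day = 38_000_000
avg_job_bytes = 2_000                            # 2KB average job size

daily_gb = jobs_per_day * avg_job_bytes / 1e9    # 76.0 GB a day
avg_jobs_per_sec = jobs_per_day / 86_400         # ~440 jobs/second on average
peak_headroom = 1000 / avg_jobs_per_sec          # target peak is ~2.3x the average
```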
146. Recap: The jobs must flow
● Durability
● High throughput
● Low latency
● Partition-tolerance
● Efficient use of the database
● Minimal points of failure