ATC (Air Traffic Controller) is a system built on Samza to manage communications with LinkedIn members. It aims to improve the member experience by applying common functionality across many different communication types and use cases, handling thousands of communications per second while maintaining a near-real-time understanding of members' state. ATC focuses on sending the right message to the right member through the right channel at the right time, using techniques such as filtering, aggregation, channel selection, and delivery-time optimization. It is built to be highly scalable on streaming technologies such as Kafka and RocksDB, uses Samza's host affinity to avoid rebuilding state on redeployment, and relies on replicated input streams to keep state in multiple datacenters for redundancy. Personalization is achieved through relevance scores computed offline and stored in RocksDB.
3. What problem are we trying to solve?
In the past, LinkedIn provided a poor communications experience to some of its members: too much email, low-quality email, and messages fired on multiple channels at once.
Our goal was to build a system which could apply common functionality across many different communication types and use cases in order to improve the member experience:
● Handle thousands of communications per second
● Maintain a good understanding of members' state on the site in near-real-time
4. How does ATC think about creating a delightful member experience?
5. 5 Rights
● Right member
● Right message (useful to the member; not something they've already seen)
● Right frequency
● Right channel
● Right time
9. Delivery-time Optimization
● Hold on to a message and deliver it at the right moment.
● Ex: Don't buzz my phone at 2 AM.
● Ex: I like to read my daily digest after work.
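For illustration, here is a minimal sketch of how a hold-and-deliver-later step could look as a Samza task, using a RocksDB-backed store keyed by delivery time. The store name ("pending-messages"), output stream ("atc-outbound"), and chooseDeliveryTime() helper are hypothetical; this sketches the technique, not ATC's actual code.

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.Entry;
    import org.apache.samza.storage.kv.KeyValueIterator;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.*;

    public class DelayedDeliveryTask implements StreamTask, WindowableTask, InitableTask {
      private static final SystemStream OUTPUT = new SystemStream("kafka", "atc-outbound");
      private KeyValueStore<String, String> pending;

      @Override
      @SuppressWarnings("unchecked")
      public void init(Config config, TaskContext context) {
        // RocksDB-backed store defined as "pending-messages" in the job config.
        pending = (KeyValueStore<String, String>) context.getStore("pending-messages");
      }

      @Override
      public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                          TaskCoordinator coordinator) {
        String message = (String) envelope.getMessage();
        long deliverAt = chooseDeliveryTime(message);
        // Zero-pad the timestamp so lexicographic key order matches time order.
        pending.put(String.format("%013d:%s", deliverAt, envelope.getKey()), message);
      }

      @Override
      public void window(MessageCollector collector, TaskCoordinator coordinator) {
        // Called every task.window.ms; flush everything that has come due.
        long now = System.currentTimeMillis();
        KeyValueIterator<String, String> it = pending.all(); // iterates in key order
        try {
          while (it.hasNext()) {
            Entry<String, String> entry = it.next();
            long deliverAt = Long.parseLong(entry.getKey().split(":", 2)[0]);
            if (deliverAt > now) break; // all later keys are still in the future
            collector.send(new OutgoingMessageEnvelope(OUTPUT, entry.getValue()));
            pending.delete(entry.getKey());
          }
        } finally {
          it.close();
        }
      }

      private long chooseDeliveryTime(String message) {
        // Placeholder: real logic is per-member (e.g. "after work", never 2 AM).
        return System.currentTimeMillis();
      }
    }

Because RocksDB iterates keys in sorted order, the periodic window() callback only has to scan the front of the store to find everything that has come due.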
11. Requirements for ATC
● Highly-scalable
● Nearline (but close to real-time!)
● Ingest data from many sources
● Persist some data, though most of it is only needed for a short TTL
18. Streaming Technologies
Kafka: publish-subscribe messaging system
● Used to send input to ATC to trigger communications
● Many actions and signals in the LinkedIn ecosystem are tracked as Kafka events. We can consume these signals to better understand the state of the ecosystem.
Databus: change-capture system for databases
● Produces an event whenever an entry in a database changes
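For illustration, a minimal Samza job configuration sketch showing Kafka wired up as the input system; the job name, task class, and topic names are hypothetical:

    # Kafka as a Samza "system" that feeds input to the job.
    job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
    job.name=atc-example
    task.class=com.example.atc.ExampleTask
    task.inputs=kafka.communication-requests,kafka.member-activity
    systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
    systems.kafka.consumer.zookeeper.connect=localhost:2181
    systems.kafka.producer.bootstrap.servers=localhost:9092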
19. Host affinity
By default, whenever a Samza app is deployed, the task instances can be moved to any host in the cluster, regardless of where the instances were previously deployed.
If there was any state saved (e.g. in RocksDB), then the new instances would have to rebuild that state from the changelog. This bootstrapping can take some time depending on the amount of data to reload, and task instances can't process new input until bootstrapping is complete.
We have some use cases which can't be delayed for the amount of time it takes to rebuild state.
20. Host affinity (continued)
Host affinity is a Samza feature which allows us to deploy task instances back to the same hosts as the previous deployment, so state does not need to be reloaded.
If an individual instance fails, Samza can fall back to moving that instance elsewhere and bootstrapping it from the changelog.
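Both behaviors are driven by job configuration. A minimal sketch, with hypothetical store and topic names:

    # Ask the cluster manager to place task instances on their previous hosts.
    job.host-affinity.enabled=true

    # RocksDB-backed store with a Kafka changelog, used to rebuild state when
    # host affinity can't be honored.
    stores.pending-messages.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
    stores.pending-messages.changelog=kafka.pending-messages-changelog
    stores.pending-messages.key.serde=string
    stores.pending-messages.msg.serde=string
    serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory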
21. Multiple datacenters
Samza does not currently support replicating persistent application state (e.g. RocksDB) across multiple clusters which are running the same app.
We need ATC to run in multiple datacenters for redundancy, and we need state in each datacenter so that if we have to move processing between datacenters, we can continue to properly handle input.
22. Multiple datacenters (continued)
We rely on the input streams to replicate the main input, so that we can do processing and build up state in all datacenters.
The side effects (e.g. triggering the actual email send) are then emitted by only one of the datacenters, and we can dynamically choose where side effects are triggered.
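A minimal sketch of that gating pattern in a Samza task, assuming a hypothetical atc.side-effects.enabled flag; a real deployment would flip this dynamically rather than via static config:

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.*;

    public class SideEffectGatingTask implements StreamTask, InitableTask {
      private static final SystemStream OUTBOUND = new SystemStream("kafka", "email-sends");
      private KeyValueStore<String, String> memberState;
      private Config config;

      @Override
      @SuppressWarnings("unchecked")
      public void init(Config config, TaskContext context) {
        this.config = config;
        this.memberState = (KeyValueStore<String, String>) context.getStore("member-state");
      }

      @Override
      public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                          TaskCoordinator coordinator) {
        String memberId = (String) envelope.getKey();
        String event = (String) envelope.getMessage();

        // Every datacenter consumes the replicated input and updates local state,
        // so state stays warm everywhere.
        memberState.put(memberId, event);

        // Only the datacenter currently designated for side effects actually sends.
        if (config.getBoolean("atc.side-effects.enabled", false)) {
          collector.send(new OutgoingMessageEnvelope(OUTBOUND, memberId, event));
        }
      }
    }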
24. Deployments
When we deploy changes to ATC, we can deploy to a single datacenter at a time in order to test new versions on only a fraction of traffic.
In some cases, we shift all side effects out of a datacenter to do an upgrade. Since we still process all input, we can validate almost all of our functionality and ensure performance doesn't take an unexpected hit.
25. Store migrations
In some cases, we need to migrate our system to use a new instance of a store. For example, when support was added to use RocksDB TTL, we needed to migrate some of our stores.
Since we only needed the last X days of data, we could use the following strategy for the migration (see the sketch below):
● Write to both the old and new store for X days, but continue to read from the old store.
● After X days, read from the new store, but continue writing to both stores so we could fall back to the old store if anything went wrong.
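A minimal sketch of that dual-write pattern, with hypothetical store names and a time-based cutover; it illustrates the technique, not ATC's actual code:

    import org.apache.samza.storage.kv.KeyValueStore;

    // Phase 1 (before cutover): write both, read old.
    // Phase 2 (after X days): write both, read new; old remains as a fallback.
    public class MigratingStore {
      private final KeyValueStore<String, String> oldStore;
      private final KeyValueStore<String, String> newStore;
      private final long cutoverTimeMs; // when reads switch to the new store

      public MigratingStore(KeyValueStore<String, String> oldStore,
                            KeyValueStore<String, String> newStore,
                            long cutoverTimeMs) {
        this.oldStore = oldStore;
        this.newStore = newStore;
        this.cutoverTimeMs = cutoverTimeMs;
      }

      public void put(String key, String value) {
        // Dual-write throughout the migration so either store can serve reads.
        oldStore.put(key, value);
        newStore.put(key, value);
      }

      public String get(String key) {
        if (System.currentTimeMillis() < cutoverTimeMs) {
          return oldStore.get(key); // phase 1: the old store is authoritative
        }
        return newStore.get(key);   // phase 2: read new; old is the fallback
      }
    }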
26. Personalization through relevance
We work closely with a relevance team in order to make better decisions about the communications we send out, e.g. channel selection, delivery time, and aggregation thresholds.
Every day, scores for different decisions are computed offline (in Hadoop) by the relevance team. Those scores are pushed to ATC through Kafka, and ATC stores them in RocksDB.
Scores are generated for each member, so we can personalize the experience.
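For illustration, the ingestion path could look like the following sketch, assuming a hypothetical relevance-scores topic keyed by member ID and a RocksDB-backed "relevance-scores" store:

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.*;

    public class RelevanceScoreIngestTask implements StreamTask, InitableTask {
      private KeyValueStore<String, String> scores;

      @Override
      @SuppressWarnings("unchecked")
      public void init(Config config, TaskContext context) {
        // RocksDB-backed store; with a TTL configured, stale daily scores age out.
        scores = (KeyValueStore<String, String>) context.getStore("relevance-scores");
      }

      @Override
      public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                          TaskCoordinator coordinator) {
        // Scores arrive keyed by member ID; each daily push simply overwrites.
        String memberId = (String) envelope.getKey();
        String scorePayload = (String) envelope.getMessage();
        scores.put(memberId, scorePayload);
      }
    }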
28. Remote calls
Some data is not available on a Kafka stream in a pragmatic way, so we make REST requests to fetch that data.
This is done at the beginning of the pipeline (see the sketch below):
● Extract the event
● Make remote calls and decorate the event
● Process the decorated event
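A minimal sketch of that extract/decorate/process shape, with a hypothetical ProfileClient standing in for the REST call:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class DecoratingTask implements StreamTask {
      public interface ProfileClient { String fetchProfile(String memberId); }

      // Stub: real code would issue a REST request here.
      private final ProfileClient profileClient = memberId -> "{}";

      @Override
      public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                          TaskCoordinator coordinator) {
        // 1. Extract the event from the raw envelope.
        String memberId = (String) envelope.getKey();
        String event = (String) envelope.getMessage();

        // 2. Decorate: fetch data that isn't practically available on Kafka.
        String profile = profileClient.fetchProfile(memberId);

        // 3. Process the decorated event with all the data in hand.
        handleDecorated(event, profile, collector);
      }

      private void handleDecorated(String event, String profile, MessageCollector collector) {
        // Downstream logic (filtering, aggregation, channel selection) goes here.
      }
    }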
29. Remote calls - Efficiently
● Use ParSeq, LinkedIn's open-sourced framework for writing asynchronous code in Java (see the sketch below)
● ParSeq uses a thread pool for making remote calls
● The rest of the processing happens serially
● Checkpointing is handled by the application
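A minimal sketch of ParSeq usage based on its public API, running two remote fetches in parallel; the fetch bodies are stubs standing in for real REST calls:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import com.linkedin.parseq.Engine;
    import com.linkedin.parseq.EngineBuilder;
    import com.linkedin.parseq.Task;

    public class ParSeqExample {
      public static void main(String[] args) throws Exception {
        // ParSeq engine backed by a thread pool for the remote calls.
        ExecutorService taskExecutor = Executors.newFixedThreadPool(8);
        ScheduledExecutorService timerScheduler = Executors.newSingleThreadScheduledExecutor();
        Engine engine = new EngineBuilder()
            .setTaskExecutor(taskExecutor)
            .setTimerScheduler(timerScheduler)
            .build();

        // Two independent fetches run in parallel on the engine's thread pool.
        Task<String> profile = Task.callable("fetchProfile", () -> "profile-data");
        Task<String> settings = Task.callable("fetchSettings", () -> "settings-data");
        Task<String> decorated = Task.par(profile, settings)
            .map("decorate", (p, s) -> p + "+" + s);

        engine.run(decorated);
        decorated.await();
        System.out.println(decorated.get());

        engine.shutdown();
        taskExecutor.shutdown();
        timerScheduler.shutdown();
      }
    }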
30. Real-time Processing
● Some messages require real-time latency.
● We tuned Kafka's batching configuration to achieve sub-second pre-ATC latency, and it can be tuned even more aggressively.
● ATC/Samza processes most events in 2-3 ms.
● We make no remote calls for these messages.
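On the producer side, that kind of batching tuning can look like the following sketch (illustrative values; the talk does not specify ATC's actual settings):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LowLatencyProducerExample {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Latency-oriented batching: don't wait to fill batches before sending.
        // linger.ms and batch.size are the main knobs; this value is illustrative.
        props.put("linger.ms", "0");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          // Hypothetical topic name; the record is flushed without batching delay.
          producer.send(new ProducerRecord<>("atc-input", "member-123", "realtime-event"));
        }
      }
    }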