Maintaining the Front Door to Netflix

Maintaining the Front Door to Netflix
Ben Schmaus, @schmaus
APIcon SF - 2014

Global Streaming Video
for TV Shows and Movies

More than 48 Million Subscribers
More than 40 Countries

Netflix Accounts for >34% of Peak
Downstream Traffic in North America
Netflix subscribers are watching more than 1 billion hours a month

Netflix Accounts for >6% of Peak
Upstream Traffic in North America
Netflix subscribers are watching more than 1 billion hours a month

Team Focus:
Build the Best Global Streaming Product
Three aspects of the Streaming Product:
• Non-Member
• Discovery
• Streaming

Netflix API : Key Responsibilities
• Broker data between services and Devices
• Provide features and business logic
• Maintain a resilient front-door
• Scale the system
• Maintain high velocity

Data Gathering
Data Formatting
Data Delivery
Security
Authorization
Authentication
System Scaling
Discoverability
Data Consistency
Translations
Throttling
Orchestration
APIs Do
Lots of Things!
These are some of the
many things APIs do.

Data Gathering
Data Formatting
Data Delivery
Security
Authorization
Authentication
System Scaling
Discoverability
Data Consistency
Translations
Throttling
Orchestration
APIs Do
Lots of Things!
These three are at the core.
All others ultimately
support them.

Definitions
• Data Gathering
– Retrieving the requested data from one or many local
or remote data sources
• Data Formatting
– Preparing a structured payload to the requesting
agent
• Data Delivery
– Delivering the structured payload to the requesting
agent

Meanwhile…
There are two players in APIs

API Provider
PROVIDES
API Consumer
CONSUMES
Traditional API Interactions

API Provider
PROVIDES
EVERYTHING
API Consumer
CONSUMES
Everything means, API Provider does:
• Data Gathering
• Data Formatting
• Data Delivery
• (among other things)
Traditional API Interactions

Why do most API providers provide
everything?
• API design tends to be easier for teams closer
to the source
• Centralized API functions makes them easier
to support
• Many APIs have a large set of unknown and
external developers

Data Gathering Data Formatting Data Delivery
API Consumer
API Provider
Separation of Concerns
To be a better provider, the API should address the
separation of concerns of the three core functions

API Consumer
Don’t care how data
is gathered, as long
as it is gathered
API Provider
Care a lot about
how the data is
gathered

API Consumer
as it is gathered
Each consumer cares a
lot about the format
for that specific use
API Provider
Care a lot about
how the data is
gathered
Only cares about the
format to the extent it
is easy to support

API Consumer
as it is gathered
lot about the format
for that specific use
lot about how payload
is delivered
API Provider
Care a lot about
how the data is
gathered
Only cares about the
format to the extent it
is easy to support
Only cares about
delivery method to the
extent it is easy to
support

Should you consider alternatives to
one-size-fits-all API model?
Ingredients:
• Small number of targeted API consumers is top priority
• Close relationships between these API consumers and
the API team
• Increasing divergence of needs across the top priority
API consumers
• Strong desire by the API consumers for more optimized
interactions with the API
• High value proposition for the company providing the
API to make these API consumers as effective as
possible

• http://thenextweb.com/dd/2013/12/17/futur
e-api-design-orchestration-layer/

Because of our separation of
concerns, the Netflix API team is
enabled to focus on different charters

Brokering Data to
1,000+ Device Types

One-Size-Fits-All
API
Request
Request
Request

Courtesy of South Florida Classical Review

Resource-Based API
vs.
Experience-Based API

Resource-Based Requests
• /users/<id>/ratings/title
• /users/<id>/queues
• /users/<id>/queues/instant
• /users/<id>/recommendations
• /catalog/titles/movie
• /catalog/titles/series
• /catalog/people

REST API
RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
Network Border Network Border

RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
OSFA API
SERVER CODE
CLIENT CODE

RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
OSFA API
DATA GATHERING,
FORMATTING,
AND DELIVERY
USER INTERFACE
RENDERING

Experience-Based Requests
• /ps3/homescreen

JAVA API
RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
Groovy Layer

RECOMME
NDATIONSA
ZXSXX C
CCC
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
JAVA API
SERVER CODE
CLIENT CODE
CLIENT ADAPTER CODE
(WRITTEN BY CLIENT TEAMS, DYNAMICALLY UPLOADED TO SERVER)

RECOMME
NDATIONSA
ZXSXX C
CCC
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
JAVA API
DATA GATHERING
DATA FORMATTING
AND DELIVERY
USER INTERFACE
RENDERING

Because we are no longer
catering to LSUDs
We can now focus on building a
business on which all of Netflix
can operate

The bigger the ship…
the slower it turns

Personaliz
ation
Engine
User Info
Movie
Metadata
Movie
Ratings
Similar
Movies
Reviews
A/B Test
Engine
Dozens of Dependencies

Personaliz
ation
Engine
User Info
Movie
Metadata
Movie
Ratings
Similar
Movies
API
Reviews
A/B Test
Engine

2,000,000,000
Requests Per Day to the
Netflix API

30
Distinct Dependent
Services for the Netflix API

~600
Dependency jars Slurped
into the Netflix API

14,000,000,000
Netflix API Calls Per Day to
those Dependent Services

0
Dependent Services with
100% SLA

99.99% = 99.7%30
0.3% of 2B = 6M failures per day
2+ Hours of Downtime
Per Month

99.9% = 97%30
3% of 2B = 60M failures per day
20+ Hours of Downtime
Per Month

Call Volume and Health / Last 10 Seconds

Successful, But Slower Than Expected

Short-Circuited Requests, Delivering Fallbacks

Timeouts, Delivering Fallbacks

Thread Pool & Task Queue Full, Delivering Fallbacks

Exceptions, Delivering Fallbacks

Error Rate
# + # + # + # / (# + # + # + # + #) = Error Rate

Requests per Second, Over Last 10 Seconds

Personaliz
ation
Engine
User Info
Movie
Metadata
Movie
Ratings
Similar
Movies
API
Reviews
A/B Test
Engine
Fallback

Scaling the Distributed System

Scryer : Predictive Auto Scaling
Not yet…

Typical Traffic Patterns Over Five Days

Predicted RPS Compared to Actual RPS

Zuul
Gatekeeper for the Netflix Streaming Application

Zuul *
• Multi-Region
Resiliency
• Insights
• Stress Testing
• Canary Testing
• Dynamic Routing
• Load Shedding
• Security
• Static Response
Handling
• Authentication
* Most closely resembles an API proxy

All of these approaches are
designed to prevent failures…

But sometimes the best way to
prevent failures is to force them!

I randomly
terminate instances
in production to
identify dormant
failures.
Chaos
Monkey

Testing Philosophy:
Act Fast, React Fast

That Doesn’t Mean We Don’t Test

Cloud-Based Deployment Techniques

Current Code
In Production
API Requests from
the Internet

Single Canary Instance
To Test New Code with Production Traffic
(around 1% or less of traffic)
Current Code
In Production
API Requests from
the Internet

Single Canary Instance
To Test New Code with Production Traffic
(around 1% or less of traffic)
Current Code
In Production
API Requests from
the Internet
Error!

Current Code
In Production
API Requests from
the Internet
Perfect!

Current Code
In Production
API Requests from
the Internet
New Code
Getting Prepared for Production

Error!
Current Code
In Production
API Requests from
the Internet
New Code

API Requests from
the Internet
New Code

https://www.github.com/Netflix

Maintaining the Front Door to Netflix

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Maintaining the Front Door to Netflix

Similar to Maintaining the Front Door to Netflix (20)

Recently uploaded

Recently uploaded (20)

Maintaining the Front Door to Netflix

Editor's Notes