Building Web APIs that Scale

Designing for Graceful Degradation
Evan Cooke, Twilio, CTO
@emcooke

Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:

This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be
deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other
financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.

The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of
intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we
operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new
releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization
and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of
salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These
documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of
our Web site.

Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based
upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-
looking statements.

Cloud services and the APIs they power are
becoming the backbone of modern society. APIs
support the apps that structure how we work, play,
and communicate.

Twilio
Observations today based on
experience building @twilio

• Founded in 2008
• Infrastructure APIs to automate phone and SMS communications
• 120 employees
• >1000 servers running 24x7

Cloud Workloads Can Be Unpredictable

Twilio SMS API Traffic
6x spike in 5 mins
No time for…
• a human to respond to a pager
• new servers to boot

Typical Scenario
[Chart: Load and Request Latency]
Danger! Load higher than instantaneous throughput → FAIL

Goal Today

Support graceful degradation of API
performance under extreme load

No Failure

Why Failures?
[Diagram: Incoming Requests → Load Balancer → App Servers (...), each with AAA, Throttling, and a Worker Pool of workers "W"]

Worker Pools
e.g., Apache/Nginx
[Chart: Failed Requests over Time, with levels marked at 10%, 70%, and 100%+]

Problem Summary

• Cloud services often use worker pools to handle incoming requests
• When load goes beyond the size of the worker pool, requests fail
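
The failure mode summarized above can be sketched in a few lines. This is only an illustration, not how Apache or Nginx are implemented: `WorkerPool`, `try_accept`, `finish`, and `simulate_burst` are hypothetical names, and the pool size is arbitrary.

```python
class WorkerPool:
    """Hypothetical fixed-size pool: each in-flight request holds one worker."""

    def __init__(self, size):
        self.size = size
        self.busy = 0

    def try_accept(self):
        """Return True if a worker was free, False if the request fails."""
        if self.busy >= self.size:
            return False          # pool exhausted: the request is rejected
        self.busy += 1
        return True

    def finish(self):
        """Release a worker once its request completes."""
        if self.busy > 0:
            self.busy -= 1


def simulate_burst(pool_size, burst):
    """Send `burst` simultaneous requests at a pool and count the outcomes."""
    pool = WorkerPool(pool_size)
    accepted = sum(1 for _ in range(burst) if pool.try_accept())
    return accepted, burst - accepted


print(simulate_burst(pool_size=3, burst=5))   # (3, 2)
```

With three workers and a burst of five requests, two requests fail outright, even though the pool would have been free again moments later.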

Queues to the rescue?

[Diagram: Incoming Requests → queue → Process & Respond]

1. If we respond synchronously, each item in the queue still ties up a worker. D'oh!
2. If we close the incoming connection and free the worker, then we need an asynchronous callback to respond to the request. D'oh!

Observation 1

A synchronous web API is often much easier for developers to integrate due to the additional complexity of callbacks
Implication: Responding to requests synchronously is often preferable to queuing the request and responding with an asynchronous callback

Synchronous vs. Asynchronous Interfaces
Take POST data from a web form, send it to a geo lookup API, store the result in the DB, and return a status page to the user

Sync:
    d = read_form();
    geo = api->lookup(d);
    db->store(d, geo);
    return "success";

Async:
    d = read_form();
    api->lookup(d);

    # in /geo-result
    db->store(d, geo);
    ws->send("success");

The async interface needs a separate URL handler and a websocket connection to return the result

Observation 2

For many APIs, taking additional time to service a request is better than failing that specific request
Implication: In many cases, it is better to service a request with some delay rather than failing it

Observation 3

It is better to fail some requests than all incoming requests

Implication: Under load, it may be better to selectively drop expensive requests that can’t be serviced and allow others

Event-driven programming and the Reactor Pattern

Thread/Worker Model

Code                         Worker Time
req = 'GET /';               1
req.append('\r\n\r\n');      1
socket.write(req);           10000x
resp = socket.read();        10000000x
print(resp);                 10

Huge IO latency blocks worker

Event-based Programming

Make IO operations async and "callback" when done

    req = 'GET /';
    req.append('\r\n\r\n');
    socket.write(req, fn() {
      socket.read(fn(resp) {
        print(resp);
      });
    });

Reactor Dispatcher

Central dispatch to coordinate event callbacks

    req = 'GET /';
    req.append('\r\n\r\n');
    socket.write(req, fn() {
      socket.read(fn(resp) {
        print(resp);
      });
    });
    reactor.run_forever();

Non-blocking IO

Code                         Time
req = 'GET /';               1
socket.write(req, fn() {     10
  socket.read(fn(resp) {     10
    print(resp);             10
  });
});
reactor.run_forever();

No delay blocking the worker waiting for IO

Request Response Decoupling

Using this approach we can decouple the socket of an incoming connection from the processing of that connection

    req = 'GET /';
    req.append('\r\n\r\n');
    socket.write(req, fn() {
      socket.read(fn(resp) {
        print(resp);
      });
    });
    reactor.run_forever();
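
To make the dispatcher idea concrete, here is a minimal sketch of a reactor built on Python's standard `selectors` module. It is not the implementation of any framework listed in this deck: the `Reactor` class, its method names, and `demo` are hypothetical, and a local `socket.socketpair()` stands in for a real client and server.

```python
import selectors
import socket


class Reactor:
    """Tiny single-threaded dispatcher: wait for readable sockets, run callbacks."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.running = False

    def on_readable(self, sock, callback):
        # Remember which callback to run when `sock` has data to read.
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def stop(self):
        self.running = False

    def run(self):
        self.running = True
        while self.running:
            # Block only here, on all registered sockets at once.
            for key, _events in self.selector.select(timeout=1.0):
                key.data(key.fileobj)   # dispatch the stored callback


def demo():
    reactor = Reactor()
    client, server = socket.socketpair()
    client.setblocking(False)
    server.setblocking(False)
    received = []

    def on_request(sock):
        # Runs only once the request bytes have arrived.
        sock.sendall(b"echo: " + sock.recv(1024))

    def on_response(sock):
        received.append(sock.recv(1024))
        reactor.stop()

    reactor.on_readable(server, on_request)
    reactor.on_readable(client, on_response)
    client.sendall(b"GET /\r\n\r\n")
    reactor.run()

    reactor.selector.close()
    client.close()
    server.close()
    return received[0]


print(demo())
```

No code in this sketch waits on an individual socket; the only blocking call is `select`, which is why one thread can hold many idle connections.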

(Some) Reactor-Pattern Frameworks

c/libevent
c/libev
java/nio/netty
js/node.js
ruby/eventmachine (Goliath, Cramp)
python/twisted
python/gevent

Callback Spaghetti

Example of callback nesting complexity with Python Twisted (also node.js)

    req = 'GET /'
    req += '\r\n\r\n'

    def r(resp):
        print resp

    def w():
        socket.read().addCallback(r)

    socket.write().addCallback(w)

inlineCallbacks to the Rescue

We can clean up the callbacks using deferred generators and inline callbacks (similar frameworks also exist for js)

    req = 'GET /'

    yield socket.write()
    resp = yield socket.read()
    print resp

Easy Sequential Programming

Easy sequential programming with mostly implicit asynchronous IO

    req = 'GET /'

    yield socket.write()
    resp = yield socket.read()
    print resp
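
The same sequential style can be tried out with Python's standard `asyncio`, swapped in here for Twisted's inlineCallbacks so the sketch runs without extra packages. `fake_write` and `fake_read` are hypothetical stand-ins for socket IO (they only sleep), and the timings are arbitrary.

```python
import asyncio
import time


async def fake_write(req):
    """Stand-in for a non-blocking socket write (hypothetical helper)."""
    await asyncio.sleep(0.01)


async def fake_read(req):
    """Stand-in for a slow non-blocking socket read (hypothetical helper)."""
    await asyncio.sleep(0.05)
    return "response to " + req.strip()


async def fetch(path):
    # Reads top to bottom like blocking code, but every `await` hands
    # control back to the event loop while the IO is pending.
    req = "GET " + path + "\r\n\r\n"
    await fake_write(req)
    resp = await fake_read(req)
    return resp


async def fetch_many(count):
    # Start all requests at once; the event loop interleaves them.
    return await asyncio.gather(*(fetch("/%d" % i) for i in range(count)))


def demo(count=50):
    start = time.monotonic()
    responses = asyncio.run(fetch_many(count))
    return responses, time.monotonic() - start


responses, elapsed = demo()
print(len(responses), "responses in %.2fs" % elapsed)
```

Run back to back, fifty such requests would need about three seconds of waiting; interleaved on one event loop they finish in roughly the time of a single request.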

Event Python gevent
“gevent is a coroutine-based Python
networking library that uses greenlet to
provide a high-level synchronous API on
top of the libevent event loop.”

Natively asynchronous
socket.write()
resp = socket.read()
print resp

gevent Example
Simple Echo Server: easy sequential model yet fully asynchronous

    from gevent.server import StreamServer

    def echo(socket, address):
        print('New connection from %s:%s' % address)
        socket.sendall(b'Welcome to the echo server!\r\n')
        # Wrap the socket in a file object so we can use readline()
        fileobj = socket.makefile(mode='rwb')
        line = fileobj.readline()
        fileobj.write(line)
        fileobj.flush()
        print("echoed %r" % line)

    if __name__ == '__main__':
        server = StreamServer(('0.0.0.0', 6000), echo)
        server.serve_forever()

gevent Example
However, gevent needs daemonization, logging, and other servicification functionality for production use, such as Twisted’s twistd

Async Services with Ginkgo
Ginkgo is a simple framework for composing asynchronous gevent services with common configuration, logging, daemonizing, etc.
https://github.com/progrium/ginkgo

Let’s look at a simple example that implements a TCP and HTTP server...

Ginkgo Example

    # Import WSGI/TCP servers
    import gevent
    from gevent.pywsgi import WSGIServer
    from gevent.server import StreamServer
    from ginkgo.core import Service

    # HTTP handler
    def handle_http(env, start_response):
        start_response('200 OK', [('Content-Type', 'text/html')])
        print('new http request!')
        return [b"hello world"]

    # TCP handler
    def handle_tcp(socket, address):
        print('new tcp connection!')
        while True:
            socket.send(b'hello\n')
            gevent.sleep(1)

    # Service composition
    app = Service()
    app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
    app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
    app.serve_forever()

Toward a Fully Asynchronous API

Using Ginkgo or another async framework, let’s look at our web-worker architecture and see how we can modify it to become fully asynchronous
[Diagram: worker pool of workers "W"]
The Old Way
[Diagram: Incoming Requests → Load Balancer → App Servers (...), each with AAA and a Worker Pool of workers "W"]

[Diagram: Incoming Requests → Load Balancer → Async Servers (...)]

Step 1 - Let’s start by replacing our threaded
workers with asynchronous app servers

Huzzah, now idle open connections will use very few server resources

[Diagram: Incoming Requests → Load Balancer → AAA → Async Servers (...)]

Step 2 – Define authentication and authorization
layer to identify the user and resource requested

AAA Manager

Goal: Perform authentication, authorization, and accounting for each incoming API request
Extract key parameters
• Account
• Resource Type
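
As a rough sketch of what such a layer might do, the function below authenticates a request, checks that the account owns the path, records usage, and returns the two key parameters. Everything specific here is an assumption for illustration: the `/Accounts/<account>/<ResourceType>/...` path layout, the in-memory `CREDENTIALS` table, and the `aaa` function name are not taken from the talk.

```python
import hmac

# Hypothetical credential store; a real service would query a database.
CREDENTIALS = {"AC123": "s3cret"}


def aaa(path, account, token, usage_log):
    """Authenticate, authorize, and account for one request.

    Returns the (account, resource_type) key on success, or None.
    Assumes paths shaped like /Accounts/<account>/<ResourceType>/...
    """
    expected = CREDENTIALS.get(account)
    # Authentication: constant-time comparison of the presented token.
    if expected is None or not hmac.compare_digest(
            expected.encode("utf-8"), token.encode("utf-8")):
        return None

    parts = [p for p in path.split("/") if p]
    if len(parts) < 3 or parts[0] != "Accounts":
        return None
    # Authorization: an account may only touch its own resources.
    if parts[1] != account:
        return None

    key = (account, parts[2])
    # Accounting: count the request against the key.
    usage_log[key] = usage_log.get(key, 0) + 1
    return key


log = {}
print(aaa("/Accounts/AC123/SMS/Messages", "AC123", "s3cret", log))
```

The returned tuple is the piece the later steps care about, since it identifies who is asking and for what kind of resource.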

[Diagram: Incoming Requests → Load Balancer → AAA → Async Servers (...) with Concurrency Manager]

Step 3 – Add a concurrency manager that
determines whether to throttle each request

Concurrency Manager

Goal: Determine whether to delay or drop an individual request to limit access to API resources
Possible inputs
• By Account
• By Resource Type
• By Availability of Dependent Resources

Concurrency Manager

What we’ve found useful
• Tuple (Account, Resource Type)

Supports multi-tenancy
• Protection between accounts
• Protection within an account between resource types, e.g., Calls & SMS

Concurrency Manager

Concurrency manager returns one of
1. Allow the request immediately
2. Delay the request before it is processed
3. Drop the request and return an error: HTTP 429 - Concurrency Limit Reached
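
The three outcomes can be sketched as a small decision function keyed on the (Account, Resource Type) tuple. This is an illustration under stated assumptions, not Twilio's implementation: the class and method names, the per-key counters, and the `max_active` and `max_waiting` limits are all hypothetical.

```python
ALLOW, DELAY, DROP = "allow", "delay", "drop"


class ConcurrencyManager:
    """Hypothetical per-(account, resource type) admission control."""

    def __init__(self, max_active, max_waiting):
        self.max_active = max_active      # requests served at once per key
        self.max_waiting = max_waiting    # requests allowed to wait per key
        self.active = {}
        self.waiting = {}

    def decide(self, account, resource_type):
        key = (account, resource_type)
        if self.active.get(key, 0) < self.max_active:
            self.active[key] = self.active.get(key, 0) + 1
            return ALLOW                  # 1. serve immediately
        if self.waiting.get(key, 0) < self.max_waiting:
            self.waiting[key] = self.waiting.get(key, 0) + 1
            return DELAY                  # 2. hold until a slot frees up
        return DROP                       # 3. caller answers with HTTP 429

    def release(self, account, resource_type):
        """Call when a request finishes."""
        key = (account, resource_type)
        if self.waiting.get(key, 0) > 0:
            self.waiting[key] -= 1        # a delayed request takes the slot
        elif self.active.get(key, 0) > 0:
            self.active[key] -= 1


manager = ConcurrencyManager(max_active=2, max_waiting=1)
print([manager.decide("AC1", "SMS") for _ in range(4)])
# ['allow', 'allow', 'delay', 'drop']
print(manager.decide("AC1", "Calls"))   # allow
```

Because the counters are kept per tuple, a flood of SMS requests from one account does not consume the budget for that account's calls or for any other account.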

[Diagram: Incoming Requests → Load Balancer → AAA → Async Servers (...) with Concurrency Manager → Dependent Services]

Step 4 – Provide for concurrency control between the servers and backend resources

Conclusion 1

A synchronous web API is often much easier for developers to integrate due to the additional complexity of callbacks
The proposed asynchronous API framework provides for synchronous API calls without worrying about worker pools filling up.
It is also easy to add callbacks where needed.

Conclusion 2

For many APIs, taking additional time to service a request is better than failing that specific request
The proposed framework provides the ability to inject delay into the processing of incoming requests rather than dropping them.

Example of Delay Injection

[Chart: Load and Latency]

Spread load across a longer time period
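
One way to picture delay injection is a semaphore in front of the backend: requests beyond the limit wait for a slot instead of failing. The sketch below uses the standard `asyncio` module rather than gevent so it runs without extra packages; `handle`, `burst`, the limit of 5, and the 10 ms of simulated work are all assumptions for illustration.

```python
import asyncio


async def handle(request_id, slots, stats):
    # Requests beyond the limit wait here instead of failing.
    async with slots:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)        # stand-in for backend work
        stats["active"] -= 1
    return request_id


async def burst(count, limit):
    slots = asyncio.Semaphore(limit)     # at most `limit` requests at a time
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(
        *(handle(i, slots, stats) for i in range(count)))
    return len(done), stats["peak"]


def demo():
    return asyncio.run(burst(count=20, limit=5))


print(demo())   # (20, 5)
```

All twenty requests complete, but never more than five touch the backend at once; the rest simply see higher latency.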

Conclusion 3

It is better to fail some incoming requests than to fail all requests

The proposed framework provides the ability to selectively drop requests to limit contention on limited resources

Example of Dropping Requests

[Chart: Load; Latency for /x (dropped); Latency for /*]

Drop only the requests that we must due to scarce backend resources
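
A selective drop could look like the WSGI middleware sketched below, which sheds only requests under one expensive path prefix and lets everything else through. The `DropExpensive` class, the `/x` prefix, the limit, and the `ok_app` handler are hypothetical; the counter covers only the time spent inside the wrapped app call, which matters on an async server where that call can yield.

```python
class DropExpensive:
    """WSGI middleware sketch: shed only the paths that hit a scarce backend."""

    def __init__(self, app, expensive_prefix, limit):
        self.app = app
        self.expensive_prefix = expensive_prefix
        self.limit = limit
        self.in_flight = 0     # expensive requests currently inside app()

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(self.expensive_prefix):
            # Cheap paths are never dropped.
            return self.app(environ, start_response)
        if self.in_flight >= self.limit:
            # Reason phrase taken from the slide above.
            start_response("429 Concurrency Limit Reached",
                           [("Content-Type", "text/plain")])
            return [b"concurrency limit reached\n"]
        self.in_flight += 1
        try:
            return self.app(environ, start_response)
        finally:
            self.in_flight -= 1


def ok_app(environ, start_response):
    """Hypothetical application that always succeeds."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


app = DropExpensive(ok_app, expensive_prefix="/x", limit=1)
```

Requests to other paths keep their normal latency while `/x` is saturated, which is the behavior the chart above describes.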

Summary
Async frameworks like gevent allow you to easily decouple a request from access to constrained resources
[Chart: Request Latency over Time; label: API outage]
