Running Aiohttp at scale
by Pau Freixes
● About me
● Skyscanner and aiohttp, why?
● Tracing incoming requests
● Calling other microservices
● DNS in AWS with Aiohttp
● Misleading timeouts, the reactor saturation side effect
● Desired plans
● Questions
Running Aiohttp at scale @SkyscannerEng
Pau Freixes @pfreixes
● Senior Software Engineer, working at Skyscanner for almost 2 years.
● Member of the Hotels Attachment Squad.
● Also a collaborator with the Mshell Squad, helping with the Python stuff.
● Open source committer on aioredis, aiohttp, etc.
Skyscanner and aiohttp, why?
● Skyscanner loves the microservice architecture pattern.
● Microservices talk to each other using HTTP.
● The most used languages at Skyscanner (Java, JavaScript, Python) each have an official and supported HTTP framework.
● Adopting a standard HTTP framework lets us share knowledge, avoid fragmentation and implement commonalities once.
Why Aiohttp? Or why Asyncio?
● We were looking for a framework suited to IO-bound scenarios:
  ○ AWS APIs: DynamoDB, S3, etc.
  ○ Microservices architecture, e.g. DDBs abstracted behind HTTP REST services.
● Aiohttp meets the basic requirements:
  ○ Acceptable performance vs commodity.
  ○ There is an active community.
● Asyncio is mature enough:
  ○ The Asyncio API has been stable since Python 3.5.
  ○ Reputable Python HTTP projects, e.g. Tornado, plan to use Asyncio.
But choosing Aiohttp/Asyncio also means facing some uncertainties:
● Asynchronous code != synchronous code
  ○ Different problems
  ○ Different patterns
  ○ Less experience
● Some libraries might not be mature yet
  ○ Not enough time to grow
  ○ Small community
  ○ Less feedback
  ○ etc.
Tracing incoming requests
We need to know what is happening and what has happened in our microservice.
● HTTP endpoint statistics:
  ○ Time per request
  ○ Statistics such as avg, p90, p99
  ○ Number of requests
  ○ Status code per request, errors
● HTTP access log:
  ○ Historical access.
  ○ Indexed by fields such as status code and endpoint.
  ○ Ties each log line to a specific request.
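A minimal sketch of how the endpoint statistics listed above could be computed from raw request durations, using only the standard library. The `summarize` helper and the sample durations are illustrative assumptions, not part of the Skyscanner tooling.

```python
import statistics

def summarize(durations_ms):
    """Return count, avg, p90 and p99 for request durations in milliseconds."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {
        "count": len(durations_ms),
        "avg": statistics.mean(durations_ms),
        "p90": cuts[89],
        "p99": cuts[98],
    }

# Hypothetical sample: requests that took 1 ms, 2 ms, ..., 100 ms.
stats = summarize(list(range(1, 101)))
print(stats)
```

Percentiles matter more than the average here: a handful of slow requests barely moves `avg` but shows up clearly in `p99`.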
Docker foundations of our microservice that enable request tracing
● Requests come in through HAProxy.
● The AioHttp microservice handles the incoming request.
● AioHttp sends metrics to the StatsD container
  ○ One message per metric (real time)
  ○ Low latency network
● AioHttp sends logs to the Heka container
  ○ One message per log line (real time)
  ○ Low latency network
● StatsD sends aggregated metrics to OpenTSDB
  ○ Avg, p90, p99
  ○ Near real time, once per minute
● Heka parses the logs and sends the data to ElasticSearch
  ○ Near real time
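A rough sketch of what "one message per metric" could look like on the wire, assuming the plain StatsD text protocol over UDP. The metric name, host and port are assumptions made for illustration (8125 is the customary StatsD port); the actual client used at Skyscanner is not shown in the slides.

```python
import socket

def format_timing(metric, elapsed_ms):
    """Build a StatsD timer datagram: '<metric>:<value>|ms'."""
    return "{}:{:.3f}|ms".format(metric, elapsed_ms).encode("ascii")

def send_timing(metric, elapsed_ms, host="127.0.0.1", port=8125):
    # UDP is fire-and-forget: there is no connection and no reply, so the
    # request path does not wait for the metrics backend.
    payload = format_timing(metric, elapsed_ms)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload

# Hypothetical metric name, formatted but not sent.
print(format_timing("hotels.view.request_time", 12.5))
```

Calling `send_timing(...)` from a middleware would emit one datagram per request, leaving the avg/p90/p99 aggregation to StatsD.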
Aiohttp middlewares are the perfect place to instrument incoming requests:
    from aiohttp import web

    # Middleware factory: aiohttp calls it with the app and the next handler.
    async def middleware_timing(app, handler):
        async def timing(request):
            start = app.loop.time()
            response = await handler(request)
            print(app.loop.time() - start)
            return response
        return timing

    # Register the factory itself, not the inner handler.
    app = web.Application(middlewares=[middleware_timing])
Aiohttp microservices at Skyscanner come with the following middlewares for free:
● Metric requests: produces statistics per request.
● Access log: uploads the access log.
● Correlation id: uniquely identifies an incoming request.
Demo about metrics and access log
How can we follow the code path executed by a specific request?
    import asyncio
    import logging

    async def foo(request):
        logging.info("Doing some complicated stuff")
        await asyncio.sleep(1)

    async def bar(request):
        loop = asyncio.get_event_loop()
        start = loop.time()
        await foo(request)
        logging.info("Time for foo {}".format(loop.time() - start))

    async def view(request):
        logging.info("New request")
        await bar(request)
Every request at Skyscanner is identified by a unique ID. This identifier is stored in a place that every logging call reads automatically.
aiotask-context stores information within the current asyncio.Task instance.
    import asyncio
    import aiotask_context

    async def foo():
        aiotask_context.set("key", True)
        await asyncio.sleep(1)
        aiotask_context.get("key")  # still True: the value travels with the task
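The same idea can be sketched with the standard library's `contextvars` module (Python 3.7+) instead of aiotask-context. This is a swapped-in technique shown only to make the per-task isolation visible; the variable and function names are illustrative assumptions.

```python
import asyncio
import contextvars

# One context variable holds the id of the request being handled.
request_id = contextvars.ContextVar("request_id", default=None)

async def handle(name, delay):
    request_id.set(name)          # visible only inside this task
    await asyncio.sleep(delay)    # other tasks run in the meantime
    return request_id.get()       # still this task's own value

async def main():
    # gather() wraps each coroutine in its own task, each with its own context copy.
    return await asyncio.gather(handle("req-a", 0.02), handle("req-b", 0.01))

results = asyncio.run(main())
print(results)
```

Although both tasks interleave on the same event loop, neither sees the value set by the other.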
The request id is stored as a task attribute by a middleware, which makes it available anywhere in the code.
    import uuid
    import aiotask_context as context  # assumed: `context` is aiotask_context

    async def correlation_id(app, handler):
        async def save_correlation_id(request):
            # Reuse the caller's id if present, otherwise generate a new one.
            correlation_id = request.headers.get(
                "Skyscanner-Correlation-Id",
                request.headers.get(
                    "X-Correlation-Id",
                    str(uuid.uuid4())
                )
            )
            context.set("Skyscanner-Correlation-Id", correlation_id)
            return await handler(request)
        return save_correlation_id
When the aiohttp microservice starts, a new logging filter is installed that automatically adds the request id to every logging call.
    import logging
    import aiotask_context as context  # assumed: `context` is aiotask_context

    class RequestId(logging.Filter):
        def filter(self, record):
            # Attach the id stored by the middleware to the log record.
            correlation_id = context.get("Skyscanner-Correlation-Id")
            record.correlationid = correlation_id
            return True
Installation of the RequestId filter
    LOG_SETTINGS = {
        # 'version' and the top-level 'filters' entry are assumed wiring:
        # dictConfig needs them to resolve the 'correlationid' filter name.
        'version': 1,
        'filters': {
            'correlationid': {'()': RequestId},
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'INFO',
                'formatter': 'default',
                'filters': ['correlationid'],
            }
        },
        'formatters': {
            'default': {
                'format': '%(asctime)s %(levelname)s %(correlationid)s | %(message)s',
            },
        },
    }
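To see the filter and the configuration working together, here is a self-contained sketch. It is not the Skyscanner setup: it assumes `contextvars` in place of aiotask-context, a `StringIO` buffer in place of the console, and a logger named `demo`, all chosen so the result can be inspected.

```python
import contextvars
import io
import logging
import logging.config

correlation_id = contextvars.ContextVar("correlation_id", default="-")

class RequestId(logging.Filter):
    def filter(self, record):
        # Copy the current id onto the record so the formatter can use it.
        record.correlationid = correlation_id.get()
        return True

stream = io.StringIO()  # stands in for the console in this sketch

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {"correlationid": {"()": RequestId}},
    "formatters": {
        "default": {"format": "%(levelname)s %(correlationid)s | %(message)s"},
    },
    "handlers": {
        "buffer": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "default",
            "filters": ["correlationid"],
            "stream": stream,
        },
    },
    "loggers": {"demo": {"handlers": ["buffer"], "level": "INFO", "propagate": False}},
})

correlation_id.set("abc-123")
logging.getLogger("demo").info("New request")
output = stream.getvalue()
print(output, end="")
```

The calling code never mentions the id: the filter injects it into every record that passes through the handler.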
All logging calls will log the request id as a literal and also as a new field to be indexed by ElasticSearch:
    try:
        price_per_night = request.price / request.nights
    except ZeroDivisionError:
        logging.warning("Invalid `night` value param")
        raise
Demo about request id
Calling other microservices
For example, validating whether the currency sent in a request is valid. We use an external microservice for that:
    from aiohttp import ClientSession

    async def validate_currency(request):
        async with ClientSession() as session:
            resp = await session.post(
                "http://currency.eu-west-1.skyscnr.local",
                data={'currency': request.currency},
            )
            # aiohttp exposes the HTTP status code as `resp.status`.
            if resp.status != 200:
                # ValidationError is an application-defined exception.
                raise ValidationError(
                    "Currency {} invalid".format(request.currency))
We need to know what is happening and what happened with the calls to external microservices.
● Time per request
  ○ Statistics such as avg, p90, p99
● Number of requests
● Status code per request, errors
Aiohttp does not provide an official way to trace request events yet. An ad hoc class called MetricsClientSession is implemented to replace the official ClientSession.
    import logging
    from aiohttp import ClientSession

    class MetricsClientSession(ClientSession):
        # _request is a private aiohttp coroutine; it must be awaited so the
        # timer covers the whole request, not just the coroutine creation.
        async def _request(self, *args, **kwargs):
            start = self._loop.time()
            response = await super()._request(*args, **kwargs)
            elapsed = self._loop.time() - start
            logging.info("Time spent {}".format(elapsed))
            return response
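One pitfall when timing coroutines: calling a coroutine function only creates the coroutine object, so the clock has to span the `await`. A minimal standard-library sketch of that point, where `timed` and `fake_request` are hypothetical helpers standing in for the real client call:

```python
import asyncio

async def timed(awaitable):
    """Await `awaitable` and return (result, elapsed seconds)."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    result = await awaitable      # the measurement must include the await
    return result, loop.time() - start

async def fake_request():
    # Stand-in for an HTTP call; the 50 ms delay is arbitrary.
    await asyncio.sleep(0.05)
    return 200

async def main():
    coro = fake_request()         # creating the coroutine runs nothing yet
    return await timed(coro)

status, elapsed = asyncio.run(main())
print(status, round(elapsed, 3))
```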
Demo about calling other services
DNS in AWS with Aiohttp
    $ dig currency.eu-west-1.skyscnr.local
    currency.eu-west-1.skyscnr.local. 59 IN A 10.51.106.106
    currency.eu-west-1.skyscnr.local. 59 IN A 10.51.165.90
    currency.eu-west-1.skyscnr.local. 59 IN A 10.51.35.2
AWS:
● DNS TTL of 60 seconds
● IP addresses can change
● The number of IP addresses can grow
Aiohttp versions before 2 do not support this.
A DNS cache was implemented based on these AWS requirements. This implementation became the official one in Aiohttp 2.
    >>> import asyncio
    >>> from aiohttp.connector import TCPConnector
    >>> # The DNS cache TTL parameter of TCPConnector is named ttl_dns_cache.
    >>> connector = TCPConnector(ttl_dns_cache=60)
    >>>
    >>> async def ip(hostname, port):
    ...     # _resolve_host is a private aiohttp method, used here only for illustration.
    ...     hosts = await connector._resolve_host(hostname, port)
    ...     print(next(hosts)['host'])
    ...
    >>> asyncio.get_event_loop().run_until_complete(ip("currency.eu-west-1.skyscnr.local", 8080))
    10.51.165.90
    >>> asyncio.get_event_loop().run_until_complete(ip("currency.eu-west-1.skyscnr.local", 8080))
    10.51.35.2
    >>> asyncio.get_event_loop().run_until_complete(ip("currency.eu-west-1.skyscnr.local", 8080))
    10.51.106.106
    >>> asyncio.get_event_loop().run_until_complete(ip("currency.eu-west-1.skyscnr.local", 8080))
    10.51.165.90
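The rotation seen in the session above can be illustrated with a toy cache. This is a simplified sketch, not the aiohttp implementation: the class name, the injectable `clock` parameter and the list-rotation strategy are assumptions chosen for readability.

```python
import time

class RoundRobinDNSCache:
    """Toy TTL cache that rotates the addresses returned for each host."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # host -> [addresses, next start index, stored-at time]

    def add(self, host, addrs):
        self._entries[host] = [list(addrs), 0, self._clock()]

    def expired(self, host):
        entry = self._entries.get(host)
        return entry is None or self._clock() - entry[2] > self._ttl

    def next_addrs(self, host):
        addrs, start, _ = self._entries[host]
        # Advance the starting point so consecutive calls spread the load.
        self._entries[host][1] = (start + 1) % len(addrs)
        return addrs[start:] + addrs[:start]

cache = RoundRobinDNSCache(ttl=60)
cache.add("currency.eu-west-1.skyscnr.local",
          ["10.51.106.106", "10.51.165.90", "10.51.35.2"])
first = [cache.next_addrs("currency.eu-west-1.skyscnr.local")[0] for _ in range(4)]
print(first)
```

Each call still returns every address, so a caller can fall back to the next one if the first connection attempt fails.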
DNS cache and the dog pile effect. The following code starts 100 concurrent requests; with a cold cache, each of them makes its own DNS query instead of using the cache.
    import asyncio

    tasks = [validate_currency('EUR') for i in range(100)]
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(*tasks))
The dog pile effect happens on a cache miss: all ongoing requests end up performing their own DNS query.
To get rid of this side effect, a throttling mechanism was implemented. It is available in Aiohttp 2.3.
    # there was a miss in the cache
    if host in self._throttle_dns_events:
        yield from self._throttle_dns_events[host].wait()
    else:
        self._throttle_dns_events[host] = \
            EventResultOrError(self._loop)
        addrs = yield from \
            self._resolver.resolve(
                host, port,
                family=self._family)
        self._cached_hosts.add(host, addrs)
        self._throttle_dns_events[host].set()
    return self._cached_hosts.next_addrs(host)
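The throttling idea can be tried in isolation with a small sketch. It assumes a shared `Future` per host rather than aiohttp's `EventResultOrError`, does not handle cancellation of the resolving task, and uses a fake resolver that counts its calls; all names are hypothetical.

```python
import asyncio

class ThrottledResolver:
    """Let one caller resolve a host while concurrent callers wait for its result."""

    def __init__(self, resolve):
        self._resolve = resolve   # coroutine function: host -> list of addresses
        self._cache = {}
        self._in_flight = {}      # host -> Future shared by the waiting callers

    async def lookup(self, host):
        if host in self._cache:
            return self._cache[host]
        if host in self._in_flight:
            # Someone is already resolving this host: wait instead of querying again.
            return await self._in_flight[host]
        future = asyncio.get_running_loop().create_future()
        self._in_flight[host] = future
        try:
            addrs = await self._resolve(host)
            self._cache[host] = addrs
            future.set_result(addrs)
            return addrs
        except Exception as exc:
            future.set_exception(exc)   # waiters see the same failure
            raise
        finally:
            del self._in_flight[host]

queries = []

async def fake_resolve(host):
    # Stand-in for a real DNS query; records every call it receives.
    queries.append(host)
    await asyncio.sleep(0.01)
    return ["10.51.35.2"]

async def main():
    resolver = ThrottledResolver(fake_resolve)
    return await asyncio.gather(*(resolver.lookup("currency.local") for _ in range(100)))

results = asyncio.run(main())
print(len(results), "lookups,", len(queries), "DNS query")
```

A hundred concurrent lookups on a cold cache trigger a single call to the resolver; the other ninety-nine wait for its result.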
Misleading timeouts, the reactor saturation side effect.
Calls to third-party services are protected by timeouts, which gives us the chance to take proper countermeasures.
    import asyncio
    from aiohttp import ClientSession

    async def validate_currency(request):
        async with ClientSession() as session:
            try:
                resp = await session.post(
                    "http://currency.eu-west-1.skyscnr.local",
                    data={'currency': request.currency},
                    timeout=1,
                )
            except asyncio.TimeoutError:
                # HttpError is an application-defined exception.
                raise HttpError(504)
When the reactor is saturated, timeouts might be triggered by the saturation itself, which makes them misleading. Timeouts are handled internally by asyncio as scheduled callbacks that cancel a specific Future.
    import asyncio

    def cancel_future(future):
        future.cancel()

    async def request(*args, timeout=2):
        loop = asyncio.get_event_loop()
        f = loop.create_future()
        # call_later is a method of the event loop, not of the asyncio module.
        loop.call_later(timeout, cancel_future, f)
        # some internal stuff that triggers the network
        # operations
        return await f
Let's try to monitor the reactor saturation. How?
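... with the lag of a scheduled function. The time elapsed between executions can be used to measure how busy the reactor is.
    import asyncio

    loop = asyncio.get_event_loop()
    before = loop.time()

    def lag():
        global before
        elapsed = loop.time() - before  # time since the previous run
        if elapsed > 1:  # scheduled every second; a real check would allow some tolerance
            print("Reactor had a delay")
        before = loop.time()
        loop.call_later(1, lag)  # call_later takes (delay, callback)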
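A runnable sketch of the same lag measurement, under stated assumptions: the `LagMonitor` class, the 50 ms interval and the deliberate `time.sleep` used to saturate the loop are illustrative choices, not the production implementation.

```python
import asyncio
import time

class LagMonitor:
    """Reschedule itself every `interval` seconds and record how late each run was."""

    def __init__(self, loop, interval=1.0):
        self._loop = loop
        self._interval = interval
        self._expected = None
        self.max_lag = 0.0

    def start(self):
        self._expected = self._loop.time() + self._interval
        self._loop.call_later(self._interval, self._tick)

    def _tick(self):
        # Lag is how far past the planned time the loop actually ran this callback.
        lag = max(0.0, self._loop.time() - self._expected)
        self.max_lag = max(self.max_lag, lag)
        self.start()

async def main():
    monitor = LagMonitor(asyncio.get_running_loop(), interval=0.05)
    monitor.start()
    time.sleep(0.3)            # deliberately block the loop (never do this in a handler)
    await asyncio.sleep(0.1)   # give the monitor a chance to run
    return monitor.max_lag

max_lag = asyncio.run(main())
print("max lag: {:.3f}s".format(max_lag))
```

Reporting `max_lag` (or each individual lag value) as a metric makes saturated periods visible next to the timeout counts.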
Example of the lag metric. You can identify the reactor saturation that happened
at some point.
Desired Plans
● Trace queued operations
  ○ The HTTP pool has a connection limit
  ○ Once the limit is reached, the operation is queued
  ○ When a connection becomes free, the operation is dequeued
● AWS X-Ray support
  ○ Another middleware
  ○ Trace calls to third-party services
● Back pressure at the HTTP layer
  ○ When the reactor is too busy, return 504
  ○ Scale horizontally when there is a flood of 504s
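As a rough illustration of what tracing queued operations could mean, the sketch below models the connection limit with an `asyncio.Semaphore` and records how long each operation waited for a slot. It is only an assumption about the shape of such tracing; `TracedLimiter` and `fake_call` are hypothetical and unrelated to aiohttp's connector internals.

```python
import asyncio

class TracedLimiter:
    """Limit concurrency and record how long each operation waited for a slot."""

    def __init__(self, limit):
        self._semaphore = asyncio.Semaphore(limit)
        self.waits = []   # seconds each operation spent queued

    async def run(self, operation):
        loop = asyncio.get_running_loop()
        queued_at = loop.time()
        async with self._semaphore:          # waits here when the pool is full
            self.waits.append(loop.time() - queued_at)
            return await operation()

async def fake_call():
    await asyncio.sleep(0.05)   # stand-in for an HTTP request
    return "ok"

async def main():
    limiter = TracedLimiter(limit=1)
    results = await asyncio.gather(limiter.run(fake_call), limiter.run(fake_call))
    return results, limiter.waits

results, waits = asyncio.run(main())
print(results, [round(w, 3) for w in waits])
```

With a limit of one, the second operation's wait is roughly the duration of the first, time that would otherwise be misread as slowness of the remote service.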
Edinburgh • Glasgow • Singapore • Beijing • Miami • Barcelona • Shenzhen • Sofia • Budapest • London • Tokyo
Questions?
Slides http://bit.ly/runningatscale

Skyscanner Engineering: Running aiohttp at scale by Pau Freixes
