A router is one of the most important components of a web application framework,
and it is also one of the framework's performance bottlenecks.
In this session, I'll show you how to make a router much faster.
5. What is Router?
‣ A router is a component of a web application framework (WAF).
‣ A router determines the request handler according to the request
method and request path.
[Figure: Client, Server, and WSGI App (server side) with Handler A, Handler B, and Handler C; arrows show the HTTP request and the HTTP response. The router determines "which handler?"]
6. Request Handler Example
class BooksAPI(RequestHandler):           # ← handler class
    with on.path('/api/books/'):          # ← URL path
        @on('GET')                        # ← request method
        def do_index(self):               # ← handler func
            return {"action": "index"}
        @on('POST')                       # ← request method
        def do_create(self):              # ← handler func
            return {"action": "create"}
        ....
14. Router Class
class LinearNaiveRouter(Router):
    def __init__(self, mapping):
        ## compile every path pattern once, at startup
        self._mapping_list = [
            (compile_path(path), klass, funcs)
            for path, klass, funcs in mapping ]
    def lookup(self, req_meth, req_path):
        ## try each entry in order until one matches
        for rexp, klass, funcs in self._mapping_list:
            m = rexp.match(req_path)
            if m:
                params = [ int(v) for v in m.groups() ]
                func = funcs.get(req_meth)
                return klass, func, params
        return None, None, None
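The router above relies on compile_path(), which these slides never show. Here is a minimal sketch of what such a helper could look like. The implementation is hypothetical, and it assumes that every "{...}" placeholder stands for an integer, which is consistent with the int(v) conversion in lookup().

```python
import re

def compile_path(path):
    # Hypothetical helper (not shown in the slides).
    # Assumption: every "{...}" placeholder matches one or more digits.
    parts = re.split(r"\{[^}]*\}", path)        # literal text around placeholders
    pattern = r"(\d+)".join(re.escape(p) for p in parts)
    return re.compile("^" + pattern + "$")

rexp = compile_path("/api/books/{id}")
m = rexp.match("/api/books/123")
print(m.groups())                                           # ('123',)
print(rexp.match("/api/books/abc"))                         # None
print(compile_path("/api/books").match("/api/books/123"))   # None
```

Anchoring the pattern with "^" and "$" keeps a fixed path such as /api/books from matching longer paths by accident.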
16. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear Naive.]
‣ Very fast at the top of the list (/api/aaa, /api/aaa/{id}).
‣ Very slow at the bottom of the list (/api/zzz, /api/zzz/{id}).
17. Pros & Cons
✗ Very slow when many mapping entries exist.
✓ Easy to understand and implement
19. Prefix String
mapping_list = [
    ## (prefix string, regular expression, handler class, handler funcs)
    ("/books"  , r"/books"       , BooksAPI , {"GET": ...}),
    ("/books/" , r"/books/(\d+)" , BooksAPI , {"GET": ...}),
    ("/orders" , r"/orders"      , OrdersAPI, {"GET": ...}),
    ("/orders/", r"/orders/(\d+)", OrdersAPI, {"GET": ...}),
]
## rexp stands for the compiled form of each regular expression
for prefix, rexp, klass, funcs in mapping_list:
    ## startswith() is much faster than rexp.match()
    ## (replace an expensive operation with a cheap operation)
    if not "/orders/123".startswith(prefix):
        continue
    m = rexp.match("/orders/123")
    if m:
        ...
20. Router Class
def prefix_str(s):
    ## ex: "/api/books/{id}" -> "/api/books/"
    return s.split('{', 1)[0]
class PrefixLinearRouter(Router):
    def __init__(self, mapping):
        self._mapping_list = []
        for path, klass, funcs in mapping:
            prefix = prefix_str(path)
            rexp = compile_path(path)
            t = (prefix, rexp, klass, funcs)
            self._mapping_list.append(t)
    ...
21. Router Class
    ....
    def lookup(self, req_meth, req_path):
        for prefix, rexp, klass, funcs in self._mapping_list:
            ## startswith() is much faster than rexp.match()
            if not req_path.startswith(prefix):
                continue
            m = rexp.match(req_path)
            if m:
                params = [ int(v) for v in m.groups() ]
                func = funcs.get(req_meth)
                return klass, func, params
        return None, None, None
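To check the "startswith() is cheaper than rexp.match()" claim on your own interpreter, a small timeit script is enough. This is only a sketch: the pattern, the request path, and the repeat count are made up for illustration, and the numbers depend on the Python version and the machine.

```python
import re
import timeit

rexp = re.compile(r"^/api/books/(\d+)$")
prefix = "/api/books/"
req_path = "/api/orders/123"     # a path that does NOT belong to this entry

n = 100000
t_regex = timeit.timeit(lambda: rexp.match(req_path), number=n)
t_prefix = timeit.timeit(lambda: req_path.startswith(prefix), number=n)

print("rexp.match() : %.4f sec" % t_regex)
print("startswith() : %.4f sec" % t_prefix)
```

Both calls reject the path; the prefix check only avoids entering the regexp engine for entries that cannot match anyway.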
22. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear Naive, Prefix Str.]
‣ About twice as fast as the naive implementation.
23. Pros & Cons
✗ Still slow when many mapping entries exist.
✓ Makes linear search faster.
✓ Easy to understand and implement.
25. Fixed Path Dictionary
## variable path (contains one or more path parameters)
mapping_list = [
    # ("/books"  , r"/books"       , BooksAPI , {"GET": ...}),   # moved to dict
    ("/books/" , r"/books/(\d+)" , BooksAPI , {"GET": ...}),
    # ("/orders" , r"/orders"      , OrdersAPI, {"GET": ...}),   # moved to dict
    ("/orders/", r"/orders/(\d+)", OrdersAPI, {"GET": ...}),
]
## fixed path (contains no path parameters)
## the fixed path itself is used as the key of the dict
mapping_dict = {
    r"/books" : (BooksAPI , {"GET": ...}, []),
    r"/orders": (OrdersAPI, {"GET": ...}, []),
}
26. Router Class
class FixedLinearRouter(object):
    def __init__(self, mapping):
        self._mapping_dict = {}
        self._mapping_list = []
        for path, klass, funcs in mapping:
            if '{' not in path:
                self._mapping_dict[path] = (klass, funcs, [])
            else:
                prefix = prefix_str(path)
                rexp = compile_path(path)
                t = (prefix, rexp, klass, funcs)
                self._mapping_list.append(t)
    ....
27. Router Class
    ....
    def lookup(self, req_meth, req_path):
        ## dict lookup is much faster than a for-loop
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs, params = t
            return klass, funcs.get(req_meth), params
        ## the number of entries to scan is reduced
        for prefix, rexp, klass, funcs in self._mapping_list:
            if not req_path.startswith(prefix):
                continue
            m = rexp.match(req_path)
            if m:
                params = [ int(v) for v in m.groups() ]
                func = funcs.get(req_meth)
                return klass, func, params
        return None, None, None
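Putting the fixed-path dict and the prefix check together, a self-contained version might look like the sketch below. compile_path() is a hypothetical helper (the slides do not show it) that assumes every "{...}" parameter is an integer, and the handler classes and functions are replaced by plain strings to keep the example short.

```python
import re

def compile_path(path):
    # Hypothetical helper; assumes every "{...}" parameter is an integer.
    parts = re.split(r"\{[^}]*\}", path)
    return re.compile("^" + r"(\d+)".join(map(re.escape, parts)) + "$")

def prefix_str(path):
    return path.split('{', 1)[0]

class FixedLinearRouter:
    def __init__(self, mapping):
        self._mapping_dict = {}      # fixed paths: O(1) dict lookup
        self._mapping_list = []      # variable paths: linear search
        for path, klass, funcs in mapping:
            if '{' not in path:
                self._mapping_dict[path] = (klass, funcs)
            else:
                t = (prefix_str(path), compile_path(path), klass, funcs)
                self._mapping_list.append(t)

    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs = t
            return klass, funcs.get(req_meth), []
        for prefix, rexp, klass, funcs in self._mapping_list:
            if not req_path.startswith(prefix):
                continue
            m = rexp.match(req_path)
            if m:
                return klass, funcs.get(req_meth), [int(v) for v in m.groups()]
        return None, None, None

router = FixedLinearRouter([
    ("/api/books",       "BooksAPI",  {"GET": "do_index"}),
    ("/api/books/{id}",  "BooksAPI",  {"GET": "do_show"}),
    ("/api/orders/{id}", "OrdersAPI", {"GET": "do_show"}),
])
print(router.lookup("GET", "/api/books"))       # ('BooksAPI', 'do_index', [])
print(router.lookup("GET", "/api/orders/42"))   # ('OrdersAPI', 'do_show', [42])
```

Only two of the three entries end up in the list, so a request for a variable path scans fewer entries than before.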
28. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear Naive, Prefix Str, Fixed Path.]
‣ Super fast on fixed paths!
‣ Three times faster than the naive implementation.
29. Pros & Cons
✗ Still slow when many mapping entries exist.
✓ Makes fixed path search super fast.
✓ Makes variable path search faster,
because the number of entries is reduced.
✓ Easy to understand and implement.
30. Notice
‣ Don't use r"/api/v{version:int}",
‣ because then every API path is regarded as a variable path.
‣ Instead, use r"/api/v1", r"/api/v2", ...
‣ in order to increase the number of fixed paths.
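The effect of this advice is easy to see by applying the same '{' test that the router classes use. The function name and the sample paths below are invented for illustration.

```python
def split_fixed_and_variable(paths):
    # Same rule as in the router classes: a path with '{' is a variable path.
    fixed = [p for p in paths if '{' not in p]
    variable = [p for p in paths if '{' in p]
    return fixed, variable

versioned = ["/api/v{version:int}/books", "/api/v{version:int}/orders"]
explicit = ["/api/v1/books", "/api/v1/orders",
            "/api/v2/books", "/api/v2/orders"]

print(split_fixed_and_variable(versioned))
# ([], ['/api/v{version:int}/books', '/api/v{version:int}/orders'])
print(split_fixed_and_variable(explicit)[1])
# []
```

With the version written out, every path goes to the dict and none is left for the slower variable-path search.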
33. Matching
m = all_rexp.match("/api/users/123")
d = m.groupdict()   #=> {"_0": None,
                    #    "_1": None,
                    #    "_2": "/api/users/123"}
for k, v in d.items():
    if v:
        i = int(k[1:])   # ex: "_2" -> 2
        break
_, klass, funcs, pos, nparams = mapping_list[i]
arr = m.groups()    #=> (None, None, None, None,
                    #    "/api/users/123", "123")
params = arr[5:6]   #=> ("123",)
34. Router Class
class NaiveRegexpRouter(Router):
    def __init__(self, mapping):
        self._mapping_dict = {}
        self._mapping_list = []
        arr = []; i = 0; pos = 0
        for path, klass, funcs in mapping:
            if '{' not in path:
                self._mapping_dict[path] = (klass, funcs, [])
            else:
                rexp = compile_path(path); pat = rexp.pattern
                arr.append("(?P<_%s>%s)" % (i, pat))
                t = (klass, funcs, pos, path.count('{'))
                self._mapping_list.append(t)
                i += 1; pos += 1 + path.count('{')
        self._all_rexp = re.compile("|".join(arr))
35. Router Class
    ....
    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs, params = t
            return klass, funcs.get(req_meth), params
        m = self._all_rexp.match(req_path)
        if m:
            ## find index in list (from the name of the matched group)
            for k, v in m.groupdict().items():
                if v:
                    i = int(k[1:])
                    break
            klass, funcs, pos, nparams = self._mapping_list[i]
            ## find param values (they follow the named group at pos)
            params = m.groups()[pos+1:pos+1+nparams]
            func = funcs.get(req_meth)
            return klass, func, params
        return None, None, None
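The named-group technique can be tried in isolation with a few lines. This is a minimal sketch, not the code from the sample repository: the patterns are written by hand instead of coming from compile_path(), and handler classes are replaced by strings.

```python
import re

mapping = [
    (r"^/api/books/(\d+)$",  "BooksAPI"),
    (r"^/api/orders/(\d+)$", "OrdersAPI"),
    (r"^/api/users/(\d+)$",  "UsersAPI"),
]

parts = []; entries = []; pos = 0
for i, (pat, klass) in enumerate(mapping):
    nparams = re.compile(pat).groups           # number of groups in this pattern
    parts.append("(?P<_%d>%s)" % (i, pat))     # wrap each pattern in a named group
    entries.append((klass, pos, nparams))      # pos: index of the named group
    pos += 1 + nparams
all_rexp = re.compile("|".join(parts))

def find(req_path):
    m = all_rexp.match(req_path)
    if m is None:
        return None
    for name, value in m.groupdict().items():
        if value is not None:
            i = int(name[1:])                  # ex: "_2" -> 2
            break
    klass, pos, nparams = entries[i]
    return klass, m.groups()[pos + 1 : pos + 1 + nparams]

print(find("/api/users/123"))    # ('UsersAPI', ('123',))
```

Every lookup builds a dict with one key per entry and then loops over it, which hints at why this version loses against the linear search in the benchmark that follows.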
36. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive).]
‣ Slower than linear search :(
38. Notice
$ python3 --version
Python 3.4.5
$ python3
>>> import re
>>> arr = [r'^/(\d+)$'] * 101
>>> re.compile("|".join(arr))
  File "/opt/vs/python/3.4.5/lib/python3.4/sre_compile.py",
  line 579, in compile
    "sorry, but this version only supports 100 named groups"
AssertionError: sorry, but this version only supports 100
named groups
Python <= 3.4 limits the number of groups in a regular expression,
and there is no workaround :(
40. Improved Regular Expression
mapping_list = [
    (r"/api/books/(\d+)"  , BooksAPI  , {"GET": ...}),
    (r"/api/orders/(\d+)" , OrdersAPI , {"GET": ...}),
    (r"/api/users/(\d+)"  , UsersAPI  , {"GET": ...}),
]
## no more named groups
arr = [ r"^/api/books/(?:\d+)($)",
        r"^/api/orders/(?:\d+)($)",
        r"^/api/users/(?:\d+)($)", ]
all_rexp = re.compile("|".join(arr))
m = all_rexp.match("/api/users/123")
## a tuple is much more lightweight than a dict
arr = m.groups()      #=> (None, None, "")
## index() is faster than a for-loop
i = arr.index("")     #=> 2
t = mapping_list[i]   #=> (r"/api/users/(\d+)",
                      #    UsersAPI, {"GET": ...})
41. Router Class
class SmartRegexpRouter(Router):
    def __init__(self, mapping):
        self._mapping_dict = {}
        self._mapping_list = []
        arr = []
        for path, klass, funcs in mapping:
            if '{' not in path:
                self._mapping_dict[path] = (klass, funcs, [])
            else:
                rexp = compile_path(path); pat = rexp.pattern
                arr.append(pat.replace("(", "(?:")
                              .replace("$", "($)"))
                t = (rexp, klass, funcs)
                self._mapping_list.append(t)
        self._all_rexp = re.compile("|".join(arr))
42. Router Class
    ...
    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs, params = t
            return klass, funcs.get(req_meth), params
        m = self._all_rexp.match(req_path)
        if m:
            ## 1st matching: find index in list
            i = m.groups().index("")
            rexp, klass, funcs = self._mapping_list[i]
            ## 2nd matching: get param values
            m2 = rexp.match(req_path)
            params = [ int(v) for v in m2.groups() ]
            func = funcs.get(req_meth)
            return klass, func, params
        return None, None, None
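Here is the two-step matching as a small standalone sketch. The patterns are written by hand (in the router they come from compile_path()), and strings stand in for the handler classes.

```python
import re

patterns = [r"^/api/books/(\d+)$", r"^/api/orders/(\d+)$", r"^/api/users/(\d+)$"]
handlers = ["BooksAPI", "OrdersAPI", "UsersAPI"]

# Per-entry regexps are still needed to extract the parameter values.
rexps = [re.compile(p) for p in patterns]

# Combined regexp: parameter groups become non-capturing, and the only
# capturing group left in each alternative is "($)", which captures "".
all_rexp = re.compile("|".join(
    p.replace("(", "(?:").replace("$", "($)") for p in patterns))

def find(req_path):
    m = all_rexp.match(req_path)            # 1st matching: which entry?
    if m is None:
        return None
    i = m.groups().index("")                # the matched alternative captured ""
    m2 = rexps[i].match(req_path)           # 2nd matching: param values
    return handlers[i], [int(v) for v in m2.groups()]

print(all_rexp.pattern.split("|")[0])       # ^/api/books/(?:\d+)($)
print(find("/api/users/123"))               # ('UsersAPI', [123])
```

The plain replace("(", "(?:") only works while the path patterns contain nothing but simple capturing groups; a pattern with an escaped parenthesis would need more careful rewriting.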
43. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart).]
‣ The difference between /api/aaa/{id} and /api/zzz/{id} is small.
44. Pros & Cons
✗ Slower when the number of entries is small
(due to the overhead of matching twice).
✗ A large regular expression may be difficult to debug.
✓ Much faster than the previous routers,
especially when many mapping entries exist.
47. Router Class
class OptimizedRegexpRouter(Router):
    def __init__(self, mapping):
        ## Code is too complicated to show here.
        ## Please download sample code from github.
        ## https://github.com/kwatch/router-sample/
        ...
    def lookup(self, req_meth, req_path):
        ## nothing changed; same as previous section
        ...
48. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart, Optimized).]
‣ A little faster on /api/zzz/{id}.
49. Pros & Cons
✗ Performance benefit is very small (on Python).
✗ Rather difficult to implement and debug.
✓ A little faster than smart regular expression
when a lot of variable paths exist.
54. State Machine: Transition
def find(req_path):
    req_path = req_path.lstrip('/')   #ex: "/a/b/c" -> "a/b/c"
    items = req_path.split('/')       #ex: "a/b/c" -> ["a","b","c"]
    d = transition; params = []
    for s in items:
        if s in d: d = d[s]
        ## isdecimal() keeps int(s) from raising on non-numeric segments
        elif 1 in d and s.isdecimal(): d = d[1]; params.append(int(s))
        elif 2 in d: d = d[2]; params.append(str(s))
        else: return None
    if None not in d: return None
    klass, funcs = d[None]
    return klass, funcs, params
>>> find("/api/books/123")
(BooksAPI, {"GET": do_index, ...}, [123])
55. Router Class
class StateMachineRouter(Router):
    def __init__(self, mapping):
        self._mapping_dict = {}
        self._mapping_list = []
        self._transition = {}
        for path, klass, funcs in mapping:
            if '{' not in path:
                self._mapping_dict[path] = (klass, funcs, [])
            else:
                self._register(path, klass, funcs)
56. Router Class
    ...
    PARAM_TYPES = {"int": 1, "str": 2}
    def _register(self, path, klass, funcs):
        ptypes = self.PARAM_TYPES
        d = self._transition
        for s in path[1:].split('/'):
            key = s
            if s.startswith("{") and s.endswith("}"):
                ## ex: "{id:int}" -> ("id", "int"); "{id}" -> ("id", "")
                pname, _, ptype = s[1:-1].partition(':')
                key = ptypes.get(ptype) or ptypes["str"]
            d = d.setdefault(key, {})
        d[None] = (klass, funcs)
57. Router Class
    ...
    def lookup(self, req_meth, req_path):
        d = self._transition
        params = []
        for s in req_path[1:].split('/'):
            if s in d: d = d[s]
            ## isdecimal() keeps int(s) from raising on non-numeric segments
            elif 1 in d and s.isdecimal(): d = d[1]; params.append(int(s))
            elif 2 in d: d = d[2]; params.append(str(s))
            else: return None, None, None
        if None in d:
            klass, funcs = d[None]
            func = funcs.get(req_meth)
            return klass, func, params
        return None, None, None
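To see what the transition table looks like, the register and lookup steps can be run as plain functions. This is a sketch under a few assumptions: the function names are made up, strings stand in for the (klass, funcs) tuples, and the isdecimal() guard is added so that a non-numeric segment yields "not found" instead of an exception.

```python
PARAM_TYPES = {"int": 1, "str": 2}

def register(transition, path, value):
    d = transition
    for s in path[1:].split('/'):
        key = s
        if s.startswith('{') and s.endswith('}'):
            _, _, ptype = s[1:-1].partition(':')     # "{id:int}" -> "int"
            key = PARAM_TYPES.get(ptype, PARAM_TYPES["str"])
        d = d.setdefault(key, {})                    # one nested dict per segment
    d[None] = value                                  # None marks "a path ends here"

def find(transition, req_path):
    d = transition; params = []
    for s in req_path[1:].split('/'):
        if s in d:                       d = d[s]
        elif 1 in d and s.isdecimal():   d = d[1]; params.append(int(s))
        elif 2 in d:                     d = d[2]; params.append(s)
        else:                            return None
    if None not in d:
        return None
    return d[None], params

transition = {}
register(transition, "/api/books/{id:int}", "BooksAPI")
register(transition, "/api/books/{id:int}/comments/{name:str}", "CommentsAPI")

print(transition)
# {'api': {'books': {1: {None: 'BooksAPI', 'comments': {2: {None: 'CommentsAPI'}}}}}}
print(find(transition, "/api/books/123"))               # ('BooksAPI', [123])
print(find(transition, "/api/books/123/comments/abc"))  # ('CommentsAPI', [123, 'abc'])
```

The lookup cost depends on the number of path segments, not on the number of registered paths.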
58. Benchmark
[Figure: bar chart of seconds per 1M requests (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart, Optimized), StateMachine.]
‣ /api/aaa/{id} and /api/zzz/{id} have the same performance.
59. Benchmark (PyPy3.5)
[Figure: bar chart of seconds per 1M requests on PyPy3.5 (axis: 0 to 50 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart, Optimized), StateMachine.]
‣ Regular expressions are very slow in PyPy3.5.
‣ String operations are very fast because they are JIT friendly.
60. Benchmark (PyPy3.5)
[Figure: bar chart of seconds per 1M requests on PyPy3.5, zoomed in (axis: 0 to 5 sec; shorter bars are faster) for /api/aaa, /api/aaa/{id}, /api/zzz, and /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart, Optimized), StateMachine.]
‣ The fastest method, because it is regexp-free (= JIT friendly).
‣ A little slower than StateMachine, because it contains a regexp.
61. Pros & Cons
✗ Does not support complicated patterns.
✗ Requires some effort to support URL path suffixes
(ex: /api/books/123.json).
✓ Performance champion in the routing area.
✓ Much faster in PyPy3.5, because it is regexp-free.
JIT friendly!
63. Conclusion
‣ Linear search is slow.
‣ Prefix strings and a fixed path dict make it faster.
‣ Regular expressions are very fast.
‣ Do your best to avoid named groups (named captures).
‣ State machine is the fastest method in Python.
‣ Especially in PyPy3, because it is regexp-free (= JIT friendly).
65. My Products
‣ Benchmarker.py
Awesome benchmarking utility.
https://pythonhosted.org/Benchmarker/
‣ Oktest.py
New generation of testing framework.
https://pythonhosted.org/Oktest/
‣ PyTenjin
Super fast and feature-rich template engine.
https://www.kuwata-lab.com/tenjin/pytenjin-users-guide.html
https://bit.ly/tenjinpy_slide (presentation)