This document describes a Pratt parser implementation in Python. It opens with an example arithmetic expression and the reasons for writing one's own parser, then gives an overview of Pratt parsing as an efficient, modular technique. The rest walks through the implementation, including the token and node classes, and defines the grammar through token binding powers rather than BNF.
4. Why write your own parser?
● It is not as big a task as it might seem
● More control over the implementation details/techniques
● Many of the existing Python parsing libraries are lacking in one or more areas
● Writing parsers is fun
5. What is a Pratt Parser and why use it?
● A parsing technique designed to handle operator precedence correctly
● First appeared in “Top Down Operator Precedence” by Vaughan Pratt (1973)
● A variation of a recursive descent parser, but
○ Efficient
○ Modular and flexible
○ Easy to implement and iterate upon
○ Beautiful
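To make "easy to implement" concrete, here is a minimal, self-contained sketch of the technique — not the implementation presented later in this deck; `MiniParser`, `tokenize`, and the `LBP` table are illustrative names:

```python
import re

# Core idea: every operator has a left binding power (lbp), and
# expression() keeps folding infix operators for as long as they
# bind tighter than the caller's right binding power (rbp).

def tokenize(src):
    # numbers and the two operators we support; '<end>' marks exhaustion
    return re.findall(r'\d+|[+*]', src) + ['<end>']

LBP = {'+': 10, '*': 20, '<end>': 0}

class MiniParser(object):
    def __init__(self, src):
        self.tokens = iter(tokenize(src))
        self.token = next(self.tokens)

    def advance(self):
        self.token = next(self.tokens)

    def expression(self, rbp=0):
        # "nud": a number in prefix position evaluates to itself
        left = int(self.token)
        self.advance()
        # "led": consume infix operators while they outrank rbp
        while LBP[self.token] > rbp:
            op = self.token
            self.advance()
            right = self.expression(LBP[op])
            left = left + right if op == '+' else left * right
        return left

print(MiniParser('2+3*4').expression())  # -> 14, because * binds tighter
```

The whole precedence mechanism is the single `while LBP[self.token] > rbp` comparison; everything the full implementation adds is packaging around that loop.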
6. Why isn’t it more popular?
“One may wonder why such an "obviously" utopian approach has not been generally
adopted already. I suspect the root cause of this kind of oversight is our universal
preoccupation with BNF grammars and their various offspring grammars [...] together
with their related automata and a large body of theorems. I am personally enamored
of automata theory per se, but I am not impressed with the extent to which it has so
far been successfully applied to the writing of compilers or interpreters. Nor do I see a
particularly promising future in this direction. Rather, I see automata theory as
holding back the development of ideas valuable to language design that are not
visibly in the domain of automata theory.”
Vaughan R. Pratt “Top Down Operator Precedence”
11. class Symbol(object):
    """Base class for all nodes"""
    id = None
    lbp = 0

    def __init__(self, parser, value=None):
        self.parser = parser
        self.value = value or self.id
        self.first = None
        self.second = None

    def nud(self):
        """Null denotation. Prefix/Nilfix symbol"""
        raise ParserError("Symbol action undefined for `%s'" % self.value)

    def led(self, left):
        """Left denotation. Infix/Postfix symbol"""
        raise ParserError("Infix action undefined for `%s'" % self.value)
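What the excerpt does not show is the driver that calls nud() and led(). A hypothetical sketch of that loop — `Parser` here is illustrative, not necessarily the repo's actual class — assuming tokens are Symbol-like objects and the stream ends with a zero-lbp end marker:

```python
class Parser(object):
    """Hypothetical driver loop: nud() handles a token in prefix
    position, led() in infix position, and lbp decides how far each
    led() recursion is allowed to reach."""
    def __init__(self, tokens):
        self.tokens = iter(tokens)
        self.token = next(self.tokens)

    def advance(self, expected=None):
        if expected is not None and self.token.value != expected:
            raise SyntaxError("Expected `%s'" % expected)
        self.token = next(self.tokens)

    def expression(self, rbp=0):
        t = self.token
        self.advance()
        left = t.nud()               # prefix/nilfix position
        while self.token.lbp > rbp:  # stop once operators bind looser
            t = self.token
            self.advance()
            left = t.led(left)       # infix/postfix position
        return left
```

Note how Symbol subclasses never inspect precedence themselves; they only call back into expression() with a binding power, and this loop does the rest.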
12. class Literal(Symbol):
    """Simple literal (a number or a variable/function name)
    just produces itself"""
    def nud(self):
        return self

class Prefix(Symbol):
    """Prefix operator.
    For the sake of simplicity has fixed right binding power"""
    def nud(self):
        self.first = self.parser.expression(80)
        return self
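The infix case is analogous but not shown on this slide. A hypothetical sketch — the `Symbol` stub below is a minimal stand-in for the base class from the previous slide, and the lbp values used in the tests are illustrative:

```python
class Symbol(object):
    """Minimal stand-in for the Symbol base class shown earlier."""
    lbp = 0
    def __init__(self, parser, value=None):
        self.parser = parser
        self.value = value
        self.first = None
        self.second = None

class Infix(Symbol):
    """Left-associative infix operator: the left operand is already
    parsed; recurse for the right operand at our own binding power."""
    def led(self, left):
        self.first = left
        self.second = self.parser.expression(self.lbp)
        return self

class InfixR(Symbol):
    """Right-associative variant (e.g. ^): recursing at lbp - 1 lets an
    operator of equal precedence on the right bind to us first."""
    def led(self, left):
        self.first = left
        self.second = self.parser.expression(self.lbp - 1)
        return self
```

The only difference between left and right associativity is the `- 1` in the recursive binding power, which is what makes the Pratt approach so compact.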
18. expr.define("<punct>")

@expr.define("(", 90)
class FunctionCall(Symbol):
    """Defining both function application and parenthesized expression"""
    def nud(self):
        e = self.parser.expression(0)
        self.parser.advance(")")
        return e

    def led(self, left):
        self.first = left
        args = self.second = []
        p = self.parser
        while p.token.value != ")":
            args.append(p.expression(0))
            if p.token.value != ",":
                break
            p.advance(",")
        p.advance(")")
        return self
19. But what about lexing?

import re

TOKENS = (
    ('ws', r'\s+'),
    ('name', r'[a-z][\w_]*'),
    ('infix', r'[-+*/^]'),
    ('punct', r'[(),]'),
    ('number', r'(?:\d*\.)?\d+'),
)
TOKEN_RE = '|'.join("(?P<%s>%s)" % t for t in TOKENS)
LEX_RE = re.compile(TOKEN_RE, re.UNICODE | re.IGNORECASE)

class Token(object):
    def __init__(self, token_type, value, pos):
        self.token_type = token_type
        self.value = value
        self.pos = pos
20. def lex(source, pat=LEX_RE):
    i = 0
    def error():
        raise LexerException(
            "Unexpected character at position %d: `%s`" % (i, source[i])
        )
    for m in pat.finditer(source):
        pos = m.start()
        if pos > i:
            error()
        i = m.end()
        name = m.lastgroup
        if name != "ws":
            token_type = "<%s>" % name
            yield Token(token_type, m.group(0), pos)
    if i < len(source):
        error()
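Putting the last two slides together, the lexer can be exercised on its own. This standalone sketch inlines the token table and simplifies the error to ValueError, since LexerException is defined elsewhere in the repo; it also yields plain tuples instead of Token objects to keep the example short:

```python
import re

TOKENS = (
    ('ws', r'\s+'),
    ('name', r'[a-z][\w_]*'),
    ('infix', r'[-+*/^]'),
    ('punct', r'[(),]'),
    ('number', r'(?:\d*\.)?\d+'),
)
LEX_RE = re.compile('|'.join('(?P<%s>%s)' % t for t in TOKENS),
                    re.UNICODE | re.IGNORECASE)

def lex(source, pat=LEX_RE):
    i = 0
    for m in pat.finditer(source):
        if m.start() > i:  # a gap means a character no pattern matched
            raise ValueError("Unexpected character at %d: %r" % (i, source[i]))
        i = m.end()
        if m.lastgroup != 'ws':  # whitespace advances i but emits nothing
            yield ('<%s>' % m.lastgroup, m.group(0))
    if i < len(source):
        raise ValueError("Unexpected character at %d: %r" % (i, source[i]))

print(list(lex('sin(x) + 1.5')))
# -> [('<name>', 'sin'), ('<punct>', '('), ('<name>', 'x'),
#     ('<punct>', ')'), ('<infix>', '+'), ('<number>', '1.5')]
```

The gap check is the whole error-handling strategy: finditer silently skips unmatchable characters, so any distance between one match's end and the next match's start pinpoints a lexing error.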
21. References

● Vaughan R. Pratt "Top Down Operator Precedence" (1973)
  https://tdop.github.io/
● Douglas Crockford "Top Down Operator Precedence" (2007)
  http://javascript.crockford.com/tdop/tdop.html
● Fredrik Lundh "Simple Top-Down Parsing in Python" (2008)
  http://effbot.org/zone/simple-top-down-parsing.htm

All code in this presentation can be found at:
https://github.com/percolate/pratt-parser
We are Percolate and we’re always hiring great engineers. Talk to us