6. What is the problem?
Add boolean query support to our product:
- Validate the syntax
- Be compatible with the most popular syntax (there is no standard)
- Query our database
- Prevent malicious queries
An example:
(TACD:( ("electric bike" OR "electric $w4 bike" OR electricbike) AND (photovoltaic
OR solar) AND cell* ) AND AUTHORITY:( US OR JP OR EP OR WO ) AND
PBD_Y:[2000 to *]) NOT ( ALL_AN:Gazelle )
8. What is Instaparse?
Instaparse is a Clojure library that aims to be the simplest
way to build context-free grammar parsers in Clojure.
First version released in 2013
https://github.com/Engelberg/instaparse
CLJC compatible (runs on both Clojure and ClojureScript)
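9. Context-free grammar?
In practical terms, we want:
;; parse the raw query text into a tree, using the grammar rules
(defn text->tree [text-input rules])
;; translate that tree into an SQL query
(defn tree->sql [tree])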
10. An example
(require '[instaparse.core :as insta])
(def as-and-bs
  (insta/parser
    "S = AB*
     AB = A B
     A = 'a'+
     B = 'b'+"))
(as-and-bs "aaaaabbbaaaabb")
;; =>
[:S
 [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
 [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]
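Instaparse's default output is a hiccup-style tree of plain Clojure vectors, so it can be consumed with ordinary destructuring, with no parser-specific API. A small sketch that copies the tree above as a literal; `run-lengths` is a hypothetical helper, not part of Instaparse.

```clojure
;; The parse tree from the slide above, copied as a literal.
(def as-and-bs-tree
  [:S
   [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
   [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]])

(defn run-lengths
  "Hypothetical helper: for every [:AB [:A ...] [:B ...]] child of :S,
  returns [number-of-a number-of-b]."
  [[_ & ab-nodes]]
  (for [[_ [_ & a-chars] [_ & b-chars]] ab-nodes]
    [(count a-chars) (count b-chars)]))

(run-lengths as-and-bs-tree)
;; => ([5 3] [4 2])
```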
11. A solution
(TACD:( ("electric bike" OR "electric $w4 bike" OR electricbike) AND (photovoltaic OR
solar) AND cell* ) AND AUTHORITY:( US OR JP OR EP OR WO ) AND PBD_Y:[2000 to *])
NOT ( ALL_AN:Gazelle )
Split into smaller pieces:
● word: photovoltaic
● quote-word: "electric bike"
● field-word: T:elect*
● operation: AND, OR, NOT
● list: (a b c)
● field-list: TA:(a b c)
12. A solution
;; double quotes inside the grammar string must be escaped as \"
(def boolean-parser
  (insta/parser
    "S = (exp <space>)*
     exp = word | op | quote-word | list | fields-list
     word = fields? #'[a-zA-Z0-9_*/$#?-]+'
     op = <space> 'OR' <space> | <space> 'AND' <space> | <space> 'NOT' <space>
     quote-word = fields? <'\"'> (word <space>)* <'\"'>
     list = <lparen> (exp <space>)* <rparen>
     fields-list = fields list
     fields = field <':'> <space>
     field = 'T' | 'TA' | 'A'
     <lparen> = <'('> <space>
     <rparen> = <space> <')'>
     <space> = <#'[ ]*'>"))
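Slide 9 names a tree->sql step that the slides do not show. The sketch below is one possible shape for it, written against a hand-written tree that mirrors what boolean-parser should produce (:exp wrappers, spaces and parentheses hidden). Everything database-specific is an assumption: the column names in the hypothetical `field->column` map, the `full_text` default column, substring LIKE matching, and OR for juxtaposed terms. Fields go through a whitelist and search terms are bound as parameters, which is one way to address the "Prevent malicious queries" requirement.

```clojure
(require '[clojure.string :as str])

;; Hypothetical whitelist: grammar field codes -> database columns.
;; Only these names are ever spliced into the SQL text; user terms
;; always travel as bound parameters.
(def field->column {"T" "title", "TA" "title_abstract", "A" "abstract"})
(def default-column "full_text") ; hypothetical column for unqualified terms

(defn- fields->column
  "[:fields [:field \"TA\"]] -> \"title_abstract\"; rejects unknown fields."
  [[_ [_ code]]]
  (or (field->column code)
      (throw (ex-info "Unknown field" {:field code}))))

(defn- like-param
  "Substring match; * becomes the SQL % wildcard, repeated % are collapsed."
  [text]
  (str/replace (str "%" (str/replace text "*" "%") "%") #"%+" "%"))

(declare node->sql)

(defn- seq->sql
  "Translates a run of [:exp ...] nodes into [sql params].
  Assumptions: two terms with no operator between them are OR'ed,
  and a binary NOT (a NOT b) means a AND NOT b."
  [exps column]
  (loop [[exp & more] exps, parts [], params [], prev-term? false]
    (if (nil? exp)
      [(str/join " " parts) params]
      (let [[_ child] exp]
        (if (= :op (first child))
          (let [op (second child)]
            (recur more (conj parts (if (= "NOT" op) "AND NOT" op)) params false))
          (let [[sql ps] (node->sql child column)]
            (recur more
                   (cond-> parts prev-term? (conj "OR") true (conj sql))
                   (into params ps)
                   true)))))))

(defn node->sql
  "Returns [sql params] for a single parse-tree node."
  [[tag & children :as node] column]
  (case tag
    :S           (seq->sql children column)
    :list        (let [[sql ps] (seq->sql children column)]
                   [(str "(" sql ")") ps])
    :fields-list (let [[fields lst] children]
                   (node->sql lst (fields->column fields)))
    :word        (let [[f w] children
                       [col text] (if (vector? f) [(fields->column f) w] [column f])]
                   [(str col " LIKE ?") [(like-param text)]])
    :quote-word  (let [[f & ws] children
                       [col words] (if (= :fields (first f))
                                     [(fields->column f) ws]
                                     [column children])]
                   [(str col " LIKE ?")
                    [(like-param (str/join " " (map last words)))]])
    (throw (ex-info "Unexpected node" {:node node}))))

(defn tree->sql
  "A possible body for the tree->sql signature from slide 9."
  [tree]
  (node->sql tree default-column))

;; Hand-written tree for: T:solar AND (photovoltaic OR cell*) NOT "electric bike"
(def example-tree
  [:S [:exp [:word [:fields [:field "T"]] "solar"]]
      [:exp [:op "AND"]]
      [:exp [:list [:exp [:word "photovoltaic"]]
                   [:exp [:op "OR"]]
                   [:exp [:word "cell*"]]]]
      [:exp [:op "NOT"]]
      [:exp [:quote-word [:word "electric"] [:word "bike"]]]])

(tree->sql example-tree)
;; => ["title LIKE ? AND (full_text LIKE ? OR full_text LIKE ?) AND NOT full_text LIKE ?"
;;     ["%solar%" "%photovoltaic%" "%cell%" "%electric bike%"]]
```

Operator precedence in the generated string is SQL's own (NOT, then AND, then OR), which may not match how users read a query left to right; wrapping each operand group in parentheses is one way to pin it down.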
17. Lessons learned
1. Leave room for human typos
2. Recursion comes for free
3. Don't make the grammar too complicated: use insta/transform, clojure.walk or
clojure.zip instead
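The third point can be illustrated with clojure.walk: rather than adding grammar rules to avoid the [:exp ...] wrapper nodes, the tree can be flattened after parsing. A minimal sketch; the function name `unwrap-exp` and the sample tree are hypothetical.

```clojure
(require '[clojure.walk :as walk])

(defn unwrap-exp
  "Replaces every [:exp child] node with child, bottom-up."
  [tree]
  (walk/postwalk
   (fn [node]
     (if (and (vector? node) (= :exp (first node)))
       (second node)
       node))
   tree))

;; hypothetical tree shaped like boolean-parser output for: a AND (b)
(unwrap-exp [:S [:exp [:word "a"]] [:exp [:op "AND"]]
                [:exp [:list [:exp [:word "b"]]]]])
;; => [:S [:word "a"] [:op "AND"] [:list [:word "b"]]]
```

With Instaparse itself, `(insta/transform {:exp identity} tree)` should have the same effect on a hiccup tree, and hiding the rule in the grammar (`<exp> = ...`) removes the wrapper at parse time.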