Creating an experimental GraphQL formatter using Clojure, Instaparse, and GraalVM

The document discusses the creation of an experimental GraphQL formatter using Clojure, Instaparse, and GraalVM. It provides background on the creator and their motivation to build a GraphQL formatter due to frustrations with inconsistencies across existing tools. It then details the initial scope, progress over time implementing a parser and formatter, lessons learned, performance testing, and opportunities for future improvement.

Creating an experimental GraphQL formatter using Clojure, Instaparse,
and GraalVM
Ilmo Raunio
Clojure Meetup Oulu
2022-12-01

Background
Ilmo Raunio
https://ilmo.me
https://github.com/ilmoraunio/
Programmer at Metosin for 3+ years
Started my career in 2013
First Clojure side project started in 2015
Started doing Clojure professionally in 2018

Beginnings of an idea
● Worked on a GraphQL backend project during 2018–2021
○ Clojure, lacinia
● Was somewhat frustrated by how inconsistently formatting worked across GraphQL tooling
○ prettier, graphiql, etc.
● Had long been excited about instaparse and was waiting for the perfect excuse to use it somewhere
● Only missing step: no way to create binaries with Clojure

Setting the expectations
● Wanted to learn how to write parsers (using context-free grammars) and how formatters work
● Scoped in (originally):
○ basic GraphQL examples can be parsed, using the GraphQL spec (June 2018)
○ formatting produces the same result as prettier with default options
● Scoped out: performance, anything beyond simple recursion, any CLI options, error handling, some really niche GraphQL language grammar feature(s)
● Programming time was very limited, so progress was slow.

Progress
1. Write parser (started 2019-11-14)
2. Implement formatting (started 2020-03-18)
3. Character wrap (started 2021-04-11)

Lessons
● Combining tokens into a single regular expression may improve performance
● Don’t implement something you don’t understand
● Formatters don’t necessarily cover the whole spec
● If you can’t make something work, read the reference code (prettier) to understand how that problem has been solved … you may end up learning something :-)

Performance
● Not an original goal, but became interested in it later!
● Generally, used criterium and clj-async-profiler for testing within the REPL
● hyperfine in bash
● graphqlfmt vs prettier --parser=graphql
○ Caveat: Benchmarking is hard (let’s go shopping), all numbers are indicative.
● Hypothesis: most of the bottlenecks could exist inside the grammar

Where do we stand?
● Original goals achieved… sort of!
● Not for production use
● Things to improve/build
○ Easy to find breaking “edge” cases
○ Improved CI jobs
■ build binaries
■ prettier output == graphqlfmt output
○ Comment formatting
○ lazy recursion or loop-recur?
○ Performance
● PRs and issues welcome :-)

Doing things differently next time around
● Write my own (non-instaparse-based) parser
● Protocols, reification?
● Anything else? 🤷

Thanks!
https://github.com/ilmoraunio/graphqlfmt-clj-experiment

Editor's Notes

I’m a programmer at Metosin and have been there for 3 years now. I’ve been programming professionally for 9 years, of which 4.5 years have been with Clojure (the other half is Java and some bits of Python and JavaScript). I read my first Clojure book in 2013: Clojure Programming (by Chas Emerick, Brian Carper, and Christophe Grand). A couple of years later, in 2015, I started my first Clojure side project. I’ve been hacking away with Clojure ever since :-)
This story, however, starts back in 2018, when I was working on a GraphQL backend project. I was somewhat frustrated by how formatting worked across different tooling. That’s when I started thinking about how formatting works in general. I had been excited about instaparse, but I had not found the perfect opportunity to actually use it. For those who don’t know, instaparse is a library for creating executable parsers from context-free grammars, so you can skip a lot of the work of writing a parser by hand, at a potential cost to parsing performance (for example). So I had this idea, but at the time there wasn’t really any way to make a binary out of all this, which is what I wanted, until…
3 years ago: Michel posted the missing piece of the puzzle, instructions on how to compile Clojure binaries using GraalVM native-image
I was excited, and 5 days later I had my proof of concept running: a really basic instaparse + Clojure binary.
On the left-hand side: the very basic parser example from instaparse. On the right-hand side: the GraalVM compilation script. (As an aside, I won’t be talking much about GraalVM after this; I will focus more on the formatting implementation side of things.)
I had no idea what I was getting into: I had not really written any parsers before this point (if you don’t count regular expressions ;-)), and I don’t have a theoretical computer science background, so this was totally new stuff for me. Which of course meant I was really excited to do this stuff. Setting the tone of having fun was important because I was going to be doing this for a while :-)
Bridge: but so was setting the expectations. Re: formatting, I wanted to stick to language and formatting standards defined by someone else. Niche GraphQL grammar parts: parameterized grammar productions.
Progress was roughly split into three parts: writing the parser, implementing formatting, and adding support for character wrap. When I first started out at the end of 2019, I began writing the context-free grammar, adding unit tests for each token that I implemented in the grammar.
This is from the document definition of my GraphQL grammar. Ad lib: executabledefinitions (queries), typesystemdefinitions (schema), typesystemextensions (schema extensions).
This is the grammar for the ignored characters. Ad lib: lots of regular expressions in here; I learned how to parse lineterminator with negative lookaheads (row 9). On the right there’s a passage from the GraphQL specification matching the implementation.
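The negative-lookahead idea can be sketched outside instaparse. The GraphQL specification defines a line terminator as a newline, a carriage return not followed by a newline, or a carriage return followed by a newline. The sketch below is an illustration in Python, not the grammar from the slide; the names LINE_TERMINATOR and split_lines are made up for this example.

```python
import re
from typing import List

# LineTerminator, following the GraphQL spec:
#   "\n"  |  "\r" not followed by "\n"  |  "\r\n"
# The (?!\n) negative lookahead keeps a lone "\r" from consuming
# the first half of a "\r\n" pair.
LINE_TERMINATOR = re.compile(r"\n|\r(?!\n)|\r\n")


def split_lines(source: str) -> List[str]:
    """Split source text on any of the three line terminator forms."""
    return LINE_TERMINATOR.split(source)


if __name__ == "__main__":
    print(split_lines("a\r\nb\rc\nd"))
```

Without the lookahead, the "\r" alternative would match first on "\r\n" input and the pair would count as two line breaks instead of one.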
A preview of the token definitions. This part was the most straightforward to implement.
Here are some of the unit tests I had for my grammar. My unit tests typically had variations where one example was extremely compacted, one was extremely spaced out, and one was somewhere in the middle.
(4 months later) I finished the initial version of the grammar and started implementing formatting: I copied all the unit test inputs (basically, the GraphQL statements) and started implementing correct formatting for each statement, going through cases one-by-one
Here are some of my formatting tests. I had my formatting tests written as GraphQL statements. On the right is an example query with multiple executable definitions.
Some formatting tests also had varying inputs and outputs. (pause) For example, this is an object type definition where the extra ampersand character is dropped from the output if there are no extra interfaces to implement. And this is a perfectly reasonable outcome…
…and so is even this. What this test does is try to ensure that the two consecutive empty lines on rows 11 and 12 become one. (pause) It’s a very prettier-specific detail, a tiny detail, but one that we just couldn’t sideline. (pause) When implementing a formatter, all the details matter. (longer pause) Overall though, I ended up having around 100 of these formatting tests.
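The empty-line rule described above can be sketched as a single text transformation. This is a hedged illustration in Python, assuming the rule is simply "runs of two or more blank lines become one blank line"; it is not how graphqlfmt or prettier implement it, and collapse_blank_lines is a made-up name.

```python
import re

# A newline followed by two or more blank (or whitespace-only) lines.
_BLANK_RUN = re.compile(r"\n(?:[ \t]*\n){2,}")


def collapse_blank_lines(text: str) -> str:
    """Collapse runs of consecutive empty lines into a single empty line."""
    return _BLANK_RUN.sub("\n\n", text)


if __name__ == "__main__":
    before = "type A {\n  a: Int\n}\n\n\ntype B {\n  b: Int\n}\n"
    print(collapse_blank_lines(before))
```

A single blank line between definitions is left untouched, which matches the expectation that already-formatted input should not change.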
I did not really make fast progress in 2020, so it took me some time to finish the formatting part. But I eventually got there (the formatting phase took roughly a year) and started implementing character wrap.
What character wrap means is that some forms become “structured” (split across several rows) once a row exceeds the maximum width of, for example, 80 characters. Here are some examples demonstrating wrapping with GraphQL arguments and variable definitions. Typically it’s the arguments and variable definitions that will “wrap”. Down below we can see arguments that do not wrap; it’s a good example of how formatting affects only some parts, namely those whose line exceeds the maximum allowed character count. (This really implied that the original token-based AST would not be good enough and that we had to transform the AST into something else down the line.)
(Demo time — let’s look at some code and data)
(Lessons) At the beginning I was using EBNF syntax quite extensively. This led to performance issues with queries that, for example, contained a lot of whitespace.
When I noticed this slowness, I started looking into it by looking at the number of combinations. Once I realized that the abundant whitespace was the problem, I was able to inline the checks into a single regular expression (which is what instaparse will advise you to do), bringing the number of combinations down from roughly 30k to just 30.
And it was actually exhilarating to see these optimizations work and reduce the runtime by over 200 ms. Just seeing that left me dumbstruck. (pause) So lesson learned: combining tokens into a single regular expression may improve performance.
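The intuition behind this lesson can be shown with a rough analogy: matching ignored characters one token at a time produces one match per character, while a single regular expression consumes the whole run in one step. The Python sketch below only counts matches; it does not model how instaparse explores grammar alternatives, and the names SINGLE, RUN, skip_one_at_a_time, and skip_run are made up for this illustration.

```python
import re
from typing import Tuple

SINGLE = re.compile(r"[ \t]")  # one whitespace character per match
RUN = re.compile(r"[ \t]+")    # a whole run of whitespace per match


def skip_one_at_a_time(text: str, pos: int = 0) -> Tuple[int, int]:
    """Skip whitespace character by character; return (new_pos, match_count)."""
    steps = 0
    match = SINGLE.match(text, pos)
    while match:
        pos = match.end()
        steps += 1
        match = SINGLE.match(text, pos)
    return pos, steps


def skip_run(text: str, pos: int = 0) -> Tuple[int, int]:
    """Skip the same whitespace with one regex match; return (new_pos, match_count)."""
    match = RUN.match(text, pos)
    return (match.end(), 1) if match else (pos, 0)


if __name__ == "__main__":
    text = " " * 1000 + "query { a }"
    print(skip_one_at_a_time(text))
    print(skip_run(text))
```

Both functions end at the same position; only the number of separate matches differs, which is the part a grammar-driven parser has to pay for.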
(2nd main lesson) GraphQL formatting may sometimes feel like it’s not actually stable and it might be hard to tell sometimes. If we take an example of a comment within a query selection and try to pass that through prettier, what do we expect to get? Where do we expect to place the comment marker as a result? (pause)
(Most of you probably guessed it right and placed it after the first b, and…) There’s nothing insane about this. This is the right answer. But let’s say we’d take the output and run it through prettier again… (pause) well, you can already guess the output is probably not going to be stable. And so—can anyone guess where this comment will go next? the second a? the third brace? (√) it will disappear?
The third brace it is, BUT it will not only be placed after the brace; the brace itself will also be formatted again in a completely different way. (pause) Now, at this point I was already crying as I was trying to implement comment formatting, and I didn’t feel like it was going anywhere with these examples. Still, I wanted to see just how deep the rabbit hole goes. And when I fed this again to prettier the output was…
… this abomination here. (pause) The good news was that it did stabilize after this. But this actually led me to stop implementing support for comment formatting for now, because I wasn’t sure what I was venturing into by trying to implement this thing. I felt it was definitely easier for now to not support this and go back to try to understand the problem space a little bit better. So I decided to give it some more hammock time.
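The stability question above (does formatting the output again change it?) can be checked mechanically by re-applying a formatter until it reaches a fixed point. The sketch below is a generic Python helper, not part of graphqlfmt or prettier; passes_until_stable is a made-up name, and the toy formatter in the example merely collapses doubled spaces.

```python
from typing import Callable


def passes_until_stable(fmt: Callable[[str], str], source: str, limit: int = 10) -> int:
    """Return how many formatting passes change the text before it stabilizes.

    0 means the input was already stable. Raises if no fixed point is
    reached within `limit` passes.
    """
    current = source
    for passes in range(limit + 1):
        formatted = fmt(current)
        if formatted == current:
            return passes
        current = formatted
    raise RuntimeError("formatter did not stabilize within the pass limit")


if __name__ == "__main__":
    # Toy stand-in for a formatter: halves each run of doubled spaces.
    def toy(text: str) -> str:
        return text.replace("  ", " ")

    print(passes_until_stable(toy, "a    b"))
```

A formatter is usually expected to return 0 or 1 here; anything higher is the kind of multi-pass drift described in the comment example.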
(3rd main lesson) This is an example of a SchemaTypeExtension for which formatting did not previously exist … but nowadays it does. However, it was certainly an eye-opener that even seemingly battle-tested tools can lack support for something, and that something may be quite critical to your objective. So this is good to be aware of. Of course, I could have just submitted the fix to prettier myself. A good lesson as well.
(4th lesson) When trying to implement character wrap, I kept struggling to get it working with the grammar-token-based AST. That was before I read the prettier source code. It was there that I learned that I should probably introduce softlines and turn my AST into row-based semantics. That worked! It did feel a bit like cheating, but OTOH I was able to continue with my side project. It was a good lesson in humility. If you can’t make something work, read the source code from other tools for inspiration.
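The softline idea can be sketched as a tiny document printer: a group is printed flat if it fits within the maximum width, and otherwise every softline in that group becomes a newline. This is a simplified Python illustration of the general technique, not prettier’s or graphqlfmt’s implementation; Softline, Group, flat_text, and render are made-up names, and indentation is a fixed number of spaces per softline rather than a nested indent.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Softline:
    """Prints `flat` when the enclosing group fits, else a newline plus `indent` spaces."""
    flat: str = ""
    indent: int = 0


@dataclass
class Group:
    """A unit that is printed either entirely flat or with all its softlines broken."""
    parts: List[Union[str, Softline, "Group"]]


def flat_text(doc) -> str:
    """Render a document as if everything fit on one row."""
    if isinstance(doc, str):
        return doc
    if isinstance(doc, Softline):
        return doc.flat
    return "".join(flat_text(part) for part in doc.parts)


def render(doc, width: int = 80) -> str:
    out: List[str] = []

    def column() -> int:
        text = "".join(out)
        return len(text) - (text.rfind("\n") + 1)

    def emit(node, broken: bool) -> None:
        if isinstance(node, str):
            out.append(node)
        elif isinstance(node, Softline):
            out.append("\n" + " " * node.indent if broken else node.flat)
        else:
            # Each group decides for itself, so only rows that are too long wrap.
            fits = column() + len(flat_text(node)) <= width
            for part in node.parts:
                emit(part, not fits)

    emit(doc, False)
    return "".join(out)


# Hypothetical field with two arguments.
FIELD = Group([
    "user(", Softline(indent=2), "id: 1",
    Softline(flat=", ", indent=2), 'name: "Ada"',
    Softline(), ")",
])

if __name__ == "__main__":
    print(render(FIELD, width=80))
    print(render(FIELD, width=20))
```

Because the fit check happens per group, a short argument list stays on one row while a long one elsewhere in the same document wraps, which is the behavior the token-based AST made hard to express.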
Performance was not an original goal, but I started considering it more and more as the formatting functionality matured. Generally I used a few performance testing tools, such as criterium in the REPL and hyperfine in bash. For this talk, I ran only a couple of simple performance scenarios with hyperfine and clj-async-profiler. I didn’t approach it with hard science in mind; I just wanted to see the performance limits of my formatter. It has to be said: benchmarking is hard, so don’t take these numbers as hard truth.
The first test case was a really simple query to get the baseline performance. For my formatter it was roughly 20 ms; for prettier it was 190 ms.
For the second test case I had a slightly more complex schema. Running that already showed me that my formatter’s performance was dipping, at over 300 ms, while prettier was running at 197 ms, an okay number if you ignore the baseline.
This got me thinking: maybe the bottlenecks are inside the grammar, so I quickly checked what’s happening inside. On the right, in the flame graph, over half of the execution time consists of instaparse doing something with the Parser. (pause) Now, I didn’t do any deep diving into this after this point. But it did raise the question whether it’s just a major inefficiency or a core, fundamental issue. And I don’t really know the answer to that.
(Where do we stand today?) I mostly managed to achieve my original goals. I learned how to write a parser with instaparse and was able to write a minimally functional formatter. Even though it’s easy to find many examples that will break it, the basic examples are quite well covered and it should be easy to add support for new cases as well. What are some things we can improve on?
● Fixing edge cases that prettier can handle
● Building binaries on GitHub Actions
● Pinning down the prettier version and actively comparing results
● Comment formatting support should probably be added
● Could consider adding support for long or complex queries that currently break the stack by replacing simple recursion functions with loop-recur or lazy recursion (e.g. tree-seq)
● Finally, performance! Most likely: regular expression inlining, removing redundant token checks (e.g. checking for ignored tokens). Things like that.
If you are interested in any of these, feel free to contribute :-)
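The stack issue mentioned above is not specific to Clojure: any tree walk written as plain recursion uses one stack frame per nesting level. The Python sketch below shows the general remedy, an explicit work stack in a loop, which plays the same role loop-recur would in Clojure. It is an illustration only; the node shape (a dict with "name" and "children") and the names count_nodes and nested are assumptions made for this example, not graphqlfmt’s AST.

```python
from typing import Dict, List


def count_nodes(root: Dict) -> int:
    """Count tree nodes with an explicit stack instead of recursive calls."""
    stack: List[Dict] = [root]
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        # Children are pushed onto the heap-allocated list, not the call stack.
        stack.extend(node["children"])
    return count


def nested(depth: int) -> Dict:
    """Build a chain of `depth` fields wrapped around one leaf, iteratively."""
    node: Dict = {"name": "leaf", "children": []}
    for i in range(depth):
        node = {"name": f"field{i}", "children": [node]}
    return node


if __name__ == "__main__":
    # Deeper than Python's default recursion limit of 1000 frames.
    print(count_nodes(nested(5000)))
```

The traversal depth is now bounded by available memory rather than by the call stack, at the cost of managing the stack by hand.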
(What would I consider doing differently next time?) Protocols, reification: I became interested in this after seeing it used in malli. The open question is whether it would provide a better architecture and more optimizations.
Thank you for listening! I’m open for questions now and later as well if you’re interested in this more.