Despite being "Structured" Query Language, there's not much structure required to write SQL. However, poorly formatted SQL makes debugging and editing more time consuming and inefficient than it needs to be. Bad examples of SQL abound and have found their way into the mainstream as practitioners adopt the methods they find on popular web sites and blogs into their own repertoire.
This presentation is appropriate for anyone who uses SQL, no matter how seasoned. It introduces pragmatic design patterns for writing SQL that make it easier (and more enjoyable) to understand and edit. Based on over 25 years of applied experience with SQL as a primary programming language, these techniques are relevant to any implementation of SQL, including Oracle, MySQL, SQL Server and PostgreSQL.
2. The Four Rules of
Writing SQL
To improve readability and self-document code, follow these
four simple rules.
3. SELECT
order.order_id, order_line, product_name, unit_price,
supplier_name, SUM(total_units), sum(unit_price*total_units)
FROM order, product, order_items
WHERE ((order_items.product_id = product.product_id) and
((order.order_id = order_items.order_id) and (customer_id =
42)))
GROUP BY
order_id, product_name, unit_price, supplier_name, total_units
ORDER BY order_id, product_name;
4. Rule 1
Everything on its own line
This improves readability and makes troubleshooting easier.
7. Rule 2
Put commas, ANDs at the
beginning of lines, not the end
This minimizes the commenting needed to remove something
from your query.
8. SELECT order.order_id
, order_line
, product_name
, unit_price
, supplier_name
, SUM(total_units)
, sum(unit_price*total_units)
FROM order
, product
, order_items
WHERE ((order_items.product_id = product.product_id)
and ((order.order_id = order_items.order_id)
and (customer_id = 42)))
9. SELECT order.order_id
--, order_line
, product_name
--, unit_price
, supplier_name
--, SUM(total_units)
--, sum(unit_price*total_units)
FROM order
, product
--, order_items
WHERE ((order_items.product_id = product.product_id)
--and ((order.order_id = order_items.order_id)
and ((customer_id = 42)))
10. Rule 3
When joining tables, use short,
meaningful aliases for tables
and always prefix columns
It eliminates confusion about what table a column belongs
to, and improves readability.
11. SELECT o.order_id
, oi.order_line
, p.product_name
, oi.unit_price
, p.supplier_name
, SUM(oi.total_units)
, sum(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE ((oi.product_id = p.product_id)
and ((o.order_id = oi.order_id)
and (o.customer_id = 42)))
12. Rule 4
Avoid parentheses in WHERE
clauses unless required to nest
“OR” expressions
Simplify SQL by eliminating that which is unnecessary
13. SELECT o.order_id
, oi.order_line
, p.product_name
, oi.unit_price
, p.supplier_name
, SUM(oi.total_units)
, sum(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
and o.order_id = oi.order_id
and o.customer_id = 42
14. Four Rules
• Everything on its own line
• Put commas, “AND” at the beginning of lines, not
the end
• When joining tables, use short, meaningful aliases
for tables and always prefix columns
• Avoid parentheses in WHERE clauses unless required
to nest “OR” expressions.
15. The Six Habits of
Legible SQL
Six habits to adopt when writing SQL that
will improve legibility
16. SELECT o.order_id
, oi.order_line
, p.product_name
, oi.unit_price
, p.supplier_name
, SUM(oi.total_units)
, sum(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
and o.order_id = oi.order_id
and o.customer_id = 42
17. Pick a case and stick to it
Upper or lower, it doesn’t matter, provided you’re consistent.
Case provides visual cues about the purpose or meaning of
the various parts of your statement.
18. SELECT o.order_id
, oi.order_line
, p.product_name
, oi.unit_price
, p.supplier_name
, SUM(oi.total_units)
, SUM(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
AND o.order_id = oi.order_id
AND o.customer_id = 42
19. Use white space to align
statements for meaning
White space makes the parts and purpose of a statement
more visually apparent and easier to read.
20.   SELECT o.order_id
           , oi.order_line
           , p.product_name
           , oi.unit_price
           , p.supplier_name
           , SUM(oi.total_units)
           , SUM(p.unit_price*oi.total_units)
        FROM order o
           , product p
           , order_items oi
       WHERE oi.product_id = p.product_id
         AND o.order_id = oi.order_id
         AND o.customer_id = 42
    GROUP BY o.order_id
           , oi.order_line
           ...
21. SELECT o.order_id
, oi.order_line
, p.product_name
, oi.unit_price
, p.supplier_name
, SUM(oi.total_units)
, SUM(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
AND o.order_id = oi.order_id
AND o.customer_id = 42
GROUP BY o.order_id
, oi.order_line
...
22. Use white space to align
operands and aliases
When all operands fall into alignment, it’s much easier to see the
left and right sides in the WHERE statement. Likewise, when table
aliases are aligned it’s much easier to reference them.
23.   SELECT o.order_id
           , oi.order_line
           , p.product_name
           , oi.unit_price
           , p.supplier_name
           , SUM(oi.total_units)
           , SUM(p.unit_price*oi.total_units)
        FROM order       o
           , product     p
           , order_items oi
       WHERE oi.product_id = p.product_id
         AND o.order_id    = oi.order_id
         AND o.customer_id = 42
    GROUP BY o.order_id
           , oi.order_line
           ...
24. Group columns by table
If you need to troubleshoot by commenting out a table, it’s
more efficient when everything is together.
25. SELECT o.order_id
, p.product_name
, p.supplier_name
, oi.order_line
, oi.unit_price
, SUM(oi.total_units)
, SUM(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
AND o.order_id = oi.order_id
AND o.customer_id = 42
GROUP BY o.order_id
, oi.order_line
...
26. In a WHERE clause,
equalities first, then IN lists,
and subqueries last.
This orders statements from the least to most likely to cause a
problem and need to be edited.
27. SELECT o.order_id
, p.product_name
, p.supplier_name
, oi.order_line
, oi.unit_price
, SUM(oi.total_units)
, SUM(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
AND o.order_id = oi.order_id
AND o.customer_id = 42
GROUP BY o.order_id
, oi.order_line
...
28. SELECT o.order_id
, p.product_name
, p.supplier_name
, oi.order_line
, oi.unit_price
, SUM(oi.total_units)
, SUM(p.unit_price*oi.total_units)
FROM order o
INNER JOIN order_items oi
ON o.order_id = oi.order_id
INNER JOIN product p
ON oi.product_id = p.product_id
WHERE o.customer_id = 42
GROUP BY o.order_id
, oi.order_line
...
29. Place aggregate functions
last in the SELECT
It makes the GROUP BY easier to write—just copy/paste the
SELECT clause up to the aggregates as the GROUP BY.
30. SELECT o.order_id
, p.product_name
, p.supplier_name
, oi.order_line
, oi.unit_price
, SUM(oi.total_units)
, SUM(p.unit_price*oi.total_units)
FROM order o
, product p
, order_items oi
WHERE oi.product_id = p.product_id
AND o.order_id = oi.order_id
AND o.customer_id = 42
GROUP BY o.order_id
, oi.order_line
...
31. Six Habits of Legible SQL
• Pick a case and stick to it
• Use white space to align statements for meaning
• Use white space to align operands and aliases
• Group columns by table
• In a WHERE clause, equalities first, then IN lists, and
subqueries last
• Place aggregate functions last in the SELECT
Editor's Notes
SQL has very few rules. Capitalization doesn’t matter in many cases, MySQL being a notable exception. White space is meaningless. I located several ETL queries in my work library that exceeded 500 lines. I can’t imagine being able to understand or work on anything at that scale without some rules for writing and formatting the SQL, but outside of the order in which you write the statements, SQL is almost rule free. Everyone seems to do it differently, as you can see if you just go to the web, where you’ll find a plethora of styles. My method of writing SQL has evolved over a 25-year career. It’s based on having written a lot of SQL, but more importantly on having been a consultant for many years, specializing in part in troubleshooting performance issues. Usually, that means looking at a piece of code that I’ve never seen before and trying to figure out what it does so I can understand why it’s not performing well. Hundreds of lines of ugly, poorly formatted SQL for some ETL or data warehouse query are almost always going to be tough to read, so I needed to come up with a way of making it more palatable. This method is centered on making SQL legible and easy to troubleshoot. It’s version agnostic, so you can use it in Oracle, MySQL, SQL Server, PostgreSQL, etc. It’s quite simple and makes a ton of sense to me. I write almost exclusively in a simple text editor, directly in vi, or at the command line. I’m not personally a fan of GUI or object editors, but these same methods will work in tools like Enterprise Manager, SQL Developer, TOAD, or whatever else you may like to use.
This is a fairly simple piece of SQL that I pulled off of a web site. The formatting seen here isn’t unusual. QUESTIONS: What is this doing? Is it easy to read? To understand? If you had to work on this, either to add to it or fix something that was broken, what’s the first thing you’d do?
My first rule is that everything gets its own line. Yes, it will make the statement longer, but it has important benefits.
A typical troubleshooting scenario for a complex query involves breaking it down, often by commenting out individual tables to see where a problem is introduced. When everything is run together, it’s much more difficult to remove individual columns and tables and parts of the WHERE clause. At this point, I’ve removed the GROUP BY and ORDER BY clauses simply because it will be more readable from here on out. If we were having some issue with the order_items table and wanted to comment it out, it is somewhat easier.
Notice that I had to comment out some columns in the SELECT, FROM and WHERE clauses. I also had to comment out the comma on the end of the supplier name and the comma after product in the FROM clause, add an “and” before the customer ID because of the “and” lost at the end of the order_id equijoin, and add that comment in between two parentheses to keep them balanced. That’s kind of a pain in the neck, and it leads to the next rule.
Don’t end a line with a comma or an “and”. Instead, start the next line with the comma or “and”. That would make the original statement look like this:
…and now if we wanted to comment out those same lines to get rid of the order_items table, it’s a lot less work:
I did still have to add a parenthesis to the last line to keep them balanced. We’ll get to that later. Between putting everything on its own line and moving “and”s and commas to the beginning of lines, you eliminate effort spent commenting out lines when you’re in troubleshooting mode. It’s just a lot less work and it’ll make your life much easier, especially when you’re dealing with a query more complex than this. Remember that we also had GROUP BY and ORDER BY clauses, and the same thing goes for them as well. One of my pet peeves is when someone assumes that whoever comes after them is going to have intimate understanding of their schema, and they leave the table aliases off their columns. Looking at this, it’s not obvious what table any of this belongs to.
Rule 3: When joining tables, use short, meaningful aliases for tables and always prefix columns. Let’s do that now and get rid of the comments...
At this point, things are a lot more obvious. This appears a lot less cluttered, and in a query even slightly longer, short aliases save a ton of typing compared with full table names. We’re also being good coding citizens by including a basic level of documentation within the code itself. All columns are prefixed with a table alias, meaning there’s no need to refer to a data dictionary to see where unprefixed columns came from. One last rule. In this query, the parentheses have absolutely no function other than to make the query confusing.
Rule 4: Avoid parentheses in your WHERE clause unless you absolutely need them to nest “OR” expressions. Call it minimalist SQL. The only time you should ever have parentheses in your WHERE clause (outside of enclosing function calls, IN lists, or subqueries, obviously) is when they’re necessary to enforce the order in which a statement is evaluated. In a WHERE clause, that’s when you have an “OR”. No OR? No parentheses. Let’s fix this statement…
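For contrast, here is a minimal sketch of the one case where the parentheses earn their place. The orders table, its status column, and the sample rows are assumptions made up for this illustration (the table is named in the plural because ORDER is a reserved word). AND binds more tightly than OR, so the two queries below are not equivalent.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status VARCHAR(20));
INSERT INTO orders VALUES (1, 42, 'OPEN');
INSERT INTO orders VALUES (2, 42, 'CLOSED');
INSERT INTO orders VALUES (3, 7,  'BACKORDERED');

-- Parentheses required: the OR is evaluated before the AND,
-- so only customer 42's open or backordered orders qualify (order 1).
SELECT o.order_id
     , o.status
  FROM orders o
 WHERE o.customer_id = 42
   AND (   o.status = 'OPEN'
        OR o.status = 'BACKORDERED')
;

-- Without parentheses this is read as
--   (o.customer_id = 42 AND o.status = 'OPEN') OR o.status = 'BACKORDERED'
-- so backordered orders for every customer come back too (orders 1 and 3).
SELECT o.order_id
     , o.status
  FROM orders o
 WHERE o.customer_id = 42
   AND o.status = 'OPEN'
    OR o.status = 'BACKORDERED'
;
```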
Summary of the four rules.
In addition to the four rules, there are six good habits for writing SQL that will improve legibility and make your life easier. These are not hard-and-fast rules and there is flexibility in how you apply them, but I do consider them central to this method.
Pick a case and stick to it. If you want to use upper case for keywords like SELECT and FROM, go for it. This is purely personal preference. The same goes for objects. I personally put everything in lower case. It seems less like shouting, and it means that literals, as in “where a.b = ‘SOME STRING’”, stand out a little more, since they are often upper case values. However, whichever case you personally choose, others will take visual cues from it provided you’re consistent.
Designers use white space to improve legibility and assign visual meaning to layouts. SQL benefits from the same technique. SQL is easier to read if everything lines up.
Here, I’ve aligned the query along a gutter by right-aligning the statements and left-aligning the objects. This creates a nice visual flow for your eye. It sees the statements (SELECT, FROM, WHERE) as being important, and the objects all have similar visual meaning and importance to the right. I personally like to indent everything on the GROUP BY/ORDER BY, which means all of my objects start in column 10. I personally find that to be easier to read. Other people like to left align everything, such as:
Use whatever technique you prefer. Again, consistency is the most important thing.
Just as white space gives meaning and importance to the statements and objects in your query, it has a similar effect on the aliases and operands in your FROM and WHERE clauses.
This provides a nice visual cue that separates aliases from tables and makes them easy to find, as well as separating the two sides of equality tests. With the aliases aligned, it’s easy for your eye to fall directly on them as a reference. Likewise, when reading the WHERE clause, it’s much easier to see and differentiate the left and right sides of the individual conditions.
I recommend grouping columns by table if possible, so that if I do need to troubleshoot and comment out a particular table, everything is all together. SQL is often being fed to some other language that will reorder and format the columns, so the actual order they appear in your statement isn’t always important.
This allows you to comment out the columns associated with a particular table as a single block, as opposed to hunting/pecking through the code.
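As a rough sketch of what that looks like, the query below comments out the product table: its columns go as one block, followed by its FROM entry and its join condition. The schema is an assumption made up for this illustration (with orders in the plural because ORDER is a reserved word), and the product join is placed on an AND line so that it can be removed without touching the WHERE keyword.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE orders      (order_id INTEGER, customer_id INTEGER);
CREATE TABLE product     (product_id INTEGER, product_name VARCHAR(50), supplier_name VARCHAR(50));
CREATE TABLE order_items (order_id INTEGER, order_line INTEGER, product_id INTEGER,
                          unit_price DECIMAL(10,2));

-- The product table is switched off in three places:
-- its block of columns, its FROM entry, and its join condition.
SELECT o.order_id
--   , p.product_name
--   , p.supplier_name
     , oi.order_line
     , oi.unit_price
  FROM orders o
--   , product p
     , order_items oi
 WHERE o.order_id = oi.order_id
-- AND oi.product_id = p.product_id
   AND o.customer_id = 42
;
```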
If you’re old fashioned, your WHERE clause should start with the equalities needed to satisfy the table joins, followed by test (equality, inequality) conditions, followed by IN lists, followed by subqueries. This order goes from typically less to more problematic. I recommend putting the equalities and inequalities first, because they’re the things that are likely to be hard and fast tests that won’t change and are less likely to be removed. BETWEENs and IN lists are ranges and pseudo-equalities, and are a little more likely to be edited or removed for troubleshooting. These are also visually heavier and sometimes less critical to how the query works. Subqueries are usually the most problematic to the performance or accuracy of a query. Putting them at the end means less work than cutting and pasting them out of the middle of a query. It’s also easier to copy/paste the top part of a query into a terminal window, or highlight the SQL up to the subquery and run just that part.
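The slide's query has only join equalities and one test, so here is a sketch of the full ordering. The status column, the recalls table, and the literal values are hypothetical additions for this illustration and are not part of the original example.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE orders      (order_id INTEGER, customer_id INTEGER, status VARCHAR(20));
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, total_units INTEGER);
CREATE TABLE product     (product_id INTEGER, product_name VARCHAR(50));
CREATE TABLE recalls     (product_id INTEGER);

SELECT o.order_id
     , p.product_name
     , oi.total_units
  FROM orders      o
     , order_items oi
     , product     p
 WHERE o.order_id     = oi.order_id               -- join equalities first
   AND oi.product_id  = p.product_id
   AND o.customer_id  = 42                        -- then equality and inequality tests
   AND oi.total_units > 0
   AND o.status       IN ('OPEN', 'SHIPPED')      -- then BETWEENs and IN lists
   AND p.product_id   IN (SELECT r.product_id     -- subqueries last
                            FROM recalls r)
;
```

Because the subquery sits at the end, highlighting everything above it (plus a closing semicolon) leaves a query that still runs.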
For those who want to see this written with ANSI joins.
I suggest putting aggregate functions last in the SELECT. Aggregates are more likely to be the source of a problem than non-aggregates, so this makes them easier to comment out if necessary. It also means that building the GROUP BY clause is as simple as copying up to the aggregates and then pasting that into the GROUP BY.
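The slides elide the GROUP BY, so here is a sketch of the complete statement. The schema is an assumption made up for this illustration: orders is plural because ORDER is a reserved word, unit_price is assumed to live on order_items, and the column aliases units_ordered and order_value are hypothetical.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE orders      (order_id INTEGER, customer_id INTEGER);
CREATE TABLE product     (product_id INTEGER, product_name VARCHAR(50), supplier_name VARCHAR(50));
CREATE TABLE order_items (order_id INTEGER, order_line INTEGER, product_id INTEGER,
                          unit_price DECIMAL(10,2), total_units INTEGER);

  SELECT o.order_id
       , p.product_name
       , p.supplier_name
       , oi.order_line
       , oi.unit_price
       , SUM(oi.total_units)                 AS units_ordered   -- aggregates last
       , SUM(oi.unit_price * oi.total_units) AS order_value
    FROM orders      o
       , product     p
       , order_items oi
   WHERE oi.product_id = p.product_id
     AND o.order_id    = oi.order_id
     AND o.customer_id = 42
GROUP BY o.order_id          -- the SELECT list, copied up to the first aggregate
       , p.product_name
       , p.supplier_name
       , oi.order_line
       , oi.unit_price
ORDER BY o.order_id
       , p.product_name
;
```

The five GROUP BY columns are exactly the five non-aggregate lines of the SELECT, in the same order, which is what makes the copy/paste approach work.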