SQL Style Guide

SQL Style Guide#

Like all programming languages, SQL is incredibly flexible. However, following some best practices will keep your code easier to follow, and your database easier to maintain.

Note

All style guides enforce some matter of taste and opinion. Naturally, we think these “rules” are a good starting point. However, when you are on a project working with a team, clarity and consistency are the most important goals. It’s generally best to adopt the style of the existing projects.

Data Modeling and Database Design#

Before we begin to talk about syntax, the schema design will have the biggest long-term impact on your query readability.

Table and Attribute Names#

Preferred:

Name tables as the plural of the object.
- Example: movies over movie.
- This is because a table is, naturally, a collection of multiple records.
Use allow lowercase indentifiers.
- Example: name over NAME or Name
- This is because different variants of SQL handle capitalization differently.
Prefer ‘Snake_Case’ over ‘camelCase’ for multi-word names.
- ‘Snake Case’ (like most Python defaults) means separating words with an underscore.
- Example: first_name over firstName
- This improves reability, and avoids the case sensitivity issues in some SQL variants.
Name “join tables” with a term representing the idea, not the table names.
- “Join tables” are those that link relationships between two tables.
- Example: Represent “enrollments” in a course, you should name a table course_enrollments rather than courses_students.
- Example: Prefer cast over actors_movies in an database of

Avoids:

Do not name columns and tables with the same names
- Example: In a table of managers, do not have a column called manager or managers.
Do note use acronyms or abbreviations unless necessary.
- Example: Use student_id over sid.
- Explicit column names are much more readable. If you need to shorten a long term, make sure the abbreviation is as obvious as possible, and that it is spelled consistently across all tables.
- Within a query, using aliasing (AS) to shorten a column or table name.

Attributes and Types#

Learn the datatypes that your DBMS offers. PostgreSQL offers many built-in data types. Chosing the best datatype can help improve performance and avpod data inconsistency errors.

Every table should have an explicit primary key!
- Often times, this is a newly generated numeric value, but a UUID is a good option too.
- In Postgres, INTEGER, BIGINTEGER, SERIAL, BIGSERIAL and UUID are the most common primary key types.
- When you have additional data that is unique, use UNQIUE constraints rather than a primary key.
- Why? You never know when an external piece of information might change. Referencing primary keys that only exist in the database means that if an email address, student ID, product ID, etc. changes, then you will not need to manually maintain any foreign key relationships.
Use id to name primary keys and foreign keys.
- Most often, the primary of a table simply should be id
Be consistent naming boolean attributes
- It’s often good to ‘prefix’ a boolean attribute that could be ambiguous, like ‘student’, or ‘instructor’
- Example: Use is_admin, is_... (is_student'), or has_... (has_won_awards)

Dates, Times, and Timestamps#

Dates and times are some of the trickiest things to get right. Partially because it’s incredibly difficult to reconcile bad data.

You may make a lot of assumptions about how dates and times work in your life today, but many of these assumptions are not guaranteed to be true. Not everywhere observes daylight savings time, for example. And while we often think timezones are hour long blocks, there are some 15 minute timezone boundaries!

Where possible use the column type TIMESTAMP WITH TIME ZONE or TIMESTAMPTZ (in Postgres).
- This datatype accept a complete timestamp with a UTC offset, and always ensures the date is converted back to UTC time in internal storage.
- TIMESTAMPTZ is simply Postgres-specific shorthand, but it is interchangable with the full SQL name.
Always store timestamps with consistent timezones.
- Most often you should store timestamps in UTC time, and ensure the timestamp is explicitly UTC time.
- When presenting data to users, or aggregrating it across reports, then convert to the local timezone.
If you need to know the origianl timezone (e.g. the location or specific rules), then you should store that separately.
When naming columns which are timestamps, it’s helpful to use the _at suffic to indicate an event, e.g. created_at, updated_at, last_saved_at… Without the _at, it can be easy to make these columns sound like booleans.

Query Syntax#

Good structure for your SQL queries will help make them easier to read.

SQL Keywords#

Use ALLCAPS for all built-in SQL keywords, like SELECT, FROM, CREATE TABLE etc.
Ideally, start ‘components’ of a query on a new line:

Intendation#

Align keywords to the left of the line.
Indent the contents of subqueries and CTEs, typically by 2 spaces.
- Place the opening ( on the first line, and the closing ) on its own line.

Miscelleanous#

Always remember end queries with a ;, even in jupysql where this is handled for you.
Use AS or aliasing for clarity within a query.
- This is the best way to abbreviate long names.

For example:

WITH some_table AS (
  SELECT column1, column2, COUNT(*) AS my_count
  FROM another_table
)
SELECT *
FROM some_table
JOIN big_table as bt on bt.id = some_table.big_table_id
WHERE bt.some_value > 5;