SQL Style Guide#
Like all programming languages, SQL is incredibly flexible. However, following some best practices will keep your code easier to follow, and your database easier to maintain.
Note
All style guides enforce some matter of taste and opinion. Naturally, we think these “rules” are a good starting point. However, when you are on a project working with a team, clarity and consistency are the most important goals. It’s generally best to adopt the style of the existing projects.
Data Modeling and Database Design#
Before we begin to talk about syntax, the schema design will have the biggest long-term impact on your query readability.
Table and Attribute Names#
Preferred:
Name tables as the plural of the object.
Example:
movies
overmovie
.This is because a table is, naturally, a collection of multiple records.
Use allow lowercase indentifiers.
Example:
name
overNAME
orName
This is because different variants of SQL handle capitalization differently.
Prefer ‘Snake_Case’ over ‘camelCase’ for multi-word names.
‘Snake Case’ (like most Python defaults) means separating words with an underscore.
Example:
first_name
overfirstName
This improves reability, and avoids the case sensitivity issues in some SQL variants.
Name “join tables” with a term representing the idea, not the table names.
“Join tables” are those that link relationships between two tables.
Example: Represent “enrollments” in a course, you should name a table
course_enrollments
rather thancourses_students
.Example: Prefer
cast
overactors_movies
in an database of
Avoids:
Do not name columns and tables with the same names
Example: In a table of
managers
, do not have a column calledmanager
ormanagers
.
Do note use acronyms or abbreviations unless necessary.
Example: Use
student_id
oversid
.Explicit column names are much more readable. If you need to shorten a long term, make sure the abbreviation is as obvious as possible, and that it is spelled consistently across all tables.
Within a query, using aliasing (
AS
) to shorten a column or table name.
Attributes and Types#
Learn the datatypes that your DBMS offers. PostgreSQL offers many built-in data types. Chosing the best datatype can help improve performance and avpod data inconsistency errors.
Every table should have an explicit primary key!
Often times, this is a newly generated numeric value, but a UUID is a good option too.
In Postgres,
INTEGER
,BIGINTEGER
,SERIAL
,BIGSERIAL
andUUID
are the most common primary key types.When you have additional data that is unique, use
UNQIUE
constraints rather than a primary key.Why? You never know when an external piece of information might change. Referencing primary keys that only exist in the database means that if an email address, student ID, product ID, etc. changes, then you will not need to manually maintain any foreign key relationships.
Use
id
to name primary keys and foreign keys.Most often, the primary of a table simply should be
id
Be consistent naming boolean attributes
It’s often good to ‘prefix’ a boolean attribute that could be ambiguous, like ‘student’, or ‘instructor’
Example: Use
is_admin
,is_...
(is_student'
), orhas_...
(has_won_awards
)
Dates, Times, and Timestamps#
Dates and times are some of the trickiest things to get right. Partially because it’s incredibly difficult to reconcile bad data.
You may make a lot of assumptions about how dates and times work in your life today, but many of these assumptions are not guaranteed to be true. Not everywhere observes daylight savings time, for example. And while we often think timezones are hour long blocks, there are some 15 minute timezone boundaries!
Where possible use the column type
TIMESTAMP WITH TIME ZONE
orTIMESTAMPTZ
(in Postgres).This datatype accept a complete timestamp with a UTC offset, and always ensures the date is converted back to UTC time in internal storage.
TIMESTAMPTZ
is simply Postgres-specific shorthand, but it is interchangable with the full SQL name.
Always store timestamps with consistent timezones.
Most often you should store timestamps in UTC time, and ensure the timestamp is explicitly UTC time.
When presenting data to users, or aggregrating it across reports, then convert to the local timezone.
If you need to know the origianl timezone (e.g. the location or specific rules), then you should store that separately.
When naming columns which are timestamps, it’s helpful to use the
_at
suffic to indicate an event, e.g.created_at
,updated_at
,last_saved_at
… Without the_at
, it can be easy to make these columns sound like booleans.
Query Syntax#
Good structure for your SQL queries will help make them easier to read.
SQL Keywords#
Use ALLCAPS for all built-in SQL keywords, like
SELECT
,FROM
,CREATE TABLE
etc.Ideally, start ‘components’ of a query on a new line:
Intendation#
Align keywords to the left of the line.
Indent the contents of subqueries and CTEs, typically by 2 spaces.
Place the opening
(
on the first line, and the closing)
on its own line.
Miscelleanous#
Always remember end queries with a
;
, even in jupysql where this is handled for you.Use
AS
or aliasing for clarity within a query.This is the best way to abbreviate long names.
For example:
WITH some_table AS (
SELECT column1, column2, COUNT(*) AS my_count
FROM another_table
)
SELECT *
FROM some_table
JOIN big_table as bt on bt.id = some_table.big_table_id
WHERE bt.some_value > 5;