Keys & Constraints

1.10. Keys & Constraints#

Last Updated: September 11, 2025

1.10.1. Constraints#

Constraints are relationships across data elements that the data system is required to preserve or enforce. Constraints help encode domain knowledge that the system can enforce, or act as guardrails and provide safety. Constraints are checked on writes (INSERT, UPDATE, DELETE); if a write would violate a constraint, the statement fails and no changes are applied.

In this note we’ll discuss key constraints (PRIMARY KEY, UNIQUE), referential integrity (FOREIGN KEY), and column-level rules (NOT NULL, DEFAULT, CHECK).

1.10.1.1. Keys#

Tuples can have keys, which make it easy to identify them in a relation. For example, a table of students could have a student ID key. A set of attributes is said to form a key if no two tuples can have the same values for that combination of attributes. We call this a PRIMARY KEY, defined in a table definition:

CREATE TABLE stops(
  stop_id INTEGER,
  race VARCHAR(10),
  location VARCHAR(20),
  age INTEGER,
  arrest BOOLEAN,

  PRIMARY KEY (stop_id)
);

When we define a primary key, we are also defining a constraint on its uniqueness. For example, in the above statement, each tuple must have a unique stopID—otherwise when inserting a tuple with a duplicate stopID into the table, we’ll get an error. Additionally, no attribute of a PRIMARY KEY can be NULL.

There can be only one PRIMARY KEY (hence the name!). However, we can use multiple columns as the primary key (a composite key). In that case, the combination of those columns must be unique, and none of them can be NULL.

CREATE TABLE enrollments (
  student_id INTEGER,
  course_id  INTEGER,
  term       CHAR(6),

  PRIMARY KEY (student_id, course_id, term)
);

This enforces that the combination of (student_id, course_id, term) is unique for each row, and none of those columns may be NULL.

What if we want multiple unique keys? We can use the UNIQUE key syntax:

CREATE TABLE stops (
  stop_id INTEGER,
  person_id INTEGER,
  stop_time TIMESTAMPTZ,
  race VARCHAR(10),
  location VARCHAR(20),
  age INTEGER,
  arrest BOOLEAN,

  PRIMARY KEY (stop_id),
  UNIQUE (person_id, stop_time)
);

Here, there are effectively 3 unique keys: stopID, personID, stopTime.

How PRIMARY KEY differs from UNIQUE: a table can have many UNIQUE constraints but only one PRIMARY KEY; PRIMARY KEY columns are implicitly NOT NULL, while UNIQUE columns may be NULL.

We could also declare attributes to have default values or not be null, e.g.,:

CREATE TABLE stops(
  stop_id INTEGER,
  person_id INTEGER,
  stop_time TIMESTAMPTZ,
  race VARCHAR(10),
  location VARCHAR(20) NOT NULL,
  age INTEGER,
  arrest BOOLEAN DEFAULT False,

  PRIMARY KEY (stop_id),
  UNIQUE (person_id, stop_time)
);

This gives us, so far, multiple different kinds of constraints—uniqueness via PRIMARY KEY or UNIQUE, and attribute-based constraints (DEFAULT, NOT NULL).

The final constraint type we’ll cover is FOREIGN KEY. Referential integrity is ensuring that records do not violate the constraints defined between tables, i.e., every reference in a “child” table must point to an existing row in a “parent” table. A foreign key maintains the referential integrity of your data, by restricting record insertion based on whether an attribute value in one table exists in another table.

Suppose we have two tables, an actors table and a cast_info table. We’ll have actor IDs in the cast_info table, but we want them to reference the actor IDs in the actor table. We can declare this constraint as follows:

CREATE TABLE actors (
  id INTEGER,
  name TEXT,

  PRIMARY KEY id
);

CREATE TABLE cast_info (
  person_id INTEGER,
  movie_id INTEGER,

  FOREIGN KEY (person_id) REFERENCES actors (id),
  -- If we similarly have a movies table, we'd include this line:
  FOREIGN KEY (movie_id) REFERENCES movies (id),
);

It is okay for foreign keys to not contain certain primary keys, but foreign keys must always reference an existing primary key. So, if we try to insert a tuple in cast_info that has a person_id that isn’t an id in the actor table, we’ll get an error. There is also a ‘dangling tuples’ violation—If we delete a tuple in the actor table that has an id in the cast_info table, we’ll also get an error. SQL handles this violation in 3 different ways. It either throws an error, makes the same changes to the cast_info table as made in the actor table, or changes the key in cast_info table to be NULL.

An attribute-based constraint is where every value in a column satisfies a Boolean expression.

For example, NOT NULL is an attribute-based constraint! Boolean expressions can also be expressed via CHECK. We can declare this constraint as follows:

-- Ensures that actor_ids are four-digit integers.
CREATE TABLE actors (
  id INTEGER CHECK (id >= 1000 AND id <= 9999),
  name TEXT
);

Attribute-based constraints are checked on changes to the relation itself. For example, for the table above, this constraint is checked when a new id is added or an existing one is updated, but not when an id is deleted. A limitation of this is that it only enforces one of the two potential referential integrity violations (see last note for more on these violations), because CHECK only enforces constraints on a single table, not across tables.

Keys & Constraints

Contents

1.10. Keys & Constraints#

1.10.1. Constraints#

1.10.1.1. Keys#