Truncating Relations

1.3. Truncating Relations#

Last updated: August 27, 2024

Sometimes you may want just a portion of rows generated by a query.

1.3.1. LIMIT and OFFSET#

Sometimes we want to limit the result to a few tuples via LIMIT <val>. For example, say we want to order Stops by descending age, return top 15:

SELECT *
FROM Stops
ORDER BY age DESC LIMIT 15;

Sometimes we want to start the output at a particular point via OFFSET <val>. For example, say we want to order Stops by descending age, return positions 11 to 15:

SELECT *
FROM Stops
ORDER BY age DESC
LIMIT 5
OFFSET 10;

1.3.2. Sampling#

Note

[09/02/2024] The full section is here for your convenience, but for Lectures 02 / Project 1 you are likely only expected to know the first option. We will learn more about TABLESAMPLE around Lecture 07 once you learn memory structures.

We can also sample tuples, or select a subset of tuples. This is desirable when the original dataset might be too large, and you want to experiment quickly before running operations on the full dataset. There are 3 sampling methods discussed in this class:

  • SELECT * FROM Table ORDER BY RANDOM() LIMIT N: randomly sorts the rows of the table and takes the first \(n\) rows.

  • SELECT * FROM Table TABLESAMPLE BERNOULLI(percentage p): takes a \(p\%\) uniform random sample based on flipping a p%-probability coin

  • SELECT * FROM Table TABLESAMPLE SYSTEM(percentage p): tuples are grouped in pages on disk, so \(p\%\) of pages are uniformly randomly selected

RANDOM is expensive for large tables, because we must first compute all resulting rows, then sort all the rows before returning a truncated subset. While BERNOULLI is faster than RANDOM, it is slower than SYSTEM, due to more random accesses. SYSTEM is therefore the fastest method, but the least “random” due to page-level sampling.