Lecture 08: Query Optimization I¶

New IMDB Performance database¶

This is a variation of the IMDB database with keys defined. Note that this is a pretty big database! So if you run the below lines, please also remember to delete the imdb_perf_lecture afterwards to save space on your limited postgreSQL server.

If you didn't load it in with a previous lecture, load in the imdb_perf_lecture database.

In [3]:
!unzip -u ../lec07/data/imdb_perf_lecture.zip -d ../lec07/data/
Archive:  ../lec07/data/imdb_perf_lecture.zip
In [4]:
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_perf_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_perf_lecture' 
!psql -h localhost -d imdb_perf_lecture -f ../lec07/data/imdb_perf_lecture.sql
DROP DATABASE
CREATE DATABASE
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
SET
SET
CREATE TABLE
psql:../lec07/data/imdb_perf_lecture.sql:33: ERROR:  role "yanlisa" does not exist
CREATE TABLE
psql:../lec07/data/imdb_perf_lecture.sql:45: ERROR:  role "yanlisa" does not exist
CREATE TABLE
psql:../lec07/data/imdb_perf_lecture.sql:59: ERROR:  role "yanlisa" does not exist
COPY 845888
COPY 2211936
COPY 656453
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
In [7]:
%reload_ext sql
In [8]:
%sql postgresql://127.0.0.1:5432/imdb_perf_lecture

Demo¶


Example 1

In [9]:
%sql EXPLAIN ANALYZE SELECT * FROM Actor;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
3 rows affected.
Out[9]:
QUERY PLAN
Seq Scan on actor (cost=0.00..13684.88 rows=845888 width=18) (actual time=0.047..56.311 rows=845888 loops=1)
Planning Time: 0.275 ms
Execution Time: 80.765 ms

Notice ANALYZE:

  • start time and end time for operator, rows processed
  • loops = number of times the operator is executed

Notice EXPLAIN:

  • width = size (in bytes) of output tuples from that operator

Example 2: just planning (no execution)

In [10]:
%sql EXPLAIN SELECT * FROM Actor;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
1 rows affected.
Out[10]:
QUERY PLAN
Seq Scan on actor (cost=0.00..13684.88 rows=845888 width=18)

Example 3

In [11]:
%sql EXPLAIN ANALYZE SELECT id FROM Actor;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
3 rows affected.
Out[11]:
QUERY PLAN
Seq Scan on actor (cost=0.00..13684.88 rows=845888 width=4) (actual time=0.063..79.801 rows=845888 loops=1)
Planning Time: 0.085 ms
Execution Time: 106.623 ms

Notice:

  • width from 18 down to 4!
  • still 845k output tuples

Example 4:

In [13]:
%sql EXPLAIN ANALYZE SELECT id FROM Actor WHERE id > 4000000;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
5 rows affected.
Out[13]:
QUERY PLAN
Index Only Scan using actor_pkey on actor (cost=0.42..12681.33 rows=445080 width=4) (actual time=0.104..52.051 rows=444781 loops=1)
  Index Cond: (id > 4000000)
  Heap Fetches: 0
Planning Time: 0.077 ms
Execution Time: 64.745 ms

Notice:

  • output tuples now reduced to 444k
  • planning has imperfect estimate (445080) vs actual (444781)

Example 5:

In [14]:
%sql EXPLAIN ANALYZE SELECT id, name FROM Actor WHERE id > 4000000;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
5 rows affected.
Out[14]:
QUERY PLAN
Seq Scan on actor (cost=0.00..15799.60 rows=445080 width=18) (actual time=0.274..71.002 rows=444781 loops=1)
  Filter: (id > 4000000)
  Rows Removed by Filter: 401107
Planning Time: 0.075 ms
Execution Time: 84.819 ms

Notice:

  • width back to 18 (because of name)

Matching Exercise¶

In [16]:
%%sql
EXPLAIN ANALYZE -- 1
SELECT id
FROM Actor
WHERE id > 4000000 AND name='Tom Hanks';
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
8 rows affected.
Out[16]:
QUERY PLAN
Gather (cost=1000.00..11512.90 rows=1 width=4) (actual time=25.958..28.263 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on actor (cost=0.00..10512.80 rows=1 width=4) (actual time=23.245..23.246 rows=0 loops=3)
        Filter: ((id > 4000000) AND (name = 'Tom Hanks'::text))
        Rows Removed by Filter: 281963
Planning Time: 0.098 ms
Execution Time: 28.282 ms
In [18]:
%%sql
EXPLAIN ANALYZE -- 2
SELECT id
FROM Actor
WHERE id < 4000000 AND name='Tom Hanks';
UsageError: Unrecognized argument(s): --
If you need help solving this issue, send us a message: https://ploomber.io/community
In [19]:
%%sql
EXPLAIN ANALYZE -- 3
SELECT id
FROM Actor;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
3 rows affected.
Out[19]:
QUERY PLAN
Seq Scan on actor (cost=0.00..13684.88 rows=845888 width=4) (actual time=0.073..80.670 rows=845888 loops=1)
Planning Time: 0.059 ms
Execution Time: 107.513 ms
In [20]:
%%sql
EXPLAIN ANALYZE -- 4
SELECT id
FROM Actor
LIMIT 10;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
4 rows affected.
Out[20]:
QUERY PLAN
Limit (cost=0.00..0.16 rows=10 width=4) (actual time=0.017..0.019 rows=10 loops=1)
  -> Seq Scan on actor (cost=0.00..13684.88 rows=845888 width=4) (actual time=0.016..0.017 rows=10 loops=1)
Planning Time: 0.051 ms
Execution Time: 0.029 ms
In [22]:
%config SqlMagic.displaylimit = None
displaylimit: Value None will be treated as 0 (no limit)
displaylimit: Value None will be treated as 0 (no limit)
In [23]:
%%sql
EXPLAIN ANALYZE -- 5
SELECT id
FROM Actor
ORDER BY name
LIMIT 10;
Running query in 'postgresql://127.0.0.1:5432/imdb_perf_lecture'
12 rows affected.
Out[23]:
QUERY PLAN
Limit (cost=17366.94..17368.11 rows=10 width=18) (actual time=116.900..119.783 rows=10 loops=1)
  -> Gather Merge (cost=17366.94..99611.72 rows=704906 width=18) (actual time=116.897..119.779 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        -> Sort (cost=16366.92..17248.05 rows=352453 width=18) (actual time=114.470..114.471 rows=7 loops=3)
              Sort Key: name
              Sort Method: top-N heapsort Memory: 26kB
              Worker 0: Sort Method: top-N heapsort Memory: 26kB
              Worker 1: Sort Method: top-N heapsort Memory: 25kB
              -> Parallel Seq Scan on actor (cost=0.00..8750.53 rows=352453 width=18) (actual time=0.020..58.439 rows=281963 loops=3)
Planning Time: 0.062 ms
Execution Time: 119.808 ms
In [ ]: