Lecture 08¶
# Run this cell to set up imports
import numpy as np
import pandas as pd
Load in the IMDB Performance database¶
This is a variation of the IMDB database with keys defined. Note that this is a pretty big database! So if you run the below lines, please also remember to delete the imdb_perf_lecture
afterwards to save space on your limited postgreSQL server.
We assume you have the associated lecture folder lec06
pulled into your repo already. The below commands create a symbolic link (i.e., shortcut/redirect with ln
) to this lecture data directory, allowing some space saving, and unzip the database file.
!ln -sf ../../lec/lec06/data .
!unzip -u data/imdb_perf_lecture.zip -d data/
Archive: data/imdb_perf_lecture.zip
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_perf_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_perf_lecture'
!psql -h localhost -d imdb_perf_lecture -f data/imdb_perf_lecture.sql
DROP DATABASE CREATE DATABASE SET SET SET SET SET set_config ------------ (1 row) SET SET SET SET SET SET CREATE TABLE ALTER TABLE CREATE TABLE ALTER TABLE CREATE TABLE ALTER TABLE COPY 845888 COPY 2211936 COPY 656453 ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE
Start jupysql
¶
%reload_ext sql
%sql postgresql://127.0.0.1:5432/imdb_perf_lecture
If you're having trouble seeing the entirety of query plans, you can run the following cell to set the limit on displayed rows to 20. Careful: Do not set this to None
and run the actual queries; SQL will return millions of rows and crash your kernel!
# run this cell to remove 10-row limit on display
%config SqlMagic.displaylimit = 20
Matching¶
jupysql
dataframe visualization removes any whitespace.
You can also run (after each cell):
result = __.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
def printplans(x):
result = x.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
%%sql
/* 1 */
EXPLAIN ANALYZE SELECT id FROM actors
WHERE id > 4000000 AND
name='Tom Hanks';
QUERY PLAN |
---|
Gather (cost=1000.00..11512.90 rows=1 width=4) (actual time=24.124..26.703 rows=0 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Seq Scan on actors (cost=0.00..10512.80 rows=1 width=4) (actual time=21.578..21.579 rows=0 loops=3) |
Filter: ((id > 4000000) AND (name = 'Tom Hanks'::text)) |
Rows Removed by Filter: 281963 |
Planning Time: 0.065 ms |
Execution Time: 26.756 ms |
printplans(__)
QUERY PLAN 0 Gather (cost=1000.00..11512.90 rows=1 width=4... 1 Workers Planned: 2 2 Workers Launched: 2 3 -> Parallel Seq Scan on actors (cost=0.00.... 4 Filter: ((id > 4000000) AND (name = 'T... 5 Rows Removed by Filter: 281963 6 Planning Time: 0.065 ms 7 Execution Time: 26.756 ms
%%sql
/* 2 */
EXPLAIN ANALYZE
SELECT id FROM actors
ORDER BY name;
QUERY PLAN |
---|
Sort (cost=114312.51..116427.23 rows=845888 width=18) (actual time=1947.844..2768.135 rows=845888 loops=1) |
Sort Key: name |
Sort Method: external merge Disk: 23672kB |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=18) (actual time=0.056..55.089 rows=845888 loops=1) |
Planning Time: 0.051 ms |
Execution Time: 2815.284 ms |
printplans(__)
QUERY PLAN 0 Sort (cost=114312.51..116427.23 rows=845888 w... 1 Sort Key: name 2 Sort Method: external merge Disk: 23672kB 3 -> Seq Scan on actors (cost=0.00..13684.88... 4 Planning Time: 0.051 ms 5 Execution Time: 2815.284 ms
%%sql
/* 3 */
EXPLAIN ANALYZE
SELECT id FROM actors
ORDER BY id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.42..0.68 rows=10 width=4) (actual time=0.022..0.024 rows=10 loops=1) |
-> Index Only Scan using actor_pkey on actors (cost=0.42..21980.74 rows=845888 width=4) (actual time=0.021..0.022 rows=10 loops=1) |
Heap Fetches: 0 |
Planning Time: 0.076 ms |
Execution Time: 0.037 ms |
printplans(__)
QUERY PLAN 0 Limit (cost=0.42..0.68 rows=10 width=4) (actu... 1 -> Index Only Scan using actor_pkey on acto... 2 Heap Fetches: 0 3 Planning Time: 0.076 ms 4 Execution Time: 0.037 ms
Two-table demo: LIMIT¶
Let's join two tables, actors
and cast_info
. The query planner selects a hash join:
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info
WHERE actors.id = cast_info.person_id;
QUERY PLAN |
---|
Hash Join (cost=29215.48..89168.21 rows=2211936 width=26) (actual time=171.285..1446.736 rows=2211936 loops=1) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.069..139.423 rows=2211936 loops=1) |
-> Hash (cost=13684.88..13684.88 rows=845888 width=18) (actual time=171.049..171.050 rows=845888 loops=1) |
Buckets: 65536 Batches: 16 Memory Usage: 3114kB |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=18) (actual time=0.014..52.583 rows=845888 loops=1) |
Planning Time: 0.152 ms |
Execution Time: 1513.363 ms |
printplans(__)
QUERY PLAN 0 Hash Join (cost=29215.48..89168.21 rows=22119... 1 Hash Cond: (cast_info.person_id = actors.id) 2 -> Seq Scan on cast_info (cost=0.00..31907... 3 -> Hash (cost=13684.88..13684.88 rows=8458... 4 Buckets: 65536 Batches: 16 Memory Us... 5 -> Seq Scan on actors (cost=0.00..13... 6 Planning Time: 0.152 ms 7 Execution Time: 1513.363 ms
Below, we add LIMIT
. Note the query planner switches to a nested loop join, using an index scan to match cast_info.person_id
to the indexed attribute actors.id
! This results in a 10,000x speedup!
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info
WHERE actors.id = cast_info.person_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.43..4.49 rows=10 width=26) (actual time=0.085..0.227 rows=10 loops=1) |
-> Nested Loop (cost=0.43..895896.13 rows=2211936 width=26) (actual time=0.084..0.225 rows=10 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.051..0.052 rows=10 loops=1) |
-> Memoize (cost=0.43..0.47 rows=1 width=18) (actual time=0.017..0.017 rows=1 loops=10) |
Cache Key: cast_info.person_id |
Cache Mode: logical |
Hits: 2 Misses: 8 Evictions: 0 Overflows: 0 Memory Usage: 1kB |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=18) (actual time=0.020..0.020 rows=1 loops=8) |
Index Cond: (id = cast_info.person_id) |
Planning Time: 0.166 ms |
Execution Time: 0.497 ms |
printplans(__)
QUERY PLAN 0 Limit (cost=0.43..4.49 rows=10 width=26) (act... 1 -> Nested Loop (cost=0.43..895896.13 rows=... 2 -> Seq Scan on cast_info (cost=0.00.... 3 -> Memoize (cost=0.43..0.47 rows=1 w... 4 Cache Key: cast_info.person_id 5 Cache Mode: logical 6 Hits: 2 Misses: 8 Evictions: 0... 7 -> Index Scan using actor_pkey ... 8 Index Cond: (id = cast_inf... 9 Planning Time: 0.166 ms 10 Execution Time: 0.497 ms
Two-table demo: Projection¶
jupysql
dataframe visualization removes any whitespace.
%%sql
EXPLAIN ANALYZE
SELECT name, movie_id
FROM actors, cast_info
WHERE actors.id = cast_info.person_id;
QUERY PLAN |
---|
Hash Join (cost=29215.48..89168.21 rows=2211936 width=18) (actual time=236.467..1300.556 rows=2211936 loops=1) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.009..205.463 rows=2211936 loops=1) |
-> Hash (cost=13684.88..13684.88 rows=845888 width=18) (actual time=236.305..236.307 rows=845888 loops=1) |
Buckets: 65536 Batches: 16 Memory Usage: 3191kB |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=18) (actual time=0.063..83.934 rows=845888 loops=1) |
Planning Time: 0.145 ms |
Execution Time: 1363.600 ms |
printplans(__)
QUERY PLAN 0 Hash Join (cost=29215.48..89168.21 rows=22119... 1 Hash Cond: (cast_info.person_id = actors.id) 2 -> Seq Scan on cast_info (cost=0.00..31907... 3 -> Hash (cost=13684.88..13684.88 rows=8458... 4 Buckets: 65536 Batches: 16 Memory Us... 5 -> Seq Scan on actors (cost=0.00..13... 6 Planning Time: 0.145 ms 7 Execution Time: 1363.600 ms
The below is not as substantial a reduction, but still about a quarter speed-up.
- Notice that projection was pushed down below the join “at source”.
- If we waited until join was done, would be at least as expensive.
%%sql
EXPLAIN ANALYZE
SELECT name, movie_id
FROM actors, cast_info
WHERE actors.id = cast_info.person_id AND actors.id > 4000000;
QUERY PLAN |
---|
Hash Join (cost=23988.61..81598.34 rows=1166260 width=18) (actual time=156.138..1079.895 rows=634763 loops=1) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.009..208.164 rows=2211936 loops=1) |
-> Hash (cost=15799.60..15799.60 rows=446001 width=18) (actual time=155.629..155.631 rows=444781 loops=1) |
Buckets: 65536 Batches: 8 Memory Usage: 3335kB |
-> Seq Scan on actors (cost=0.00..15799.60 rows=446001 width=18) (actual time=0.265..76.297 rows=444781 loops=1) |
Filter: (id > 4000000) |
Rows Removed by Filter: 401107 |
Planning Time: 0.151 ms |
Execution Time: 1098.684 ms |
printplans(__)
QUERY PLAN 0 Hash Join (cost=23988.61..81598.34 rows=11662... 1 Hash Cond: (cast_info.person_id = actors.id) 2 -> Seq Scan on cast_info (cost=0.00..31907... 3 -> Hash (cost=15799.60..15799.60 rows=4460... 4 Buckets: 65536 Batches: 8 Memory Usa... 5 -> Seq Scan on actors (cost=0.00..15... 6 Filter: (id > 4000000) 7 Rows Removed by Filter: 401107 8 Planning Time: 0.151 ms 9 Execution Time: 1098.684 ms
Three-way joins¶
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.86..9.46 rows=10 width=56) (actual time=0.071..0.152 rows=10 loops=1) |
-> Nested Loop (cost=0.86..1901329.72 rows=2211936 width=56) (actual time=0.071..0.149 rows=10 loops=1) |
-> Nested Loop (cost=0.43..895896.13 rows=2211936 width=26) (actual time=0.023..0.077 rows=10 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.008..0.010 rows=10 loops=1) |
-> Memoize (cost=0.43..0.47 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=10) |
Cache Key: cast_info.person_id |
Cache Mode: logical |
Hits: 2 Misses: 8 Evictions: 0 Overflows: 0 Memory Usage: 1kB |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=8) |
Index Cond: (id = cast_info.person_id) |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.006..0.007 rows=1 loops=10) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.533 ms |
Execution Time: 0.442 ms |
printplans(__)
QUERY PLAN 0 Limit (cost=0.86..9.46 rows=10 width=56) (act... 1 -> Nested Loop (cost=0.86..1901329.72 rows... 2 -> Nested Loop (cost=0.43..895896.13... 3 -> Seq Scan on cast_info (cost... 4 -> Memoize (cost=0.43..0.47 ro... 5 Cache Key: cast_info.perso... 6 Cache Mode: logical 7 Hits: 2 Misses: 8 Evicti... 8 -> Index Scan using actor... 9 Index Cond: (id = ca... 10 -> Index Scan using movie_pkey on mov... 11 Index Cond: (id = cast_info.movi... 12 Planning Time: 0.533 ms 13 Execution Time: 0.442 ms
Below, note the predicate pushdown in the sequential scan on actors! Again, copy-paste into psql
if you can't see the whitespace formatting.
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
AND name = 'Tom Hanks';
QUERY PLAN |
---|
Gather (cost=10632.10..32056.15 rows=3 width=56) (actual time=175.633..295.328 rows=65 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Nested Loop (cost=9632.10..31055.85 rows=1 width=56) (actual time=173.023..288.885 rows=22 loops=3) |
-> Parallel Hash Join (cost=9631.68..31055.40 rows=1 width=26) (actual time=172.939..288.279 rows=22 loops=3) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.030..104.277 rows=737312 loops=3) |
-> Parallel Hash (cost=9631.67..9631.67 rows=1 width=18) (actual time=21.469..21.470 rows=0 loops=3) |
Buckets: 1024 Batches: 1 Memory Usage: 40kB |
-> Parallel Seq Scan on actors (cost=0.00..9631.67 rows=1 width=18) (actual time=13.520..21.411 rows=0 loops=3) |
Filter: (name = 'Tom Hanks'::text) |
Rows Removed by Filter: 281962 |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.027..0.027 rows=1 loops=65) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.257 ms |
Execution Time: 295.372 ms |
printplans(__)
QUERY PLAN 0 Gather (cost=10632.10..32056.15 rows=3 width=... 1 Workers Planned: 2 2 Workers Launched: 2 3 -> Nested Loop (cost=9632.10..31055.85 row... 4 -> Parallel Hash Join (cost=9631.68.... 5 Hash Cond: (cast_info.person_id ... 6 -> Parallel Seq Scan on cast_in... 7 -> Parallel Hash (cost=9631.67... 8 Buckets: 1024 Batches: 1 ... 9 -> Parallel Seq Scan on a... 10 Filter: (name = 'Tom... 11 Rows Removed by Filt... 12 -> Index Scan using movie_pkey on mov... 13 Index Cond: (id = cast_info.movi... 14 Planning Time: 0.257 ms 15 Execution Time: 295.372 ms
Compare with the below predicate pushdown, where the filter is now on movie titles:
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
AND title LIKE 'Snakes on a Plane';
QUERY PLAN |
---|
Gather (cost=9279.46..30704.82 rows=7 width=56) (actual time=325.931..329.236 rows=4 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Nested Loop (cost=8279.46..29704.12 rows=3 width=56) (actual time=222.206..323.329 rows=1 loops=3) |
-> Parallel Hash Join (cost=8279.04..29702.75 rows=3 width=38) (actual time=222.144..323.241 rows=1 loops=3) |
Hash Cond: (cast_info.movie_id = movies.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.032..154.349 rows=737312 loops=3) |
-> Parallel Hash (cost=8279.03..8279.03 rows=1 width=30) (actual time=22.467..22.468 rows=0 loops=3) |
Buckets: 1024 Batches: 1 Memory Usage: 40kB |
-> Parallel Seq Scan on movies (cost=0.00..8279.03 rows=1 width=30) (actual time=17.099..22.391 rows=0 loops=3) |
Filter: (title ~~ 'Snakes on a Plane'::text) |
Rows Removed by Filter: 218817 |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=18) (actual time=0.058..0.058 rows=1 loops=4) |
Index Cond: (id = cast_info.person_id) |
Planning Time: 0.270 ms |
Execution Time: 329.274 ms |
printplans(__)
QUERY PLAN 0 Gather (cost=9279.46..30704.82 rows=7 width=5... 1 Workers Planned: 2 2 Workers Launched: 2 3 -> Nested Loop (cost=8279.46..29704.12 row... 4 -> Parallel Hash Join (cost=8279.04.... 5 Hash Cond: (cast_info.movie_id =... 6 -> Parallel Seq Scan on cast_in... 7 -> Parallel Hash (cost=8279.03... 8 Buckets: 1024 Batches: 1 ... 9 -> Parallel Seq Scan on m... 10 Filter: (title ~~ 'S... 11 Rows Removed by Filt... 12 -> Index Scan using actor_pkey on act... 13 Index Cond: (id = cast_info.pers... 14 Planning Time: 0.270 ms 15 Execution Time: 329.274 ms
Three-way joins with Indexes¶
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.86..9.46 rows=10 width=56) (actual time=0.026..0.075 rows=10 loops=1) |
-> Nested Loop (cost=0.86..1901329.72 rows=2211936 width=56) (actual time=0.025..0.073 rows=10 loops=1) |
-> Nested Loop (cost=0.43..895896.13 rows=2211936 width=26) (actual time=0.020..0.053 rows=10 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.007..0.008 rows=10 loops=1) |
-> Memoize (cost=0.43..0.47 rows=1 width=18) (actual time=0.004..0.004 rows=1 loops=10) |
Cache Key: cast_info.person_id |
Cache Mode: logical |
Hits: 2 Misses: 8 Evictions: 0 Overflows: 0 Memory Usage: 1kB |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=18) (actual time=0.004..0.004 rows=1 loops=8) |
Index Cond: (id = cast_info.person_id) |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.002..0.002 rows=1 loops=10) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.272 ms |
Execution Time: 0.333 ms |
printplans(__)
QUERY PLAN 0 Limit (cost=0.86..9.46 rows=10 width=56) (act... 1 -> Nested Loop (cost=0.86..1901329.72 rows... 2 -> Nested Loop (cost=0.43..895896.13... 3 -> Seq Scan on cast_info (cost... 4 -> Memoize (cost=0.43..0.47 ro... 5 Cache Key: cast_info.perso... 6 Cache Mode: logical 7 Hits: 2 Misses: 8 Evicti... 8 -> Index Scan using actor... 9 Index Cond: (id = ca... 10 -> Index Scan using movie_pkey on mov... 11 Index Cond: (id = cast_info.movi... 12 Planning Time: 0.272 ms 13 Execution Time: 0.333 ms
What if we dropped one of the indexes?
To do so we must drop the primary key constraint on actors.id:
%sql ALTER TABLE actors DROP CONSTRAINT actor_pkey CASCADE;
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=16222.62..16228.33 rows=10 width=56) (actual time=594.208..846.259 rows=10 loops=1) |
-> Nested Loop (cost=16222.62..1278418.10 rows=2211936 width=56) (actual time=594.206..846.256 rows=10 loops=1) |
-> Gather (cost=16222.20..272984.51 rows=2211936 width=26) (actual time=594.154..846.183 rows=10 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Join (cost=15222.20..50790.91 rows=921640 width=26) (actual time=564.550..564.972 rows=810 loops=3) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.032..203.254 rows=737312 loops=3) |
-> Parallel Hash (cost=8750.53..8750.53 rows=352453 width=18) (actual time=111.486..111.487 rows=281963 loops=3) |
Buckets: 65536 Batches: 16 Memory Usage: 0kB |
-> Parallel Seq Scan on actors (cost=0.00..8750.53 rows=352453 width=18) (actual time=0.024..19.288 rows=281963 loops=3) |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.006..0.006 rows=1 loops=10) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.286 ms |
Execution Time: 846.299 ms |
printplans(__)
QUERY PLAN 0 Limit (cost=16222.62..16228.33 rows=10 width=... 1 -> Nested Loop (cost=16222.62..1278418.10 ... 2 -> Gather (cost=16222.20..272984.51 ... 3 Workers Planned: 2 4 Workers Launched: 2 5 -> Parallel Hash Join (cost=15... 6 Hash Cond: (cast_info.pers... 7 -> Parallel Seq Scan on c... 8 -> Parallel Hash (cost=8... 9 Buckets: 65536 Batc... 10 -> Parallel Seq Sca... 11 -> Index Scan using movie_pkey on mov... 12 Index Cond: (id = cast_info.movi... 13 Planning Time: 0.286 ms 14 Execution Time: 846.299 ms
What if we dropped both indexes?
%sql ALTER TABLE movies DROP CONSTRAINT movie_pkey CASCADE;
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=29106.44..29107.70 rows=10 width=56) (actual time=2159.897..2283.595 rows=10 loops=1) |
-> Gather (cost=29106.44..307637.07 rows=2211936 width=56) (actual time=2159.896..2283.591 rows=10 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Join (cost=28106.44..85443.47 rows=921640 width=56) (actual time=2155.183..2155.601 rows=4 loops=3) |
Hash Cond: (cast_info.movie_id = movies.id) |
-> Parallel Hash Join (cost=15222.20..50790.91 rows=921640 width=26) (actual time=692.958..1507.575 rows=737312 loops=3) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.033..169.595 rows=737312 loops=3) |
-> Parallel Hash (cost=8750.53..8750.53 rows=352453 width=18) (actual time=195.728..195.729 rows=281963 loops=3) |
Buckets: 65536 Batches: 16 Memory Usage: 3360kB |
-> Parallel Seq Scan on actors (cost=0.00..8750.53 rows=352453 width=18) (actual time=0.026..41.592 rows=281963 loops=3) |
-> Parallel Hash (cost=7595.22..7595.22 rows=273522 width=30) (actual time=71.091..71.092 rows=218818 loops=3) |
Buckets: 65536 Batches: 16 Memory Usage: 0kB |
-> Parallel Seq Scan on movies (cost=0.00..7595.22 rows=273522 width=30) (actual time=0.019..16.267 rows=218818 loops=3) |
Planning Time: 0.209 ms |
Execution Time: 2283.632 ms |
printplans(__)
QUERY PLAN 0 Limit (cost=29106.44..29107.70 rows=10 width=... 1 -> Gather (cost=29106.44..307637.07 rows=2... 2 Workers Planned: 2 3 Workers Launched: 2 4 -> Parallel Hash Join (cost=28106.44... 5 Hash Cond: (cast_info.movie_id =... 6 -> Parallel Hash Join (cost=15... 7 Hash Cond: (cast_info.pers... 8 -> Parallel Seq Scan on c... 9 -> Parallel Hash (cost=8... 10 Buckets: 65536 Batc... 11 -> Parallel Seq Sca... 12 -> Parallel Hash (cost=7595.22... 13 Buckets: 65536 Batches: 1... 14 -> Parallel Seq Scan on m... 15 Planning Time: 0.209 ms 16 Execution Time: 2283.632 ms
Cleanup¶
We close the connection, then drop the database:
%sql --close postgresql://127.0.0.1:5432/imdb_perf_lecture
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_perf_lecture'
ERROR: database "imdb_perf_lecture" is being accessed by other users DETAIL: There is 1 other session using the database.