Lecture 08¶
# Run this cell to set up imports
import numpy as np
import pandas as pd
Load in the IMDB Performance database¶
This is a variation of the IMDB database with keys defined. Note that this is a pretty big database! So if you run the below lines, please also remember to delete the imdb_perf_lecture
afterwards to save space on your limited postgreSQL server.
We assume you have the associated lecture folder lec06
pulled into your repo already. The below commands create a symbolic link (i.e., shortcut/redirect with ln
) to this lecture data directory, allowing some space saving, and unzip the database file.
!ln -sf ../../lec/lec06/data .
!unzip -u data/imdb_perf_lecture.zip -d data/
Archive: data/imdb_perf_lecture.zip
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_perf_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_perf_lecture'
!psql -h localhost -d imdb_perf_lecture -f data/imdb_perf_lecture.sql
DROP DATABASE CREATE DATABASE SET SET SET SET SET set_config ------------ (1 row) SET SET SET SET SET SET CREATE TABLE ALTER TABLE CREATE TABLE ALTER TABLE CREATE TABLE ALTER TABLE COPY 845888 COPY 2211936 COPY 656453 ALTER TABLE ALTER TABLE ALTER TABLE ALTER TABLE
Start jupysql
¶
%reload_ext sql
%sql postgresql://127.0.0.1:5432/imdb_perf_lecture
If you're having trouble seeing the entirety of query plans, you can run the following cell to set the limit on displayed rows to 20. Careful: Do not set this to None
and run the actual queries; SQL will return millions of rows and crash your kernel!
# run this cell to remove 10-row limit on display
%config SqlMagic.displaylimit = 20
Matching¶
jupysql
dataframe visualization removes any whitespace.
You can also run (after each cell):
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
%%sql
/* 1 */
EXPLAIN ANALYZE
SELECT id FROM actors
WHERE id > 4000000 AND
name='Tom Hanks';
QUERY PLAN |
---|
Gather (cost=1000.00..11653.80 rows=1410 width=4) (actual time=27.728..30.058 rows=0 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Seq Scan on actors (cost=0.00..10512.80 rows=588 width=4) (actual time=24.968..24.969 rows=0 loops=3) |
Filter: ((id > 4000000) AND (name = 'Tom Hanks'::text)) |
Rows Removed by Filter: 281963 |
Planning Time: 0.147 ms |
Execution Time: 30.076 ms |
%%sql
/* 2 */
EXPLAIN ANALYZE
SELECT id FROM actors
ORDER BY name
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=17366.94..17368.11 rows=10 width=36) (actual time=140.249..142.642 rows=10 loops=1) |
-> Gather Merge (cost=17366.94..99611.72 rows=704906 width=36) (actual time=140.248..142.637 rows=10 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Sort (cost=16366.92..17248.05 rows=352453 width=36) (actual time=137.363..137.365 rows=8 loops=3) |
Sort Key: name |
Sort Method: top-N heapsort Memory: 26kB |
Worker 0: Sort Method: top-N heapsort Memory: 26kB |
Worker 1: Sort Method: top-N heapsort Memory: 26kB |
-> Parallel Seq Scan on actors (cost=0.00..8750.53 rows=352453 width=36) (actual time=0.021..49.143 rows=281963 loops=3) |
Planning Time: 0.108 ms |
Execution Time: 142.692 ms |
%%sql
/* 3 */
EXPLAIN ANALYZE
SELECT id FROM actors
ORDER BY id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.42..0.93 rows=10 width=4) (actual time=0.089..0.093 rows=10 loops=1) |
-> Index Only Scan using actor_pkey on actors (cost=0.42..42884.75 rows=845888 width=4) (actual time=0.088..0.091 rows=10 loops=1) |
Heap Fetches: 10 |
Planning Time: 0.081 ms |
Execution Time: 0.106 ms |
Two-table demo: LIMIT¶
Let's join two tables, actors
and cast_info
. The query planner selects a hash join:
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info
WHERE actors.id = cast_info.person_id;
QUERY PLAN |
---|
Hash Join (cost=30867.48..92474.12 rows=2212088 width=44) (actual time=190.552..1385.646 rows=2211936 loops=1) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Seq Scan on cast_info (cost=0.00..31908.88 rows=2212088 width=8) (actual time=0.021..160.925 rows=2211936 loops=1) |
-> Hash (cost=13684.88..13684.88 rows=845888 width=36) (actual time=190.117..190.118 rows=845888 loops=1) |
Buckets: 65536 Batches: 32 Memory Usage: 1810kB |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=36) (actual time=0.006..68.439 rows=845888 loops=1) |
Planning Time: 0.221 ms |
Execution Time: 1449.252 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Hash Join (cost=30867.48..92474.12 rows=22120... 1 Hash Cond: (cast_info.person_id = actors.id) 2 -> Seq Scan on cast_info (cost=0.00..31908... 3 -> Hash (cost=13684.88..13684.88 rows=8458... 4 Buckets: 65536 Batches: 32 Memory Us... 5 -> Seq Scan on actors (cost=0.00..13... 6 Planning Time: 0.221 ms 7 Execution Time: 1449.252 ms
Below, we add LIMIT
. Note the query planner switches to a nested loop join, using an index scan to match cast_info.person_id
to the indexed attribute actors.id
! This results in a 10,000x speedup!
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info
WHERE actors.id = cast_info.person_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.42..5.13 rows=10 width=44) (actual time=0.052..0.165 rows=10 loops=1) |
-> Nested Loop (cost=0.42..1040953.73 rows=2212088 width=44) (actual time=0.051..0.163 rows=10 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31908.88 rows=2212088 width=8) (actual time=0.025..0.026 rows=10 loops=1) |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=36) (actual time=0.013..0.013 rows=1 loops=10) |
Index Cond: (id = cast_info.person_id) |
Planning Time: 0.112 ms |
Execution Time: 0.185 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Limit (cost=0.42..5.13 rows=10 width=44) (act... 1 -> Nested Loop (cost=0.42..1040953.73 rows... 2 -> Seq Scan on cast_info (cost=0.00.... 3 -> Index Scan using actor_pkey on act... 4 Index Cond: (id = cast_info.pers... 5 Planning Time: 0.112 ms 6 Execution Time: 0.185 ms
Two-table demo: Projection¶
jupysql
dataframe visualization removes any whitespace.
%%sql
EXPLAIN ANALYZE
SELECT name, movie_id
FROM actors, cast_info
WHERE actors.id = cast_info.person_id;
QUERY PLAN |
---|
Hash Join (cost=30867.48..92474.12 rows=2212088 width=36) (actual time=258.110..1695.561 rows=2211936 loops=1) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Seq Scan on cast_info (cost=0.00..31908.88 rows=2212088 width=8) (actual time=0.012..293.502 rows=2211936 loops=1) |
-> Hash (cost=13684.88..13684.88 rows=845888 width=36) (actual time=257.724..257.726 rows=845888 loops=1) |
Buckets: 65536 Batches: 32 Memory Usage: 1848kB |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=36) (actual time=0.007..102.280 rows=845888 loops=1) |
Planning Time: 0.092 ms |
Execution Time: 1761.153 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Hash Join (cost=30867.48..92474.12 rows=22120... 1 Hash Cond: (cast_info.person_id = actors.id) 2 -> Seq Scan on cast_info (cost=0.00..31908... 3 -> Hash (cost=13684.88..13684.88 rows=8458... 4 Buckets: 65536 Batches: 32 Memory Us... 5 -> Seq Scan on actors (cost=0.00..13... 6 Planning Time: 0.092 ms 7 Execution Time: 1761.153 ms
The below is not as substantial a reduction, but still about a quarter speed-up.
- Notice that projection was pushed down below the join “at source”.
- If we waited until join was done, would be at least as expensive.
%%sql
EXPLAIN ANALYZE
SELECT name, movie_id
FROM actors, cast_info
WHERE actors.id = cast_info.person_id AND actors.id > 4000000;
QUERY PLAN |
---|
Hash Join (cost=19763.71..76964.36 rows=737364 width=36) (actual time=169.979..1279.023 rows=634763 loops=1) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Seq Scan on cast_info (cost=0.00..31908.88 rows=2212088 width=8) (actual time=0.009..253.033 rows=2211936 loops=1) |
-> Hash (cost=14036.18..14036.18 rows=281963 width=36) (actual time=169.158..169.164 rows=444781 loops=1) |
Buckets: 65536 Batches: 8 Memory Usage: 3335kB |
-> Bitmap Heap Scan on actors (cost=5285.64..14036.18 rows=281963 width=36) (actual time=25.041..87.701 rows=444781 loops=1) |
Recheck Cond: (id > 4000000) |
Heap Blocks: exact=3088 |
-> Bitmap Index Scan on actor_pkey (cost=0.00..5215.15 rows=281963 width=0) (actual time=24.660..24.660 rows=444781 loops=1) |
Index Cond: (id > 4000000) |
Planning Time: 0.123 ms |
Execution Time: 1330.343 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Hash Join (cost=19763.71..76964.36 rows=73736... 1 Hash Cond: (cast_info.person_id = actors.id) 2 -> Seq Scan on cast_info (cost=0.00..31908... 3 -> Hash (cost=14036.18..14036.18 rows=2819... 4 Buckets: 65536 Batches: 8 Memory Usa... 5 -> Bitmap Heap Scan on actors (cost=... 6 Recheck Cond: (id > 4000000) 7 Heap Blocks: exact=3088 8 -> Bitmap Index Scan on actor_p... 9 Index Cond: (id > 4000000) 10 Planning Time: 0.123 ms 11 Execution Time: 1330.343 ms
Three-way joins¶
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.85..10.10 rows=10 width=74) (actual time=0.066..0.116 rows=10 loops=1) |
-> Nested Loop (cost=0.85..2046318.54 rows=2211936 width=74) (actual time=0.065..0.114 rows=10 loops=1) |
-> Nested Loop (cost=0.42..1040884.95 rows=2211936 width=44) (actual time=0.017..0.052 rows=10 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.007..0.008 rows=10 loops=1) |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=36) (actual time=0.004..0.004 rows=1 loops=10) |
Index Cond: (id = cast_info.person_id) |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.006..0.006 rows=1 loops=10) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.336 ms |
Execution Time: 0.138 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Limit (cost=0.85..10.10 rows=10 width=74) (ac... 1 -> Nested Loop (cost=0.85..2046318.54 rows... 2 -> Nested Loop (cost=0.42..1040884.9... 3 -> Seq Scan on cast_info (cost... 4 -> Index Scan using actor_pkey ... 5 Index Cond: (id = cast_inf... 6 -> Index Scan using movie_pkey on mov... 7 Index Cond: (id = cast_info.movi... 8 Planning Time: 0.336 ms 9 Execution Time: 0.138 ms
Below, note the predicate pushdown in the sequential scan on actors! Again, copy-paste into psql
if you can't see the whitespace formatting.
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
AND name = 'Tom Hanks';
QUERY PLAN |
---|
Gather (cost=10654.12..35277.87 rows=11059 width=74) (actual time=199.597..394.324 rows=65 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Nested Loop (cost=9654.12..33171.97 rows=4608 width=74) (actual time=222.089..387.662 rows=22 loops=3) |
-> Parallel Hash Join (cost=9653.69..31077.41 rows=4608 width=44) (actual time=221.993..387.051 rows=22 loops=3) |
Hash Cond: (cast_info.person_id = actors.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.041..141.876 rows=737312 loops=3) |
-> Parallel Hash (cost=9631.67..9631.67 rows=1762 width=36) (actual time=56.761..56.762 rows=0 loops=3) |
Buckets: 8192 Batches: 1 Memory Usage: 96kB |
-> Parallel Seq Scan on actors (cost=0.00..9631.67 rows=1762 width=36) (actual time=16.331..25.688 rows=0 loops=3) |
Filter: (name = 'Tom Hanks'::text) |
Rows Removed by Filter: 281962 |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.027..0.027 rows=1 loops=65) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.155 ms |
Execution Time: 394.368 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Gather (cost=10654.12..35277.87 rows=11059 wi... 1 Workers Planned: 2 2 Workers Launched: 2 3 -> Nested Loop (cost=9654.12..33171.97 row... 4 -> Parallel Hash Join (cost=9653.69.... 5 Hash Cond: (cast_info.person_id ... 6 -> Parallel Seq Scan on cast_in... 7 -> Parallel Hash (cost=9631.67... 8 Buckets: 8192 Batches: 1 ... 9 -> Parallel Seq Scan on a... 10 Filter: (name = 'Tom... 11 Rows Removed by Filt... 12 -> Index Scan using movie_pkey on mov... 13 Index Cond: (id = cast_info.movi... 14 Planning Time: 0.155 ms 15 Execution Time: 394.368 ms
Compare with the below predicate pushdown, where the filter is now on movie titles:
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
AND title LIKE 'Snakes on a Plane';
QUERY PLAN |
---|
Gather (cost=9279.46..30704.82 rows=7 width=74) (actual time=402.117..405.350 rows=4 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Nested Loop (cost=8279.46..29704.12 rows=3 width=74) (actual time=273.025..399.542 rows=1 loops=3) |
-> Parallel Hash Join (cost=8279.04..29702.75 rows=3 width=38) (actual time=272.947..399.439 rows=1 loops=3) |
Hash Cond: (cast_info.movie_id = movies.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.030..212.064 rows=737312 loops=3) |
-> Parallel Hash (cost=8279.03..8279.03 rows=1 width=30) (actual time=22.482..22.483 rows=0 loops=3) |
Buckets: 1024 Batches: 1 Memory Usage: 40kB |
-> Parallel Seq Scan on movies (cost=0.00..8279.03 rows=1 width=30) (actual time=17.106..22.422 rows=0 loops=3) |
Filter: (title ~~ 'Snakes on a Plane'::text) |
Rows Removed by Filter: 218817 |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=36) (actual time=0.070..0.070 rows=1 loops=4) |
Index Cond: (id = cast_info.person_id) |
Planning Time: 0.278 ms |
Execution Time: 405.404 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Gather (cost=9279.46..30704.82 rows=7 width=7... 1 Workers Planned: 2 2 Workers Launched: 2 3 -> Nested Loop (cost=8279.46..29704.12 row... 4 -> Parallel Hash Join (cost=8279.04.... 5 Hash Cond: (cast_info.movie_id =... 6 -> Parallel Seq Scan on cast_in... 7 -> Parallel Hash (cost=8279.03... 8 Buckets: 1024 Batches: 1 ... 9 -> Parallel Seq Scan on m... 10 Filter: (title ~~ 'S... 11 Rows Removed by Filt... 12 -> Index Scan using actor_pkey on act... 13 Index Cond: (id = cast_info.pers... 14 Planning Time: 0.278 ms 15 Execution Time: 405.404 ms
Three-way joins with Indexes¶
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.86..9.60 rows=10 width=74) (actual time=0.042..0.119 rows=10 loops=1) |
-> Nested Loop (cost=0.86..1932285.82 rows=2211936 width=74) (actual time=0.041..0.117 rows=10 loops=1) |
-> Nested Loop (cost=0.43..926852.23 rows=2211936 width=44) (actual time=0.030..0.082 rows=10 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.011..0.012 rows=10 loops=1) |
-> Memoize (cost=0.43..0.47 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=10) |
Cache Key: cast_info.person_id |
Cache Mode: logical |
Hits: 2 Misses: 8 Evictions: 0 Overflows: 0 Memory Usage: 1kB |
-> Index Scan using actor_pkey on actors (cost=0.42..0.46 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=8) |
Index Cond: (id = cast_info.person_id) |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.003..0.003 rows=1 loops=10) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.232 ms |
Execution Time: 1.233 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Limit (cost=0.86..9.60 rows=10 width=74) (act... 1 -> Nested Loop (cost=0.86..1932285.82 rows... 2 -> Nested Loop (cost=0.43..926852.23... 3 -> Seq Scan on cast_info (cost... 4 -> Memoize (cost=0.43..0.47 ro... 5 Cache Key: cast_info.perso... 6 Cache Mode: logical 7 Hits: 2 Misses: 8 Evicti... 8 -> Index Scan using actor... 9 Index Cond: (id = ca... 10 -> Index Scan using movie_pkey on mov... 11 Index Cond: (id = cast_info.movi... 12 Planning Time: 0.232 ms 13 Execution Time: 1.233 ms
What if we dropped one of the indexes?
To do so we must drop the primary key constraint on actors.id:
%sql ALTER TABLE actors DROP CONSTRAINT actor_pkey CASCADE;
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=0.42..25271.13 rows=10 width=74) (actual time=295.628..296.522 rows=10 loops=1) |
-> Nested Loop (cost=0.42..35381485254.95 rows=14000989 width=74) (actual time=295.627..296.519 rows=10 loops=1) |
-> Nested Loop (cost=0.00..35375121117.60 rows=14000989 width=44) (actual time=295.591..296.409 rows=10 loops=1) |
Join Filter: (actors.id = cast_info.person_id) |
Rows Removed by Join Filter: 1118182 |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=36) (actual time=0.009..0.010 rows=1 loops=1) |
-> Materialize (cost=0.00..51608.04 rows=2211936 width=8) (actual time=0.007..230.778 rows=1118192 loops=1) |
-> Seq Scan on cast_info (cost=0.00..31907.36 rows=2211936 width=8) (actual time=0.005..75.815 rows=1118192 loops=1) |
-> Index Scan using movie_pkey on movies (cost=0.42..0.45 rows=1 width=30) (actual time=0.010..0.010 rows=1 loops=10) |
Index Cond: (id = cast_info.movie_id) |
Planning Time: 0.401 ms |
Execution Time: 300.628 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Limit (cost=0.42..25271.13 rows=10 width=74) ... 1 -> Nested Loop (cost=0.42..35381485254.95 ... 2 -> Nested Loop (cost=0.00..353751211... 3 Join Filter: (actors.id = cast_i... 4 Rows Removed by Join Filter: 111... 5 -> Seq Scan on actors (cost=0.... 6 -> Materialize (cost=0.00..516... 7 -> Seq Scan on cast_info ... 8 -> Index Scan using movie_pkey on mov... 9 Index Cond: (id = cast_info.movi... 10 Planning Time: 0.401 ms 11 Execution Time: 300.628 ms
What if we dropped both indexes?
%sql ALTER TABLE movies DROP CONSTRAINT movie_pkey CASCADE;
%%sql
EXPLAIN ANALYZE
SELECT *
FROM actors, cast_info, movies
WHERE actors.id = cast_info.person_id
AND movies.id = cast_info.movie_id
LIMIT 10;
QUERY PLAN |
---|
Limit (cost=13884.25..44371.17 rows=10 width=74) (actual time=628.635..1815.969 rows=10 loops=1) |
-> Nested Loop (cost=13884.25..42684723061.69 rows=14000989 width=74) (actual time=628.634..1815.952 rows=10 loops=1) |
Join Filter: (cast_info.person_id = actors.id) |
Rows Removed by Join Filter: 7743513 |
-> Gather (cost=13884.25..270450.57 rows=2211936 width=38) (actual time=608.599..639.809 rows=10 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Join (cost=12884.25..48256.97 rows=921640 width=38) (actual time=603.671..604.109 rows=673 loops=3) |
Hash Cond: (cast_info.movie_id = movies.id) |
-> Parallel Seq Scan on cast_info (cost=0.00..19004.40 rows=921640 width=8) (actual time=0.022..189.338 rows=737312 loops=3) |
-> Parallel Hash (cost=7595.22..7595.22 rows=273522 width=30) (actual time=115.192..115.193 rows=218818 loops=3) |
Buckets: 65536 Batches: 16 Memory Usage: 0kB |
-> Parallel Seq Scan on movies (cost=0.00..7595.22 rows=273522 width=30) (actual time=0.019..17.209 rows=218818 loops=3) |
-> Materialize (cost=0.00..24523.32 rows=845888 width=36) (actual time=0.004..73.982 rows=774352 loops=10) |
-> Seq Scan on actors (cost=0.00..13684.88 rows=845888 width=36) (actual time=0.015..53.748 rows=845888 loops=1) |
Planning Time: 0.187 ms |
Execution Time: 1822.068 ms |
result = _.DataFrame()
result.style.set_properties(**{'text-align': 'left'})
print(result)
QUERY PLAN 0 Limit (cost=13884.25..44371.17 rows=10 width=... 1 -> Nested Loop (cost=13884.25..42684723061... 2 Join Filter: (cast_info.person_id = ac... 3 Rows Removed by Join Filter: 7743513 4 -> Gather (cost=13884.25..270450.57 ... 5 Workers Planned: 2 6 Workers Launched: 2 7 -> Parallel Hash Join (cost=12... 8 Hash Cond: (cast_info.movi... 9 -> Parallel Seq Scan on c... 10 -> Parallel Hash (cost=7... 11 Buckets: 65536 Batc... 12 -> Parallel Seq Sca... 13 -> Materialize (cost=0.00..24523.32 ... 14 -> Seq Scan on actors (cost=0.... 15 Planning Time: 0.187 ms 16 Execution Time: 1822.068 ms
Cleanup¶
We close the connection, then drop the database:
%sql --close postgresql://127.0.0.1:5432/imdb_perf_lecture
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_perf_lecture'
DROP DATABASE