# Run this cell to set up imports
import numpy as np
import pandas as pd
Today's database is a modified version of the Oakland police stops database as made public by the Stanford Open Policing Project: https://openpolicing.stanford.edu/data/ Read more about the database here.
If you are looking for how we adjusted the data, contact us and we can share our source notebook.
!psql -h localhost -c 'DROP DATABASE IF EXISTS stops_lecture'
!psql -h localhost -c 'CREATE DATABASE stops_lecture'
!psql -h localhost -d stops_lecture -f data/stops_lecture.sql
DROP DATABASE CREATE DATABASE SET SET SET SET SET set_config ------------ (1 row) SET SET SET SET SET SET CREATE TABLE ALTER TABLE CREATE TABLE ALTER TABLE COPY 84779 COPY 211 CREATE INDEX CREATE INDEX
jupysql
¶We are going to be using the jupysql
library to connect our notebook to a PostgreSQL database server on your jupyterhub account. The next cell should do the trick; you should not see any error messages after it completes.
%load_ext sql
%sql postgresql://127.0.0.1:5432/stops_lecture
%sql SELECT * FROM Zips LIMIT 10;
zip_id | location | zipcode |
---|---|---|
0 | None | 94546 |
1 | Chabot Park | 94546 |
2 | Panoramic Hill | 94563 |
3 | None | 94577 |
4 | North Stonehurst | 94577 |
5 | Arrowhead Marsh | 94601 |
6 | Bartlett | 94601 |
7 | Coliseum Industrial | 94601 |
8 | Fairfax | 94601 |
9 | Fairfax Business-Wentworth-Holland | 94601 |
With cell magic (%%sql
), we can also assign the result to a variable. Try adding result <<
to the top line of the below cell, i.e., replace the first line with
%%sql result <<
%%sql
SELECT *
FROM Stops
ORDER BY RANDOM()
LIMIT 10;
result
%%sql
SELECT *
FROM Stops
WHERE id = 123;
%%sql
SELECT S1.id, S1.race,
S1.location
FROM Stops as S1,
Stops as S2
WHERE S1.location = S2.location
AND S2.id = 123;
Approach #2 with CTE:
%%sql
WITH Location123 AS (
SELECT location
FROM Stops
WHERE id = 123
)
SELECT S.id, S.race,
S.location
FROM Stops as S,
Location123
WHERE S.location = Location123.location;
Approach #3 with scalar subquery:
%%sql
SELECT S1.id, S1.race, S1.location
FROM Stops S1
WHERE S1.location = (
SELECT S2.location
FROM Stops
S2 WHERE S2.id = 123);
EXISTS
Subquery (and NOT EXISTS
)¶%%sql
SELECT DISTINCT Stops.location FROM Stops
WHERE NOT EXISTS (
SELECT * FROM Zips
WHERE Zips.location =
Stops.location);
%%sql
WITH StopZips AS (
SELECT * FROM Stops NATURAL JOIN Zips
)
SELECT *
FROM StopZips SZ1
WHERE NOT EXISTS (
...
);
Note -- there are no stops that re the only one in their zipcode:
%%sql
SELECT COUNT(*) as count
FROM Stops S1
NATURAL JOIN Zips Z1
GROUP BY Z1.zipcode
ORDER BY count ASC;
%%sql
SELECT S1.id
FROM Stops S1
WHERE S1.age >=
(...);
%%sql
SELECT race,
AVG (...) AS west_oakland_avg,
AVG (...) AS rockridge_avg
FROM ...
...;