Lecture 02: SQL Review¶
First load in the data into the database¶
!unzip -u data/imdb_lecture.zip -d data/
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_lecture'
!psql -h localhost -d imdb_lecture -f data/imdb_lecture.sql
Using psql
in Terminal¶
psql
is a command-line PostgreSQL interactive client.
I find it useful to keep the Terminal up while I'm working on notebooks for the following:
- meta-commands:
psql
commands to query information (generally metadata) about the databse - writing interactive SQL queries:
psql
shows me a few rows at a time, and I can quit whenever. Avoids Jupyter notebooks running out of space if the query result relation is huge.
To launch psql
and connect to a specific database, say, the imdb_lecture
database we just created on localhost
, open up a Terminal and type in:
psql postgresql://127.0.0.1:5432/imdb_lecture
Note the postgres server is on localhost (i.e., IP address 127.0.0.0.1
) and network port 5432
.
Troubleshooting:
- You do not have to be in a particular directory to launch the
psql
client! - If you cannot connect or you do not see any relations with
\d
, make sure you have created/loaded in the database with the!psql
commands in the previous section. - If your interactive query is not executing, check to see if you have ended with a semicolon (necessary and also generally good style!).
Quick reference:
\l
list all databases available on this server\d
list all relations in this database\d tablename
list schema of tablename relation\q
quit psql\?
help<ctrl>-c
cancel<ctrl>-a
,<ctrl>-e
jump to the front and back of a line, respectively<ctrl>-<left>
,<ctrl>-<right
> jump one word previous and forward, respectively- (when in query result buffer)
<space>
to advance a page,q
to quit and exit out
Using jupysql
in Jupyter Notebook¶
We are going to be using the jupysql
library to connect our notebook to a PostgreSQL database server on your jupyterhub account. The next cell should do the trick; you should not see any error messages after it completes.
%reload_ext sql
Note we did not do import jupysql
(this will throw an error). You should always load jupysql
as the sql
cell magic, as shown above.
jupysql
helps us create a client connection directly from our Notebook. However, just like before, we first need to connect to our database before we start issuing any queries:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture
Example test query:
%%sql
SELECT *
FROM people
WHERE name = 'Michelle Yeoh';
See the slides for most of the queries, which we executed in the interactive psql
client.