Lecture 02: SQL Review¶
First load in the data into the database¶
!unzip -u data/imdb_lecture.zip -d data/
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_lecture'
!psql -h localhost -d imdb_lecture -f data/imdb_lecture.sql
Using psql in Terminal¶
psql is a command-line PostgreSQL interactive client.
I find it useful to keep the Terminal up while I'm working on notebooks for the following:
- meta-commands:
psqlcommands to query information (generally metadata) about the databse - writing interactive SQL queries:
psqlshows me a few rows at a time, and I can quit whenever. Avoids Jupyter notebooks running out of space if the query result relation is huge.
To launch psql and connect to a specific database, say, the imdb_lecture database we just created on localhost, open up a Terminal and type in:
psql postgresql://127.0.0.1:5432/imdb_lecture
Note the postgres server is on localhost (i.e., IP address 127.0.0.0.1) and network port 5432.
Troubleshooting:
- You do not have to be in a particular directory to launch the
psqlclient! - If you cannot connect or you do not see any relations with
\d, make sure you have created/loaded in the database with the!psqlcommands in the previous section. - If your interactive query is not executing, check to see if you have ended with a semicolon (necessary and also generally good style!).
Quick reference:
\llist all databases available on this server\dlist all relations in this database\d tablenamelist schema of tablename relation\qquit psql\?help<ctrl>-ccancel<ctrl>-a,<ctrl>-ejump to the front and back of a line, respectively<ctrl>-<left>,<ctrl>-<right> jump one word previous and forward, respectively- (when in query result buffer)
<space>to advance a page,qto quit and exit out
Using jupysql in Jupyter Notebook¶
We are going to be using the jupysql library to connect our notebook to a PostgreSQL database server on your jupyterhub account. The next cell should do the trick; you should not see any error messages after it completes.
%reload_ext sql
Note we did not do import jupysql (this will throw an error). You should always load jupysql as the sql cell magic, as shown above.
jupysql helps us create a client connection directly from our Notebook. However, just like before, we first need to connect to our database before we start issuing any queries:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture
Example test query:
%%sql
SELECT *
FROM people
WHERE name = 'Michelle Yeoh';
See the slides for most of the queries, which we executed in the interactive psql client.