Lecture 02: SQL Review¶

First load in the data into the database¶

In [ ]:
!unzip -u data/imdb_lecture.zip -d data/
In [ ]:
!psql -h localhost -c 'DROP DATABASE IF EXISTS imdb_lecture'
!psql -h localhost -c 'CREATE DATABASE imdb_lecture' 
!psql -h localhost -d imdb_lecture -f data/imdb_lecture.sql

Using psql in Terminal¶

psql is a command-line PostgreSQL interactive client.

I find it useful to keep the Terminal up while I'm working on notebooks for the following:

  • meta-commands: psql commands to query information (generally metadata) about the databse
  • writing interactive SQL queries: psql shows me a few rows at a time, and I can quit whenever. Avoids Jupyter notebooks running out of space if the query result relation is huge.

To launch psql and connect to a specific database, say, the imdb_lecture database we just created on localhost, open up a Terminal and type in:

psql postgresql://127.0.0.1:5432/imdb_lecture

Note the postgres server is on localhost (i.e., IP address 127.0.0.0.1) and network port 5432.

Troubleshooting:

  • You do not have to be in a particular directory to launch the psql client!
  • If you cannot connect or you do not see any relations with \d, make sure you have created/loaded in the database with the !psql commands in the previous section.
  • If your interactive query is not executing, check to see if you have ended with a semicolon (necessary and also generally good style!).

Quick reference:

  • \l list all databases available on this server
  • \d list all relations in this database
  • \d tablename list schema of tablename relation
  • \q quit psql
  • \? help
  • <ctrl>-c cancel
  • <ctrl>-a, <ctrl>-e jump to the front and back of a line, respectively
  • <ctrl>-<left>, <ctrl>-<right> jump one word previous and forward, respectively
  • (when in query result buffer) <space> to advance a page, q to quit and exit out

Using jupysql in Jupyter Notebook¶

We are going to be using the jupysql library to connect our notebook to a PostgreSQL database server on your jupyterhub account. The next cell should do the trick; you should not see any error messages after it completes.

In [ ]:
%reload_ext sql

Note we did not do import jupysql (this will throw an error). You should always load jupysql as the sql cell magic, as shown above.


jupysql helps us create a client connection directly from our Notebook. However, just like before, we first need to connect to our database before we start issuing any queries:

In [ ]:
%sql postgresql://jovyan@127.0.0.1:5432/imdb_lecture

Example test query:

In [ ]:
%%sql
SELECT *
FROM people
WHERE name = 'Michelle Yeoh';



See the slides for most of the queries, which we executed in the interactive psql client.