Data 101/Info 258: Data Engineering

UC Berkeley, Spring 2024

Professor Aditya Parameswaran

He/Him/His

adityagp@berkeley.edu

Course contact: data101@berkeley.edu

Schedule

Week 01

Wed 1/17
No Class (Aditya at a conference)

Week 02

Mon 1/22
Lecture 1 Introduction, Data Engineering Lifecycle
Wed 1/24
Lecture 2 SQL Review
code, code HTML
Thu 1/25
Discussion 1 SQL Review
Solution

Week 03

Mon 1/29
Lecture 3 Relational Model and Algebra I
Wed 1/31
Lecture 4 Relational Model and Algebra II; Writing SQL results out
Thu 2/1
Discussion Relational Algebra and Views
Solution
Project 1 SQL
Due Wed 2/14, 5pm

Week 04

Week 05

Mon 2/12
Lecture 7 DML/DDL (see last lecture slides); Performance Tuning
Index Selection Code +HTML
Wed 2/14
Lecture 8 Index Selection; Query processing and Optimization I
Index Selection Demo (see last lecture); Query processing code +HTML
Project 1 Due, 5pm
Thu 2/15
Discussion No Discussion
Project 2 Query Performance
Due Wed 2/28, 5pm

Week 06

Mon 2/19
No Class (Presidents’ Day)
Wed 2/21
Lecture 9 Query Processing and Optimization II
Thu 2/22
Discussion Query Optimization
Solution
Multivitamin 2 Multivitamin 2 Release
Due Wed 3/6, 5pm
Fri 2/23
Multivitamin 1 Due, 5pm (Extended)

Week 07

Mon 2/26
Lecture 10 Query Processing and Optimization III
Query processing - single table (see two lectures ago); Query processing - multiple tables +HTML
Wed 2/28
Lecture 11 Query optimization wrap-up
Query processing - multiple tables (see last lecture); Data models +HTML
Thu 2/29
Discussion PostgreSQL Exercises
Solution, Code, Code Solution

Week 08

Mon 3/4
Lecture 12 Data Models: Relations, Tensors, Dataframes; OLAP, Summarization, Window functions
Wed 3/6
Lecture 13 OLAP and Window functions (continued); Data Preparation 1
Data unboxing demo +HTML; Window functions demo +HTML
Multivitamin 2 Due, 5pm
Project 2 Due, 5pm (Extended)
Thu 3/7
Discussion Data Preparation and Pivoting
Solution, Code

Week 09

Mon 3/11
Lecture 14 Data Prep 1 (Contd.); Data Preparation 2
Data prep with GNIS +HTML
Multivitamin 3 Multivitamin 3 Release
Due Fri 3/22, 5pm
Wed 3/13
Lecture 15 Data Cleaning I
Data cleaning with outliers +HTML
Project 5 Project 5 Release
Due Wed 4/17, 5pm
Thu 3/14
Discussion MDL & Data Preparation
Solution
Project 3 Project 3 Release
Due Wed 4/3, 5pm

Week 10

Mon 3/18
Lecture 16 Data Cleaning II
Wed 3/20
Lecture 17 Normalization and ER
Thu 3/21
Discussion Hampel x84 and Entity Resolution
Solution, Code
Multivitamin 4 Multivitamin 4 Release
Due Wed 4/10, 5pm
Fri 3/22
Multivitamin 3 Due, 5pm

Week 11

Mon 3/25
No Class (Spring Break)
Wed 3/27
No Class (Spring Break)
Thu 3/28
Multivitamin 5 Multivitamin 5 Released
Due Wed 4/17, 5pm

Week 12

Mon 4/1
Lecture 18 Canceled
Wed 4/3
Lecture 19 ER (Contd.); Semistructured Data; MongoDB I
MongoDB I Demo +HTML
Project 3 Due, 5pm
Thu 4/4
Discussion ERD, Normalization, & Semistructured Data
Solution
Project 4 Project 4 Released
Due Wed 4/24, 5pm

Week 13

Mon 4/8
Lecture 20 MongoDB I (Contd.); MongoDB II
MongoDB I Demo (Contd.); MongoDB II Demo +HTML
Wed 4/10
Lecture 21 MongoDB II (Contd.); Transactions
Multivitamin 4 Due, 5pm
Thu 4/11
Discussion MongoDB Operations; ERD Review
Solution

Week 14

Mon 4/15
Lecture 22 Parallel and Distributed Computing
Wed 4/17
Lecture 23 Data Pipelines; Spreadsheets
Project 5 Due, 5pm
Multivitamin 5 Extended to 4/22 5pm
Thu 4/18
Discussion Transactions, Parallelization
Transactions Solution, Parallelization Solution

Week 15

Mon 4/22
Lecture 24 Spreadsheets (Contd.); Graph Databases
Wed 4/24
Lecture 25 Security and Privacy
Project 4 Due, 5pm
Thu 4/25
Discussion MapReduce, Data Cubes, and OLAP Review
Solution

RRR Week

Mon 4/29
RRR Week (no class)
Wed 5/1
RRR Week (no class)

Finals Week

Tue 5/7
Final Exam Cumulative Final Exam (11:30am-2:30pm)