Data 101: Data Engineering 💾

UC Berkeley, Fall 2024

Professor Lisa Yan

She/Her

yanlisa@berkeley.edu

Professor Michael Ball

He/Him

ball@berkeley.edu

Announcements

Week 12 Announcement

Nov 12

Homework 4 is due this Wednesday! Project 4 and the optional Project 5 are both released as well.

See Week 12 Ed announcement.

Schedule

Week 01

Thu 8/29
Lecture 1 Introduction, Data Engineering Lifecycle
Pre-Semester Form
Discussion 1 SQL Review
Solution, Code
Fri 8/30
Project 0 SQL Review
Due Thu 9/5, 5pm (extended from Wed 9/4)
Notes

Week 02

Tue 9/3
Lecture 2 SQL Review
Wed 9/4
Project 0 Due Date Extended by 24 hours (Ed post)
Thu 9/5
Lecture 3 Relational Model & Algebra
Discussion 2 Relational Algebra
Solution
Project 0 Due, 5pm
Fri 9/6
Homework 1
Due Wed 9/18, 5pm

Week 03

Tue 9/10
Lecture 4 CTEs, Views, Subqueries, Foreign Keys
Thu 9/12
Lecture 5 DML/DDL, Keys, and Constraints
Discussion 3 CTEs, Subqueries, Views, DML/DDL
Solution
Fri 9/13
Project 1 SQL
Due Wed 9/25, 5pm

Week 04

Tue 9/17
Lecture 6 Performance Tuning, Index Selection
Wed 9/18
Homework 1 Due, 5pm
Thu 9/19
Lecture 7 Query Plan Selection I
Discussion 4 Cascade, Query Performance I
Solution, Code
Fri 9/20
Homework 2
Due Wed 10/2, 5pm

Week 05

Tue 9/24
Lecture 8 Query Plan Selection II
Wed 9/25
Project 1 Due, 5pm
Thu 9/26
Lecture 9 Query Plan Selection III, Data Modeling I
Discussion 5 Query Optimization II, Data Models
Solution, Code
Fri 9/27
Project 2 Query Performance
Due Wed 10/9, 5pm

Week 06

Week 07

Tue 10/8
Lecture 12 Data Preparation III: Outliers
Wed 10/9
Project 2 Due, 5pm
Homework 3
Due Wed 10/23, 5pm
Thu 10/10
Lecture 13 SQL Review II
Discussion 7 Data Granularity & SQL Review II
Solution, Code

Week 08

Tue 10/15
Lecture 14 Data Preparation IV: Outliers, Imputation
Wed 10/16
Midterm Exam (7-9pm)
Thu 10/17
Lecture (no class)
Discussion (no discussion)

Week 09

Tue 10/22
Lecture 15 Data Modeling II: Normalization + ER Diagrams
Wed 10/23
Homework 3 Due, 5pm
Thu 10/24
Lecture 16 Wrap-Up ER Diagrams & Semistructured Data
Discussion 9 Hampel X84, Entity Resolution, ERD
Solution, Code
Fri 10/25
Project 3 Data Transformation
Due Fri 11/8, 5pm

Week 10

Tue 10/29
Lecture 17 MongoDB I (& Semistructured Data)
Thu 10/31
Lecture 18 MongoDB II
Discussion 10 Normalization, MongoDB, Midterm Review
Solution
Fri 11/1
Homework 4
Due Wed 11/13, 5pm

Week 11

Tue 11/5
Lecture 19 Data Ops and Pipelines (Election Day)
Thu 11/7
Lecture 20 MapReduce, Sampling
Discussion 11 Data Ops, MapReduce, MQL II
Solution
Fri 11/8
Project 3 Due, 5pm
Project 4 Mongo
Due Wed 11/20, 5pm
Optional Final Project
Checkpoint Mon 11/25, 5pm
Due Mon 12/9, 5pm

Week 12

Tue 11/12
Lecture 21 Transactions
Wed 11/13
Homework 4 Due, 5pm
Thu 11/14
Lecture 22 BI/OLAP
Discussion 12 Reservoir Sampling, Transactions
Code

Week 13

Tue 11/19
Lecture 23 Graph Databases and Knowledge Bases
Wed 11/20
Project 4 Due, 5pm
Thu 11/21
Lecture 24 Parallel and Distributed Computing
Discussion 13 OLAP, Parallel/Distributed Computing
Fri 11/22
Homework 5
Due Wed 12/4, 5pm

Week 14

Mon 11/25
Final Project Checkpoint Due, 5pm
Tue 11/26
Lecture 25 Spreadsheets
Thu 11/28
Thanksgiving Day (no class)

Week 15

Tue 12/3
Lecture 26 [Guest] DataHub
Wed 12/4
Homework 5 Due, 5pm
Thu 12/5
Lecture 27 Guest Lecture, Closing Thoughts
Discussion 14 Wrap-Up

RRR Week

Mon 12/9
Final Project Due, 5pm
Tue 12/10
Final Review
Thu 12/12
Final Review

Finals Week

Tue 12/17
Final Exam (3-6pm)