Data 101: Data Engineering 💾
UC Berkeley, Fall 2024
Announcements
Schedule
Week 01
- Thu 8/29
- Discussion 1 SQL Review
- Solution, Code
- Friday 8/30
- Project 0 SQL Review
- Due Thu 9/5, 5pm (extended from Wed 9/4)
Notes
Week 02
- Tue 9/3
- Lecture 2 SQL Review
- Wed 9/4
- Project 0 Due Date Extended by 24 hours (Ed post)
- Thu 9/5
- Lecture 3 Relational Model & Algebra
- Discussion 2 Relational Algebra
- Solution
- Project 0 Due, 5pm
- Fri 9/6
- Homework 1
- Due Wed 9/18, 5pm
Week 03
- Tue 9/10
- Lecture 4 CTEs, Views, Subqueries, Foreign Keys
- Thu 9/12
- Lecture 5 DML/DDL, Keys, and Constraints
- Discussion 3 CTEs, Subqueries, Views, DML/DDL
- Solution
- Fri 9/13
- Project 1 SQL
- Due Wed 9/25, 5pm
Week 04
- Tue 9/17
- Lecture 6 Performance Tuning, Index Selection
- Wed 9/18
- Homework 1 Due, 5pm
- Thu 9/19
- Lecture 7 Query Plan Selection I
- Discussion 4 Cascade, Query Performance I
- Solution, Code
- Fri 9/20
- Homework 2
- Due Wed 10/2, 5pm
Week 05
- Tue 9/24
- Lecture 8 Query Plan Selection II
- Wed 9/25
- Project 1 Due, 5pm
- Thu 9/26
- Lecture 9 Query Plan Selection III, Data Modeling I
- Discussion 5 Query Optimization II, Data Models
- Solution, Code
- Fri 9/27
- Project 2 Query Performance
- Due Wed 10/9, 5pm
Week 06
- Tue 10/1
- Lecture 10 Data Preparation I: Structural
- Wed 10/2
- Homework 2 Due, 5pm
- Thu 10/3
- Lecture 11 Data Preparation II: Window Functions, Numerical, Granularity
- Discussion 6 Window Functions & Data Prep I
- Solution, Code
Week 07
- Tue 10/8
- Lecture 12 Data Preparation III: Outliers
- Wed 10/9
- Project 2 Due, 5pm
- Homework 3
- Due Wed 10/23, 5pm
- Thu 10/10
- Lecture 13 SQL Review II
- Discussion 7 Data Granularity & SQL Review II
- Solution, Code
Week 08
- Tue 10/15
- Wed 10/16
- Midterm Exam (7-9pm)
- Thu 10/17
- Lecture (no class)
- Discussion (no discussion)
Week 09
- Tue 10/22
- Lecture 15 Data Modeling II: Normalization + ER Diagrams
- Wed 10/23
- Homework 3 Due, 5pm
- Thu 10/24
- Lecture 16 Wrap-Up ER Diagrams & Semistructured Data
- Discussion 9 Hampel X84, Entity Resolution, ERD
- Solution, Code
- Fri 10/25
- Project 3 Data Transformation
- Due Fri 11/8, 5pm
Week 10
- Tue 10/29
- Lecture 17 MongoDB I (& Semistructured Data)
- Thu 10/31
- Lecture 18 MongoDB II
- Discussion 10 Normalization, MongoDB, Midterm Review
- Solution
- Fri 11/1
- Homework 4
- Due Wed 11/13, 5pm
Week 11
- Tue 11/5
- Lecture 19 Data Ops and Pipelines (Election Day)
- Thu 11/7
- Lecture 20 MapReduce, Sampling
- Discussion 11 Data Ops, MapReduce, MQL II
- Solution
- Fri 11/8
- Project 3 Due, 5pm
- Project 4 Mongo
- Due Fri 11/22, 5pm
- Optional Final Project
- Checkpoint Mon 11/25, 5pm
- Due Mon 12/9, 5pm
Week 12
- Tue 11/12
- Lecture 21 Transactions
- Wed 11/13
- Homework 4 Due, 5pm
- Thu 11/14
- Lecture 22 BI/OLAP
- Discussion 12 Reservoir Sampling, Transactions
- Solution, Code
Week 13
- Tue 11/19
- Lecture 23 Spreadsheets
- Wed 11/20
- Project 4 deadline extended to Fri 11/22 (Ed post)
- Thu 11/21
- Lecture 24 Parallel and Distributed Computing
- Discussion 13 Data Cubes, OLAP, Parallel Processing
- Solution
- Fri 11/22
- Project 4 Due, 5pm
- Homework 5
- Due Wed 12/4, 5pm
Week 14
- Mon 11/25
- Final Project Checkpoint Due, 5pm
- Checkpoint Submission [Group]
- Checkpoint Peer Assessment [Individual]
- Tue 11/26
- Lecture 25 Graph Databases and Knowledge Bases
- Thu 11/28
- Thanksgiving Day (no class)
Week 15
- Tue 12/3
- Lecture 26 [Guest] DataHub
- End-of-Semester Form
- Wed 12/4
- Homework 5 Due, 5pm
- Thu 12/5
- Lecture 27 Guest Lecture Peter Sujan, Closing Thoughts
- Discussion 14 Parallel Processing, Spreadsheets, and Graph DB
- Demo, Solution
RRR Week
- Tue 12/10
- Final Review
- Wed 12/11
- Final Project Project Spec Due, 5pm
- Final Submission [Group]
- Peer Assessment [Individual]
- Thu 12/12
- Final Review
Finals Week
- Tue 12/17
- Final Exam (3-6pm)