⚠️ This content is archived as of March 2026 and is retained exclusively for reference. Find current offerings.
Data Engineering
UC Berkeley, Spring 2021
The schedule and dates listed below are tentative and may be subject to change. Check out the syllabus for course information.
| Week | Date | Lecture | Assignment |
|---|---|---|---|
| 1 | Tu 1/19 | 1. Introduction & Data Science Lifecycle | |
| | Th 1/21 | 2. Logistics and Relational model & algebra | |
| 2 | Tu 1/26 | 3. Relational algebra (contd.); SQL intro | |
| | Th 1/28 | 4. SQL intro, Views | |
| 3 | Tu 2/2 | 5. SQL subqueries and aggregation | |
| | Th 2/4 | 6. More SQL: window functions, sampling, string manipulation | Project 1 (due 2/19) |
| 4 | Tu 2/9 | 7. SQL updates, DDL, referential integrity, constraints | |
| | Th 2/11 | 8. Index selection and performance tuning | |
| 5 | Tu 2/16 | 9. Index selection and performance tuning (II) | |
| | Th 2/18 | 10. Index selection and performance tuning (III); 11. Three Data Models: Relations, Tensors and Dataframes | Multivitamin 1 (due 3/4) |
| 6 | Tu 2/23 | 11. Relations, Tensors and Dataframes (contd.) | |
| | Th 2/25 | 12. Data Preparation (full notebook and slide-oriented notebook) | Project 2 (due 3/12) |
| 7 | Tu 3/2 | 12b. Data Preparation (slide-oriented notebook) | |
| | Th 3/4 | 12b. Data Preparation (contd.) | |
| 8 | Tu 3/9 | 13. Data Cleaning (slide-oriented notebook) | Multivitamin 2 (due 3/17) |
| | Th 3/11 | Contd. | |
| 9 | Tu 3/16 | 14. Normalization and ER | |
| | Th 3/18 | 15. Semistructured Data | Multivitamin 3 (due 3/31) |
| | Tu 3/23 | Spring Break | |
| | Th 3/25 | Spring Break | |
| 10 | Tu 3/30 | 16. Querying semistructured data | Project 3 (due 4/15) |
| | Th 4/1 | Contd. | |
| 11 | Tu 4/6 | 17. Spreadsheets | |
| | Th 4/8 | 18. Graph data: Property graph models, triples/RDF; 19. BI: OLAP, summarization, and visualization | Multivitamin 4 (due 4/19) |
| 12 | Tu 4/13 | 20. Transactions | |
| | Th 4/15 | 21. Data Pipelines | |
| 13 | Tu 4/20 | 22. Approximation: sampling and sketching | Project 4 (due 5/4) |
| | Th 4/22 | 23. Storage: Column vs. row, Compression, Exchange formats | |
| 14 | Tu 4/27 | 24. Parallelization: Map-Reduce, Spark, Parallel DBMS, Dask/Modin | Multivitamin 5 (due 5/3 at 10 AM) |
| | Th 4/29 | 25. Security and Privacy; 26. Reflections | Project 5 (due 5/14) |
| 15 | Tu 5/4 | RRR Week | |
| | Th 5/6 | RRR Week | |