Data Engineering
UC Berkeley, Spring 2021
The schedule and dates below are tentative and subject to change. See the syllabus for course information.
| Week | Date | Lecture | Assignment |
|---|---|---|---|
| 1 | Tu 1/19 | 1. Introduction & Data Science Lifecycle | |
| | Th 1/21 | 2. Logistics; Relational model & algebra | |
| 2 | Tu 1/26 | 3. Relational algebra (cont.); SQL intro | |
| | Th 1/28 | 4. SQL intro, Views | |
| 3 | Tu 2/2 | 5. SQL subqueries and aggregation | |
| | Th 2/4 | 6. More SQL: window functions, sampling, string manipulation | Project 1 (due 2/19) |
| 4 | Tu 2/9 | 7. SQL updates, DDL, referential integrity, constraints | |
| | Th 2/11 | 8. Index selection and performance tuning | |
| 5 | Tu 2/16 | 9. Index selection and performance tuning (II) | |
| | Th 2/18 | 10. Index selection and performance tuning (III); 11. Three Data Models: Relations, Tensors and Dataframes | Multivitamin 1 (due 3/4) |
| 6 | Tu 2/23 | 11. Relations, Tensors and Dataframes (cont.) | |
| | Th 2/25 | 12. Data Preparation (full notebook and slide-oriented notebook) | Project 2 (due 3/12) |
| 7 | Tu 3/2 | 12b. Data Preparation (slide-oriented notebook) | |
| | Th 3/4 | 12b. Data Preparation (cont.) | |
| 8 | Tu 3/9 | 13. Data Cleaning (slide-oriented notebook) | Multivitamin 2 (due 3/17) |
| | Th 3/11 | 13. Data Cleaning (cont.) | |
| 9 | Tu 3/16 | 14. Normalization and ER | |
| | Th 3/18 | 15. Semistructured Data | Multivitamin 3 (due 3/31) |
| | Tu 3/23 | Spring Break | |
| | Th 3/25 | Spring Break | |
| 10 | Tu 3/30 | 16. Querying semistructured data | Project 3 (due 4/15) |
| | Th 4/1 | 16. Querying semistructured data (cont.) | |
| 11 | Tu 4/6 | 17. Spreadsheets | |
| | Th 4/8 | 18. Graph data: Property graph models, triples/RDF; 19. BI: OLAP, summarization, and visualization | Multivitamin 4 (due 4/19) |
| 12 | Tu 4/13 | 20. Transactions | |
| | Th 4/15 | 21. Data Pipelines | |
| 13 | Tu 4/20 | 22. Approximation: sampling and sketching | Project 4 (due 5/4) |
| | Th 4/22 | 23. Storage: Column vs. row, Compression, Exchange formats | |
| 14 | Tu 4/27 | 24. Parallelization: Map-Reduce, Spark, Parallel DBMS, Dask/Modin | Multivitamin 5 (due 5/3 at 10 AM) |
| | Th 4/29 | 25. Security and Privacy; 26. Reflections | Project 5 (due 5/14) |
| 15 | Tu 5/4 | RRR Week | |
| | Th 5/6 | RRR Week | |