This directory includes a few odds and ends:
Jeopardy Data
- jc1.txt is a dataset of Jeopardy contestants crawled from j-archive.com and posted on Reddit.
- jq2.txt is a dataset of Jeopardy questions, also crawled from j-archive.com.
California/Nevada Precipitation Data
- simple_scrape.py is a Python script to crawl data from the NOAA website for the California-Nevada River Forecast Center (e.g. pages like https://www.cnrfc.noaa.gov/monthly_precip_2020.php). If you run it, it will create a subdirectory called output and store each year's data there.
- monthly_precip_full.csv is the concatenation of the outputs of simple_scrape.py.
- flow_CalDataEngExample.zip is a Trifacta flow export file. It contains all the recipes to take the monthly_precip_full.csv file and generate the remaining files: mm.txt, mmp.txt, mmr.txt and mpf.txt. This is not a human-readable format; to make use of it, you need to go to Flows->Import Flow in Trifacta as described here.
- mm.txt is a pivot table (matrix) of Year x Month.
- mmp.txt is a pivot table of (Year, ID, Location, Station) x Month.
- mmr.txt is an un-pivoted (relational) version of mpf.txt.
- mpf.txt is a cleaned-up version of monthly_precip_full.csv with all the display junk from the web stripped out and the relevant fields replicated into each row.
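
To make the difference between the pivoted files (mm.txt, mmp.txt) and the un-pivoted file (mmr.txt) concrete, here is a minimal sketch in plain Python. It is not the Trifacta recipe from the flow export. The column names (Year, Station, Month, Precip), the sample values, and the helper functions pivot and unpivot are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical relational rows: one row per (year, station, month) reading.
# Column names and values are invented for illustration and are not taken
# from the actual files in this directory.
relational = [
    {"Year": 2020, "Station": "A", "Month": "Jan", "Precip": 1.5},
    {"Year": 2020, "Station": "A", "Month": "Feb", "Precip": 0.2},
    {"Year": 2020, "Station": "B", "Month": "Jan", "Precip": 3.0},
]


def pivot(rows, key_cols, col_field, value_field):
    """Build a matrix: one entry per key, one cell per distinct col_field value."""
    table = defaultdict(dict)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        table[key][row[col_field]] = row[value_field]
    return dict(table)


def unpivot(table, key_cols, col_field, value_field):
    """Flatten a pivoted matrix back into one relational row per cell."""
    rows = []
    for key, cells in table.items():
        for col, value in cells.items():
            row = dict(zip(key_cols, key))
            row[col_field] = col
            row[value_field] = value
            rows.append(row)
    return rows


# Pivot on (Year, Station) x Month, then flatten back to relational form.
matrix = pivot(relational, ["Year", "Station"], "Month", "Precip")
round_trip = unpivot(matrix, ["Year", "Station"], "Month", "Precip")
```

In the pivoted form each key appears once and the months become columns, which is convenient for reading. The relational form repeats the key fields in every row, which is usually easier to filter, join, and aggregate.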