This directory includes a few odds and ends:
Jeopardy Data
jc1.txt
is a dataset of Jeopardy Contestants crawled from j-archive.com and posted on Reddit.jq2.txt
is a dataset Jeopardy Questions also crawled from j-archive.com
California/Nevada Precipitation Data
simple_scrape.py
is a Python script to crawl data from the NOAA Website for the California-Nevada River Forecast Center, e.g. pages like https://www.cnrfc.noaa.gov/monthly_precip_2020.php). If you run it, it will create a subdirectory calledoutput
and store each year’s data there.monthly_precip_full.csv
is the concatenation of the outputs ofsimple_scrape.py
flow_CalDataEngExample.zip
is a Trifacta flow export file. It contains all the recipes to take themonthly_precip_full.csv
file and generate the remaining files:mm.txt
,mmp.txt
,mmr.txt
andmpf.txt
. This is not a human-readable format—to make use of it, you need to go to Flows->Import Flow in Trifacta as described here.mm.txt
is a pivot table (matrix) ofYear
xMonth
.mmp.txt
is a pivot table of(Year,ID,Location,Station)
xMonth
mmr.txt
is a un-pivoted (relational) version ofmpf.txt
mpf.txt
is a cleaned-up version ofmonthly_precip_full.csv
with all the display junk from the web stripped out and the relevant fields replicated into each row.