python3 modelsof.py get_datasets jop
Scrapes Dataverse for all articles. Produces out/jop/datasets.csv with title, href, date, description, keywords.
python3 modelsof.py get_files jop
Scrapes Dataverse for all files associated with each article in datasets.csv. Produces out/jop/files.csv with title, href, date, filename, file_href.
python3 modelsof.py get_downloads jop [2018]
Downloads all files in files.csv with an extension of .do .7z .7zip .gz .rar .tar .zip, optionally limited by year. Produces out/jop/downloads/{year}/{dataset}/{file}. Errors are logged to out/jop/downloads/errors.csv with title, href, date, filename, file_href, error.
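The selection rule described above can be sketched as a small filter. This is only an illustration, not the code in modelsof.py: the function name `wanted` is hypothetical, and it assumes that the date column starts with a four-digit year and that extension matching is case-insensitive.

```python
# Extensions listed in the get_downloads description.
DOWNLOAD_EXTS = (".do", ".7z", ".7zip", ".gz", ".rar", ".tar", ".zip")


def wanted(row, year=None):
    """Return True if a files.csv row should be downloaded.

    `row` is a dict with at least the keys "filename" and "date".
    Assumption: "date" begins with a four-digit year, e.g. "2018-03-01".
    """
    name = row["filename"].lower()  # assumption: match case-insensitively
    if not name.endswith(DOWNLOAD_EXTS):
        return False
    return year is None or row["date"].startswith(str(year))


rows = [
    {"filename": "analysis.do", "date": "2018-03-01"},
    {"filename": "replication.zip", "date": "2017-11-20"},
    {"filename": "codebook.pdf", "date": "2018-05-09"},
]
selected = [r["filename"] for r in rows if wanted(r, year=2018)]
```

With the sample rows above, only analysis.do is selected for 2018; without a year, replication.zip is selected as well.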
python3 modelsof.py unzip jop
Recursively unzips all files with ext of .7z .7zip .gz .rar .tar .zip in downloads. Requires 7zip (p7zip-full and p7zip-rar on Ubuntu).
python3 modelsof.py get_all_files jop
Union of files.csv and files in downloads. Produces out/jop/all_files.csv with file.
python3 modelsof.py plot_files
Uses out/**/all_files.csv to produce distribution counts at out/files_dist.csv and out/files_by_datasets_dist.csv, then runs plots.R to produce out/files_dist.png and out/files_by_datasets_dist.png (whether a dataset contains a kind of file).
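The difference between the two distributions can be illustrated with a short sketch. It assumes that a "kind of file" means the file extension and that each file is available as a (dataset, filename) pair; both are assumptions made for this example, and `file_distributions` is a hypothetical name rather than a function in modelsof.py.

```python
from collections import Counter
from pathlib import PurePosixPath


def file_distributions(files):
    """Return (per_file, per_dataset) counts keyed by file extension.

    per_file counts every file; per_dataset counts each dataset at most
    once per extension, i.e. whether the dataset contains that kind of file.
    """
    per_file = Counter()
    datasets_by_kind = {}
    for dataset, filename in files:
        kind = PurePosixPath(filename).suffix.lower()
        per_file[kind] += 1
        datasets_by_kind.setdefault(kind, set()).add(dataset)
    per_dataset = Counter({k: len(v) for k, v in datasets_by_kind.items()})
    return per_file, per_dataset


files = [("a", "main.do"), ("a", "robust.do"), ("b", "main.do"), ("b", "data.dta")]
per_file, per_dataset = file_distributions(files)
```

Here .do is counted three times per file but only twice per dataset, because dataset a contributes two .do files.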
python3 stata.py jop
Parses all .do files in out/jop/downloads and produces corresponding .do.json at out/jop/results/{year}/{dataset}/{file} as well as out/jop/files.json and out/jop/stats.json.
python3 modelsof.py plot_commands
Uses out/**/stats.json to produce distribution counts at out/commands_dist.csv, then runs plots.R to produce out/commands_dist.png.
Some prefix commands are run in isolation (not as a prefix). They are counted as len_prefix. Those prefix commands that are used as a prefix to another command are counted as len_prefix_as_prefix. The latter do not show up in overall counts (len).
The first item is a count of regression commands in all files. Given two commands:
svy: reg ...
reg ...
the count will be:
svy:reg = 1
reg = 1
The remaining items (counts per file) count prefix and "command" (regression or otherwise) separately except for the 'regressions' key, which works the same as the previous section.
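The two counting schemes can be sketched as follows. This is a rough illustration, not the logic in stata.py: commands are represented as hypothetical (prefix, command) pairs, and the set of regression command names is a placeholder chosen for the example.

```python
from collections import Counter

# Hypothetical set of regression command names, for illustration only.
REGRESSION_COMMANDS = {"reg", "regress", "logit", "probit"}


def count_regressions(commands):
    """Count regression commands, keeping any prefix as part of the key."""
    counts = Counter()
    for prefix, command in commands:
        if command in REGRESSION_COMMANDS:
            key = f"{prefix}:{command}" if prefix else command
            counts[key] += 1
    return counts


def count_separately(commands):
    """Count prefixes and commands independently of each other."""
    prefixes = Counter(p for p, _ in commands if p)
    names = Counter(c for _, c in commands)
    return prefixes, names


# The two commands from the example: "svy: reg ..." and "reg ..."
parsed = [("svy", "reg"), (None, "reg")]
print(dict(count_regressions(parsed)))  # {'svy:reg': 1, 'reg': 1}
prefixes, names = count_separately(parsed)
```

Counted separately, the same two commands give svy = 1 among the prefixes and reg = 2 among the commands.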
Some files have syntax errors. In the case of missing closing delimiters, they are closed. In the case of missing closing */, the comment is assumed to extend to the end of the file.
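The handling of an unterminated comment can be sketched as below. This is a simplified illustration rather than the parser in stata.py: `strip_block_comments` is a hypothetical helper, and it assumes comments do not nest.

```python
def strip_block_comments(source: str) -> str:
    """Remove /* ... */ comments from do-file text.

    A comment with no closing */ is treated as running to the end of
    the file. Assumption: comments are not nested.
    """
    kept = []
    pos = 0
    while True:
        start = source.find("/*", pos)
        if start == -1:
            kept.append(source[pos:])  # no more comments
            break
        kept.append(source[pos:start])
        end = source.find("*/", start + 2)
        if end == -1:
            break  # unterminated: drop everything up to end of file
        pos = end + 2
    return "".join(kept)


closed = strip_block_comments("reg y x /* note */\nlogit y x")
unclosed = strip_block_comments("reg y x\n/* never closed\nlogit y x")
```

In the second call the logit line is discarded along with the comment, since the comment is never closed.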