Refactor: improve the data file structure

open-innovations · Aug 7, 2024 · fb21452
1 parent 2d9e938
commit fb21452
Showing 27 changed files with 62,279 additions and 63,449 deletions.
diff --git a/data/README.md b/data/README.md
@@ -0,0 +1,7 @@
+Data files are grouped by topic or dataset, e.g. Affordable homes.
+Each topic contains two directories: `site` and `standard`.
+In `standard`, data are stored in a standardised format. These always include the `geography_code`, `geography_name`, `date`, `Measure` and `value` columns. These files are used to generate metadata and for manually checking what is in the file, if needed.
+In `site`, data are stored in `parquet` files in the shape needed to power a visualisation. This is usually a wide (or pivoted) version of the `standard` files.
+In some cases, for example a `headlines.csv` file, the data are in a bespoke format to drive a particular visualisation type, e.g. an OI Lume `dashboard`.
+
+Any questions, suggestions, or improvements - let me know!
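
The relationship the README describes between `standard` (long) and `site` (wide) files can be illustrated with a minimal pandas sketch. This is not the repository's actual pipeline: the function name `to_site_shape`, the geography codes, the measure names and the values are all hypothetical, and only the five column names come from the README.

```python
import pandas as pd

# Hypothetical rows in the `standard` (long) layout described in the README.
# The codes, names, measures and values below are made up for illustration.
standard = pd.DataFrame(
    {
        "geography_code": ["X00000001", "X00000001", "X00000002", "X00000002"],
        "geography_name": ["Area A", "Area A", "Area B", "Area B"],
        "date": ["2023-03-31"] * 4,
        "Measure": ["Social rent", "Affordable rent", "Social rent", "Affordable rent"],
        "value": [120, 80, 60, 40],
    }
)


def to_site_shape(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot a long `standard` table into one column per Measure (assumed `site` shape)."""
    wide = df.pivot_table(
        index=["geography_code", "geography_name", "date"],
        columns="Measure",
        values="value",
        aggfunc="first",  # assumes one value per geography/date/Measure combination
    ).reset_index()
    wide.columns.name = None  # drop the leftover "Measure" label on the column axis
    return wide


site = to_site_shape(standard)
print(site)
```

The wide frame could then be written with `site.to_parquet(...)`, which needs a parquet engine such as `pyarrow` installed; the output path would follow the `site` directory convention above.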
diff --git a/data/affordable-homes/by_tenure.csv b/data/affordable-homes/by_tenure.csv
diff --git a/data/affordable-homes/by_tenure.parquet → data/affordable-homes/site/by_tenure.parquet b/data/affordable-homes/by_tenure.parquet → data/affordable-homes/site/by_tenure.parquet
diff --git a/data/affordable-homes/standard/by_tenure.csv b/data/affordable-homes/standard/by_tenure.csv