Imports a cleaned dataset and associated data sources, variables, and data points into the MySQL database.
Example usage:
from standard_importer import import_dataset
dataset_dir = "worldbank_wdi"
dataset_namespace = "[email protected]"
import_dataset.main(dataset_dir, dataset_namespace)
import_dataset.main(...) expects a set of CSV files to exist in {DATASET_DIR}/output/ (e.g. worldbank_wdi/output):
distinct_countries_standardized.csv
datasets.csv
sources.csv
variables.csv
datapoints/data_points_{VARIABLE_ID}.csv (one data_points_{VARIABLE_ID}.csv file for each variable in variables.csv)
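Before running the import, it can help to confirm that these files are in place. The following is a minimal sketch only: missing_output_files is a hypothetical helper and is not part of standard_importer. It checks the four top-level CSV files and the datapoints directory, and does not inspect the individual per-variable files.

```python
from pathlib import Path

# File names come from the list above. The helper itself is hypothetical
# and is not provided by standard_importer.
REQUIRED_FILES = [
    "distinct_countries_standardized.csv",
    "datasets.csv",
    "sources.csv",
    "variables.csv",
]


def missing_output_files(dataset_dir):
    """Return the expected entries that are absent from {dataset_dir}/output/."""
    output_dir = Path(dataset_dir) / "output"
    missing = [name for name in REQUIRED_FILES if not (output_dir / name).is_file()]
    # Only the directory is checked here, not each per-variable file.
    if not (output_dir / "datapoints").is_dir():
        missing.append("datapoints/")
    return missing
```

An empty list means every expected entry was found, so a call such as missing_output_files("worldbank_wdi") can serve as a quick guard before calling import_dataset.main(...).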
Inside the dataset directory (e.g. vdem), data must be located in an output directory with the following structure (see worldbank_wdi/output for an example):
This file lists all entities present in the data, so that new entities can be created if necessary. Located in output/distinct_countries_standardized.csv:
name: name of the entity
Located in output/datasets.csv:
id: temporary dataset ID for the loading process
name: name of the Grapher dataset
Located in output/sources.csv:
id: temporary source ID for the loading process
name: name of the source
description: JSON object with dataPublishedBy (string), dataPublisherSource (string), link (string), retrievedDate (string), additionalInfo (string)
dataset_id: foreign key matching each source with a dataset ID
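Because description holds a JSON object inside a CSV cell, it needs to be serialized before the row is written. The sketch below shows one way to do that with the standard library. The function names, the column order, and every value in the example are placeholders chosen for illustration; only the column names and the JSON keys come from the list above.

```python
import csv
import io
import json

# Column names follow the field list above; the order is an assumption.
SOURCE_COLUMNS = ["id", "name", "description", "dataset_id"]


def source_row(source_id, name, dataset_id, description):
    """Build one sources.csv row, serializing the description as JSON."""
    return {
        "id": source_id,
        "name": name,
        "description": json.dumps(description),
        "dataset_id": dataset_id,
    }


def write_sources_csv(rows, fileobj):
    """Write rows to an open text file; csv handles quoting of the JSON."""
    writer = csv.DictWriter(fileobj, fieldnames=SOURCE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values for illustration only.
example_description = {
    "dataPublishedBy": "Example publisher",
    "dataPublisherSource": "Example underlying source",
    "link": "https://example.com",
    "retrievedDate": "2021-01-01",
    "additionalInfo": "Example notes",
}

buffer = io.StringIO()
write_sources_csv([source_row(0, "Example source", 0, example_description)], buffer)
```

Reading the cell back with json.loads should return the original object, since the csv module takes care of quoting the commas and double quotes that the JSON text contains.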
Located in output/variables.csv:
dataset_id: foreign key matching each variable with a dataset ID
source_id: foreign key matching each variable with a source ID
id: temporary variable ID for the loading process
name: name of the variable
description: long description of the variable
code: original variable code used by the data source
unit: unit of measurement
short_unit: short unit of measurement, for chart axis display
timespan: timespan covered by the variable
coverage: type of geographical coverage
display: JSON object that defines how the variable should be displayed
original_metadata: JSON object representing original uncleaned metadata from the data source
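Since the temporary IDs only have meaning within these files, a variable whose dataset_id or source_id does not match any row in datasets.csv or sources.csv cannot be linked correctly. The following sketch checks for that. dangling_foreign_keys is a hypothetical helper, not part of standard_importer, and it compares IDs as the strings read from the CSV files.

```python
import csv


def dangling_foreign_keys(datasets_file, sources_file, variables_file):
    """Return (variable id, column, value) for each unmatched foreign key.

    Each argument is an open text file (or any iterable of CSV lines)
    with a header row, using the column names described above.
    """
    dataset_ids = {row["id"] for row in csv.DictReader(datasets_file)}
    source_ids = {row["id"] for row in csv.DictReader(sources_file)}

    problems = []
    for row in csv.DictReader(variables_file):
        if row["dataset_id"] not in dataset_ids:
            problems.append((row["id"], "dataset_id", row["dataset_id"]))
        if row["source_id"] not in source_ids:
            problems.append((row["id"], "source_id", row["source_id"]))
    return problems
```

An empty result means every variable points at an existing dataset and source.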
Located in output/datapoints/datapoints_{VARIABLE_ID}.csv:
{VARIABLE_ID} in the file name is a foreign key matching values with a temporary variable ID in variables.csv
country: location of the observation
year: year of the observation
value: value of the observation
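The entities file exists so that every country value in the data points can be matched to an entity. As a rough illustration, the sketch below lists country values that do not appear in the name column of the entities file. unknown_countries is a hypothetical helper, not part of standard_importer, and it takes already-open files so that it makes no assumption about file naming.

```python
import csv


def unknown_countries(datapoints_file, entities_file):
    """Return the sorted country values that are missing from the entities file.

    datapoints_file needs a "country" column and entities_file a "name"
    column, as described above.
    """
    known = {row["name"] for row in csv.DictReader(entities_file)}
    used = {row["country"] for row in csv.DictReader(datapoints_file)}
    return sorted(used - known)
```

Any name returned here is a candidate for standardization or for adding to distinct_countries_standardized.csv before the import is run.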