The reweighting output contains two UPRN columns: `UPRN` and `UPRN_right`.
The extra column seems to come from the reweighting pipeline, because it already exists in the dataset that we load at the beginning of `run_add_features.py`. I think it originates from the outer join we use to join the weights to `epc_df`. This is expected polars behaviour that I forgot to account for.
This means that rows with nulls in `UPRN_right` should be missing weights. They will be missing weights for one of these reasons:
- They are in an LSOA which was skipped because we are missing census data for it (mainly affects Scotland, but also England and Wales in some instances).
- We had to drop the row for weighting because the row had a category which was not found in the target data for that LSOA. E.g. if the property type of the row is 'flat' but the census has 0% flats for that LSOA, the row will be dropped.
- They are in Scotland. This dataset was run before reweighting for Scotland was added, so all Scottish rows should be missing a weight.
Here is the count of rows with missing UPRN_right for each country:
All of them have null weights, as expected:
Originally posted by @crispy-wonton in #70 (comment)