Refactor code expanding/exploding regions into a single utility function #206

Open
siddharth-krishna opened this issue Mar 1, 2024 · 0 comments

Comments

@siddharth-krishna
Collaborator

There's code in generate_uc_properties that expands allregions and comma-separated region lists:

    # TODO: Can this (until user_constraints.explode) become a utility function?
    # Handle allregions by substituting it with a list of internal regions
    index = user_constraints["region"].str.lower() == "allregions"
    if any(index):
        user_constraints["region"][index] = [model.internal_regions]
    # Handle comma-separated regions
    index = user_constraints["region"].str.contains(",").fillna(value=False)
    if any(index):
        user_constraints["region"][index] = user_constraints.apply(
            lambda row: [
                region
                for region in str(row["region"]).split(",")
                if region in model.internal_regions
            ],
            axis=1,
        )
    # Explode regions
    user_constraints = user_constraints.explode("region", ignore_index=True)

which is very similar to code in process_transform_tables:

    # Handle Regions:
    if set(df.columns).isdisjoint(
        {x.lower() for x in regions} | {"allregions"}
    ):
        if "region" not in df.columns:
            # If there's no region information at all, this table is for all regions:
            df["region"] = ["allregions"] * len(df)
        # Else, we only have a "region" column so handle it below
    else:
        if "region" in df.columns:
            raise ValueError(
                "ERROR: table has a column called region as well as columns with"
                f" region names:\n{table}\n{df.columns}"
            )
        # We have columns whose names are regions, so gather them into a "region" column:
        region_cols = [
            col_name
            for col_name in df.columns
            if col_name in set([x.lower() for x in regions]) | {"allregions"}
        ]
        other_columns = [
            col_name for col_name in df.columns if col_name not in region_cols
        ]
        df = pd.melt(
            df,
            id_vars=other_columns,
            var_name="region",
            value_name="value",
            ignore_index=False,
        )
        df = df.sort_index().reset_index(drop=True)  # retain original row order
    # This expands "allregions" into one row for each region:
    df["region"] = df["region"].map(
        lambda x: regions if x == "allregions" else x
    )
    df = df.explode(["region"])
    df["region"] = df["region"].str.upper()

and there's also an explode function in utils.py.

It would be good to have all the region-exploding code in one place, both for code reuse and conciseness and so that any optimizations are applied everywhere.

(Link to original discussion: https://github.com/etsap-TIMES/xl2times/pull/179/files/4ea76267c9558b3a08d09ec282b7a5fcaa458f8c#r1487242195)
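
As a starting point, here is a rough sketch of what such a shared helper might look like. The name explode_regions, its signature, and the choice to strip whitespace around comma-separated entries are all assumptions for illustration, not existing xl2times code; it also does not cover the melt step from process_transform_tables or reuse the existing explode function in utils.py.

```python
import pandas as pd


def explode_regions(
    df: pd.DataFrame, regions: list[str], column: str = "region"
) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with one row per region.

    - "allregions" (case-insensitive) expands to every entry of `regions`
    - comma-separated lists are split, keeping only entries found in `regions`
    - any other value, including missing values, is left unchanged
    """
    known = set(regions)

    def expand(value):
        # Non-string values (e.g. NaN) pass through untouched
        if not isinstance(value, str):
            return value
        if value.lower() == "allregions":
            return list(regions)
        if "," in value:
            # Assumption: surrounding whitespace is not significant
            parts = (part.strip() for part in value.split(","))
            return [part for part in parts if part in known]
        return value

    result = df.copy()
    result[column] = result[column].map(expand)
    # Note: a list with no known regions becomes a single NaN row after explode
    return result.explode(column, ignore_index=True)


# Toy data, made up for this example
uc = pd.DataFrame(
    {"uc_n": ["A", "B", "C"], "region": ["AllRegions", "R1,R3", "R2"]}
)
out = explode_regions(uc, ["R1", "R2"])
print(out)
```

Working on a copy and assigning the whole column avoids the chained assignment (user_constraints["region"][index] = ...) used in the first snippet, which pandas may warn about. Whether unknown regions should be dropped silently, as here, or reported is a design question for the real implementation.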
