Adjust the processing order of tfm tables #251

Merged: 9 commits, Dec 14, 2024
16 changes: 15 additions & 1 deletion xl2times/__main__.py
@@ -15,7 +15,7 @@
 from xl2times.utils import max_workers

 from . import excel, transforms, utils
-from .datatypes import Config, EmbeddedXlTable, TimesModel
+from .datatypes import Config, DataModule, EmbeddedXlTable, TimesModel

 logger = utils.get_logger()

@@ -485,6 +485,20 @@ def run(args: argparse.Namespace) -> str | None:

     model.files.update([Path(path).stem for path in input_files])

+    processing_order = ["base", "subres", "trade", "demand", "scen", "syssettings"]
Collaborator

Should this be defined inside the DataModule class? Just so that if we add another enum value to it, we might be reminded to also add it to this list.

Member Author

Sounds like a good idea! Would you be up for including this in the refactoring of the class in a subsequent PR or do you think we should change it already?
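
One way the suggestion above could look is sketched below. This is only an illustration under assumptions, not the project's code: the enum is a trimmed stand-in for `xl2times.datatypes.DataModule`, the `processing_order` classmethod is hypothetical, and treating `sets` and `lma` as deliberately unordered is an assumption based on their absence from the list in this PR.

```python
from enum import Enum


class DataModule(Enum):
    """Trimmed stand-in for the real DataModule enum (assumption)."""

    base = "base"
    syssettings = "syssettings"
    subres = "subres"
    sets = "sets"
    lma = "lma"
    demand = "demand"
    scen = "scen"
    trade = "trade"

    @classmethod
    def processing_order(cls) -> list[str]:
        """Hypothetical helper: module type names in processing order."""
        order = ["base", "subres", "trade", "demand", "scen", "syssettings"]
        # Assumption: these types are intentionally left out of the order.
        unordered = {"sets", "lma"}
        # Fail loudly if a member is added to the enum but not placed here.
        missing = {member.name for member in cls} - set(order) - unordered
        if missing:
            raise ValueError(f"No processing order for: {sorted(missing)}")
        return order
```

With the order living next to the members, adding a new enum value without deciding where it belongs raises on the first call to `DataModule.processing_order()` rather than silently skipping files.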

+    for data_module in processing_order:
+        model.data_modules = model.data_modules + sorted(
+            [
+                item
+                for item in {
+                    DataModule.module_name(path)
+                    for path in input_files
+                    if DataModule.module_type(path) == data_module
Collaborator

Should we raise a warning/error if there are any files in input_files that don't match this if condition for any data_module in processing_order? I'm wondering if there might be a bug in the future where we add a new item to DataModule but forget to add it to processing_order, and then we skip processing some input files.

Member Author

Good idea!
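
A minimal sketch of the check discussed in this thread, under stated assumptions: `find_unordered_files` is a hypothetical helper, and its `file_types` argument stands in for the result of calling `DataModule.module_type(path)` on each input file. The file names in the usage note are made up.

```python
import logging
from typing import Mapping, Optional, Sequence

logger = logging.getLogger(__name__)


def find_unordered_files(
    file_types: Mapping[str, Optional[str]], processing_order: Sequence[str]
) -> list[str]:
    """Return (and warn about) files whose module type is not in processing_order.

    file_types maps each input path to its module type name, or None when the
    type could not be determined.
    """
    skipped = sorted(
        path
        for path, module_type in file_types.items()
        if module_type not in processing_order
    )
    if skipped:
        logger.warning("Files not covered by processing_order: %s", skipped)
    return skipped
```

For example, `find_unordered_files({"a.xlsx": "base", "b.xlsx": "sets"}, ["base", "scen"])` returns `["b.xlsx"]`. Files with an undetermined type (`None`) are reported too; whether that should be a warning or an error is a design choice left open here.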

+                }
+                if item is not None
+            ]
+        )
+
     if args.only_read:
         tables = convert_xl_to_times(
             input_files,
11 changes: 7 additions & 4 deletions xl2times/datatypes.py
@@ -119,13 +119,15 @@ def module_name(cls, path: str) -> str | None:
         module_type = cls.determine_type(path)
         match module_type:
             case DataModule.base | DataModule.sets | DataModule.lma | DataModule.demand | DataModule.trade | DataModule.syssettings:
-                return module_type.name
+                return module_type.name.upper()
             case DataModule.subres:
-                return re.sub("_trans$", "", PurePath(path).stem.lower())
+                return re.sub(
+                    "^SUBRES_", "", re.sub("_TRANS$", "", PurePath(path).stem.upper())
+                )
+            case DataModule.scen:
+                return re.sub("^SCEN_", "", PurePath(path).stem.upper())
             case None:
                 return None
-            case _:
-                return PurePath(path).stem


 @dataclass
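
The name normalization that the new `subres` and `scen` branches of `module_name` perform can be tried in isolation. The sketch below copies the two regular-expression calls from the diff into standalone functions; the function names `subres_name` and `scen_name` and the example file names are hypothetical.

```python
import re
from pathlib import PurePath


def subres_name(path: str) -> str:
    """Upper-case the file stem, then strip a trailing _TRANS and a leading SUBRES_."""
    stem = PurePath(path).stem.upper()
    return re.sub("^SUBRES_", "", re.sub("_TRANS$", "", stem))


def scen_name(path: str) -> str:
    """Upper-case the file stem, then strip a leading SCEN_."""
    stem = PurePath(path).stem.upper()
    return re.sub("^SCEN_", "", stem)


print(subres_name("SubRES_New-Techs_Trans.xlsx"))  # NEW-TECHS
print(subres_name("SubRES_New-Techs.xlsx"))        # NEW-TECHS
print(scen_name("Scen_Policy.xlsx"))               # POLICY
```

Because the stem is upper-cased before matching, the prefixes and suffix are removed regardless of how the file name is capitalized, and a subres file and its `_Trans` companion map to the same module name.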
@@ -234,6 +236,7 @@ class TimesModel:
     units: DataFrame = field(default_factory=DataFrame)
     start_year: int = field(default_factory=int)
     files: set[str] = field(default_factory=set)
+    data_modules: list[str] = field(default_factory=list)

     @property
     def external_regions(self) -> set[str]: