Capitalise all string values in tables #259

olejandro · 2024-12-20T13:47:07Z

No description provided.

olejandro · 2024-12-20T15:29:29Z

@siddharth-krishna what do you think? This simple change gives a very good improvement on TIM!
I'll try to look into why GAMS fails on Demos 09-12.

olejandro · 2024-12-20T15:34:19Z

This PR would make the names of processes, commodities and user constraints look different in the output. Should we consider "undoing" the changes after processing in the future? E.g. we could use the names as they are before capitalising to create a dictionary which we could later use to undo the change in the processed data.

siddharth-krishna

Thanks, Olex. To make sure I understand correctly: this PR increases accuracy and reduces additional rows because the input data is using differently cased versions of the same names/strings, and so some transforms are not matching what they should?

Because if it was just a matter of our output having a different casing compared to the ground truth, I'd have thought #255 would suffice.

If my theory is correct, then it makes sense to normalise the casing early on, like we do here. And I suppose we'd want to preserve the casing if the output were being used in some visualisation tool like Miro? If so, your approach makes sense, and we could have an optional flag to do that. The question is which variant to pick if the input uses more than one: e.g. twowords, Twowords, TwoWords?
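To illustrate the theory above, here is a minimal sketch of how differently cased names can make a join drop rows, and how normalising early avoids it. The table contents and the `normalise` helper are made up for illustration and are not taken from xl2times.

```python
import pandas as pd

# Hypothetical declaration and data tables; names and values are invented.
processes = pd.DataFrame({"process": ["TwoWords", "Other"], "region": ["R1", "R1"]})
data = pd.DataFrame(
    {"process": ["twowords", "OTHER", "Other"], "value": [1.0, 2.0, 3.0]}
)

# Case-sensitive join: only the exactly matching spelling "Other" survives.
raw_matches = data.merge(processes, on="process", how="inner")


def normalise(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of df with the given string column upper-cased."""
    out = df.copy()
    out[col] = out[col].str.upper()
    return out


# After normalising both sides, every data row finds its declaration.
norm_matches = normalise(data, "process").merge(
    normalise(processes, "process"), on="process", how="inner"
)

print(len(raw_matches), len(norm_matches))
```

If the only issue were the casing of the final output, a case-insensitive comparison at the end would be enough; the sketch shows the other situation, where rows are lost during processing.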

siddharth-krishna · 2024-12-23T11:52:21Z

xl2times/transforms.py

-    colnames = ["attribute", "tact", "tcap", "unit", "sourcescen"]
-
-    def capitalise_attributes_table(table: EmbeddedXlTable):
+    def capitalise_table_entries(table: EmbeddedXlTable):
        df = table.dataframe.copy()


Not sure if we need to copy the data frame since we modify it inplace below
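For context, a small sketch of what the copy protects against. `Table` is a hypothetical stand-in for `EmbeddedXlTable`, and both helper functions are invented for illustration; whether the copy is needed in the PR depends on whether callers still rely on the original frame.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Table:
    """Hypothetical stand-in for EmbeddedXlTable (assumption)."""

    dataframe: pd.DataFrame


def upper_on_copy(table: Table) -> pd.DataFrame:
    # Works on a copy, so the caller's frame is left untouched.
    df = table.dataframe.copy()
    df["name"] = df["name"].str.upper()
    return df


def upper_in_place(table: Table) -> pd.DataFrame:
    # No copy: df is the same object as table.dataframe.
    df = table.dataframe
    df["name"] = df["name"].str.upper()
    return df


table = Table(pd.DataFrame({"name": ["abc"]}))

upper_on_copy(table)
after_copy = table.dataframe["name"].tolist()

upper_in_place(table)
after_in_place = table.dataframe["name"].tolist()
```

Here `after_copy` still holds the original spelling, while `after_in_place` reflects the mutation of the caller's table.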

siddharth-krishna · 2024-12-23T11:52:59Z

xl2times/transforms.py

+                    df.loc[i, seen_col] = df[seen_col][i].str.upper()
+                    df.loc[i, seen_col] = df[seen_col][i].str.strip()


Suggested change

-                    df.loc[i, seen_col] = df[seen_col][i].str.upper()
-                    df.loc[i, seen_col] = df[seen_col][i].str.strip()
+                    df.loc[i, seen_col] = df[seen_col][i].str.upper().str.strip()

Nice, thanks! Sorry, didn't see these while merging from the mobile app. Will add them to one of the open PRs!
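A standalone sketch of the chained form, for reference. The column name, sample values and the `is_str` mask are assumptions made for illustration; in the PR the row selector is `i` and the column is `seen_col`.

```python
import pandas as pd

# Invented sample data with stray whitespace, mixed case and a missing value.
df = pd.DataFrame(
    {"attribute": ["  act_bnd", "Ncap_Cost ", None], "value": [1, 2, 3]}
)

# Hypothetical mask selecting the rows that actually hold strings.
is_str = df["attribute"].apply(lambda v: isinstance(v, str))

# Upper-case and strip in one pass over the selected rows.
df.loc[is_str, "attribute"] = df.loc[is_str, "attribute"].str.upper().str.strip()

print(df["attribute"].tolist())
```

Chaining the two `.str` calls avoids a second assignment and a second read of the column, and gives the same result as applying them one after the other.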

olejandro · 2024-12-23T12:35:45Z

Thanks @siddharth-krishna! Yes, that's correct.
To undo the changes, we could pick the variant from the process/commodity declaration tables. If there are duplicates, we could keep the last one.
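A minimal sketch of that idea, assuming the declared names are available as a pandas Series. The function names `build_case_map` and `restore_case` are hypothetical and do not exist in xl2times.

```python
import pandas as pd


def build_case_map(declared: pd.Series) -> dict[str, str]:
    """Map each upper-cased name to its declared spelling.

    When several spellings collapse to the same key, the last one wins,
    because later dict entries overwrite earlier ones.
    """
    names = declared.dropna().astype(str).str.strip()
    return dict(zip(names.str.upper(), names))


def restore_case(processed: pd.Series, case_map: dict[str, str]) -> pd.Series:
    # Names without a recorded spelling are left upper-cased.
    return processed.map(lambda v: case_map.get(v, v))


# Invented declarations containing three variants of the same name.
declared = pd.Series(["twowords", "Twowords", "TwoWords", "Other"])
case_map = build_case_map(declared)

processed = pd.Series(["TWOWORDS", "OTHER", "UNDECLARED"])
restored = restore_case(processed, case_map)

print(restored.tolist())
```

Keeping the last declaration is only one possible tie-break; the map would need to be built before the capitalisation transform runs, since the original spellings are lost afterwards.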

Capitalise all string values in tables

140736b

olejandro marked this pull request as ready for review December 20, 2024 15:26

olejandro requested a review from siddharth-krishna December 20, 2024 15:27

olejandro added 5 commits December 20, 2024 10:51

Capitalise default units for dummy import processes

418d837

Check for True in the index

e1a55a3

Don't compare set description with GDX diff

bf1bae5

Strip leading and trailing whitespace

8ec7b7e

Make the whitespace removal work

5eadbfc

siddharth-krishna approved these changes Dec 23, 2024

olejandro merged commit 8a46fda into main Dec 23, 2024
2 checks passed

olejandro deleted the olex/capitalise-entries branch December 23, 2024 12:37

olejandro added a commit that referenced this pull request Dec 23, 2024

Apply code review suggestions from #259

238e06b

olejandro mentioned this pull request Dec 23, 2024

Preserve capitalisation of the model data in the output #262

Open
