
dd_to_csv bugfixes, move cache to ~/.cache/xl2times/ #230

Merged · 16 commits · Mar 23, 2024

Conversation

@SamRWest (Collaborator) commented Mar 21, 2024

Just a minor bugfix and some housekeeping.

  • Fixes a bug in dd_to_csv.py where DD variable names containing spaces don't parse correctly.
  • Makes dd_to_csv a second script in pyproject.toml (we'd like to use it from the austimes repo, and couldn't easily otherwise).
  • Moves the pickle cache to ~/.cache/xl2times.
  • Adds cache invalidation to delete old cache files (just anything >365 days old; couldn't think of a better criterion).

```python
else:
    raise ValueError(
        f"Unexpected number of spaces in parameter value setting: {data[index]}"
    )
```
Collaborator Author:
This ValueError was getting thrown from veda-produced austimes DD files for a variable with a space in its name

Member:

@SamRWest when you write "variable" do you mean an index of a GAMS parameter or a GAMS parameter?

Collaborator Author:

Sorry, 'variable' is the wrong terminology.

I mean the key part referred to in this comment:

```python
# Either "value" for a scalar, or "key value" for an array.
```

Collaborator Author:

For example, here's a line from one of our DD files that (I think) would have triggered the ValueError, because of the spaces in `UC_non-decreasing EE penetration`:

```
'UC_non-decreasing EE penetration'.RHS.'QLD'.2015.'EE2_Pri_Pub-b'.ANNUAL 1
```

Member:

I see, thanks! It should be okay not to raise ValueError, as long as the key is in quotes. A scalar is numeric, so there should be no spaces in it.

Collaborator Author:

Cool, that was my assumption.
The new code just splits the line at the last space char, whereas the old code split it at all spaces and then raised an error if it ended up with more than two tokens, which failed on my example string above.
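To illustrate the difference, here is a minimal sketch (not the actual dd_to_csv code; variable names are illustrative) using the example line from above:

```python
line = "'UC_non-decreasing EE penetration'.RHS.'QLD'.2015.'EE2_Pri_Pub-b'.ANNUAL 1"

# Old approach: split at every space. The quoted key contains two spaces,
# so this yields four tokens and tripped the "unexpected number of spaces" error.
tokens = line.split(" ")
assert len(tokens) == 4

# New approach: split once at the last space, so everything before it is the
# dotted attribute key and the final token is the value.
split_point = line.rfind(" ")
attributes, value = line[:split_point], line[split_point + 1:]
assert value == "1"
assert attributes.endswith(".ANNUAL")
```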

Member:

Btw, sometimes there is no value, e.g. from Demo 1:

```
SET COM_TMAP
/
'REG1'.'DEM'.'TPSCOA'
'REG1'.'NRG'.'COA'
/;
```

Let's say somebody does this:

```
SET COM_TMAP
/
'REG1'.'DEM'.'TPS COA'
'REG1'.'NRG'.'COA'
/;
```

What will happen?

Member:

Ah, I can see we are handling this above when we check whether it is a parameter.

Collaborator Author:

Yep, the new code will catch this with this check, because `rfind()` returns -1 if no spaces are found in the string:

```python
if split_point == -1:
    # if only one word
    attributes, value = [], line
```
Comment on lines +61 to +69

```diff
  # So value is always the last word, or only token
  split_point = line.rfind(" ")
  if split_point == -1:
      # if only one word
      attributes, value = [], line
  else:
-     raise ValueError(
-         f"Unexpected number of spaces in parameter value setting: {data[index]}"
-     )
+     attributes, value = line[:split_point], line[split_point + 1 :]
+     attributes = attributes.split(".")
+     attributes = [a if " " in a else a.strip("'") for a in attributes]
```
Collaborator Author:

`rfind()` works more reliably on the assumption that there's always a space before the value, so it handles `'key with spaces' value`-style strings.

"""
with open(filename, "rb") as f:
digest = hashlib.file_digest(f, "sha256") # pyright: ignore
hsh = digest.hexdigest()
if os.path.isfile(cache_dir + hsh):
fname1, _timestamp, tables = pickle.load(open(cache_dir + hsh, "rb"))
hash_file = cache_dir / f"{Path(filename).stem}_{hsh}.pkl"
Collaborator:

If the cache filename contains the original filename, then perhaps we no longer need to store a tuple in the pickle? And as discussed we probably don't need to check the modified timestamp, since it's super unlikely that a file gets modified but its hash remains the same.

cleaned up caching code as suggested
```python
pickle.dump(tables, f)
logger.info(f"Saved cache for (unknown) to {cached_file}")

return tables
```
Collaborator Author:

How about this @siddharth-krishna ?

Collaborator:

Looks good, thanks!

@SamRWest (Collaborator Author) commented Mar 21, 2024

Hmmm, benchmarks are failing to find a ground-truth CSV for demo7. Gotta split, but will have a look in the morning.
Any ideas in the meantime guys?
Error is here: https://github.com/etsap-TIMES/xl2times/actions/runs/8370370796/job/22917616560?pr=230#step:14:3255

 File "/home/runner/work/xl2times/xl2times/xl2times/xl2times/__main__.py", line 509, in run
    ground_truth = read_csv_tables(args.ground_truth_dir)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'benchmarks/csv/DemoS_007-all/benchmarks/csv/DemoS_007-all/TS_MAP.csv'

@olejandro (Member) commented Mar 21, 2024

> Hmmm, benchmarks are failing to find a ground-truth CSV for demo7. Gotta split, but will have a look in the morning.

Other demos as well!
To me this path looks strange: `benchmarks/csv/DemoS_007-all/benchmarks/csv/DemoS_007-all/TS_MAP.csv`
Should it not be `benchmarks/csv/DemoS_007-all/TS_MAP.csv` instead?

@olejandro (Member) commented:

> • Made dd_to_csv a second script in pyproject.toml (we'd like to use it from the austimes repo, and couldn't easily otherwise)

@SamRWest could you explain this a bit more? I do not think I understand the motivation / need for the change.

@SamRWest (Collaborator Author) commented:

> > • Made dd_to_csv a second script in pyproject.toml (we'd like to use it from the austimes repo, and couldn't easily otherwise)
>
> @SamRWest could you explain this a bit more? I do not think I understand the motivation / need for the change.

Just that I'd like to be able to convert DD files in other projects to CSV from the commandline:

```shell
cd some_other_project
pip install xl2times
dd_to_csv my/dd/files/
```

The last line is made possible by adding `dd_to_csv` to the `[project.scripts]` section in `pyproject.toml`, which requires `dd_to_csv.py` to be moved into the `/xl2times` module from `/scripts`.
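For reference, such an entry looks roughly like this (the exact module paths and function names here are assumptions, not copied from the repo):

```toml
[project.scripts]
xl2times = "xl2times.__main__:main"
dd_to_csv = "xl2times.dd_to_csv:main"
```

Each key becomes a console command that pip installs onto the user's PATH, pointing at the given `module:function` entry point.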

@SamRWest (Collaborator Author) commented:

Ok, tests are all passing now, but CI is failing because the runtime has gone up now that the pickle cache gets regenerated.

@siddharth-krishna - how can we update the pickle cache with the new files when CI is failing? Disable the runtime check temporarily?

@siddharth-krishna (Collaborator) left a comment:

Looks good, thanks! But we'll also need to change the cache path in the github actions yaml, and perhaps also bump the cache key so that it saves a new one. If you don't mind I can push a commit to your branch directly?

@rschuchmann would there be any issues with the Times-Miro app if the cache folder is moved to ~/.cache/xl2times/?


@siddharth-krishna changed the title from "dd_to_csv bugfixes," to "dd_to_csv bugfixes, move cache to ~/.cache/xl2times/" on Mar 22, 2024
@rschuchmann commented:
> Looks good, thanks! But we'll also need to change the cache path in the github actions yaml, and perhaps also bump the cache key so that it saves a new one. If you don't mind I can push a commit to your branch directly?
>
> @rschuchmann would there be any issues with the Times-Miro app if the cache folder is moved to ~/.cache/xl2times/?

That should be fine. Personally, I would probably use $HOME (and on Windows %HOMEPATH%), but that's up to you.

@siddharth-krishna (Collaborator) left a comment:

Now the regression tests pass with main being much slower than this PR branch! This is because the CI cache now restores ~/.cache/xl2times/ but not xl2times/.cache/, so the code on main no longer has access to its cache. :)

@rschuchmann yep, the code internally uses Path.home() which should work on Windows too.
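A quick sketch of that cross-platform claim (illustrative only; the real code's variable names may differ):

```python
from pathlib import Path

# Path.home() resolves to the user's home directory on Linux, macOS, and
# Windows alike, so the cache lands under ~/.cache/xl2times on every platform.
cache_dir = Path.home() / ".cache" / "xl2times"
assert cache_dir.name == "xl2times"
assert cache_dir.parent.name == ".cache"
```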

@olejandro olejandro marked this pull request as ready for review March 23, 2024 00:52
@olejandro (Member) commented:

@SamRWest I took the liberty to merge your PR, since all the tests are passing now. Hope you don't mind!

@olejandro olejandro merged commit bf5ea29 into main Mar 23, 2024
2 checks passed
@olejandro olejandro deleted the samw/dd_convert_bugfixes branch March 23, 2024 00:53
@SamRWest (Collaborator Author) commented:

> @SamRWest I took the liberty to merge your PR, since all the tests are passing now. Hope you don't mind!

All good @olejandro :) And thanks for sorting out the cache stuff for me @siddharth-krishna!
