-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dd_to_csv bugfixes, move cache to ~/.cache/xl2times/
#230
Conversation
…te names. corrected the gams-cat for UC_DYNBND (see #218)
moved cache dir to standard ~/.cache/xl2times dir to avoid creating inside .venv subdir
else: | ||
raise ValueError( | ||
f"Unexpected number of spaces in parameter value setting: {data[index]}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ValueError was getting thrown from veda-produced austimes DD files for a variable with a space in its name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SamRWest when you write "variable" do you mean an index of a GAMS parameter or a GAMS parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, 'variable' is the wrong terminology.
I mean the key
part referred to in this comment:
# Either "value" for a scalar, or "key value" for an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, here's a line from one of our DD files that (i think) would have triggered the ValueError, because of the spaces in UC_non-decreasing EE penetration
:
'UC_non-decreasing EE penetration'.RHS.'QLD'.2015.'EE2_Pri_Pub-b'.ANNUAL 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks! It should be okay not to raise ValueError
, as long as the key is in quotes. Scalar is a numeric, so there should not be spaces in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, that was my assumption.
The new code just splits the line at the last space char, whereas the old code split it at all spaces, then warned if it ended up with >2 tokens, which failed on my example string above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, sometimes there is no value, e.g. from Demo 1:
SET COM_TMAP
/
'REG1'.'DEM'.'TPSCOA'
'REG1'.'NRG'.'COA'
/;
Let's say somebody does this:
SET COM_TMAP
/
'REG1'.'DEM'.'TPS COA'
'REG1'.'NRG'.'COA'
/;
What will happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I can see we are handling this above when we check whether it is a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the new code will catch this with this check, because rfind
returns -1 if no spaces are found in the string.
if split_point == -1:
#if only one word
attributes, value = [], line
# So value is always the last word, or only token | ||
split_point = line.rfind(" ") | ||
if split_point == -1: | ||
# if only one word | ||
attributes, value = [], line | ||
else: | ||
raise ValueError( | ||
f"Unexpected number of spaces in parameter value setting: {data[index]}" | ||
) | ||
attributes, value = line[:split_point], line[split_point + 1 :] | ||
attributes = attributes.split(".") | ||
attributes = [a if " " in a else a.strip("'") for a in attributes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rfind()
works more reliably, on the assumption that there's always a space before the value, so it'll work with 'key with spaces' value
style strings
moved cache dir to standard ~/.cache/xl2times dir to avoid creating inside .venv subdir
…mw/dd_convert_bugfixes
…amw/dd_convert_bugfixes # Conflicts: # xl2times/__main__.py # xl2times/transforms.py
xl2times/__main__.py
Outdated
""" | ||
with open(filename, "rb") as f: | ||
digest = hashlib.file_digest(f, "sha256") # pyright: ignore | ||
hsh = digest.hexdigest() | ||
if os.path.isfile(cache_dir + hsh): | ||
fname1, _timestamp, tables = pickle.load(open(cache_dir + hsh, "rb")) | ||
hash_file = cache_dir / f"{Path(filename).stem}_{hsh}.pkl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the cache filename contains the original filename, then perhaps we no longer need to store a tuple in the pickle? And as discussed we probably don't need to check modified timestamp, since it's super unlikely that files get modified but their hash remains the same..
cleaned up caching code as suggested
pickle.dump(tables, f) | ||
logger.info(f"Saved cache for {filename} to {cached_file}") | ||
|
||
return tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this @siddharth-krishna ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
Hmmm, benchmarks are failing to find a ground-truth CSV for demo7. Gotta split, but will have a look in the morning.
|
Other demos as well! |
@SamRWest could you explain this a bit more? I do not think I understand the motivation / need for the change. |
Just that I'd like to be able to convert DD files in other projects to CSV from the commandline:
The last line is made possible by adding |
Ok, tests are all passing now, but CI is failing because the runtime has gone up because the pickle cache gets regenerated. @siddharth-krishna - how can we update the pickle cache with the new files when CI is failing? Disable the runtime check temporarily? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! But we'll also need to change the cache path in the github actions yaml, and perhaps also bump the cache key so that it saves a new one. If you don't mind I can push a commit to your branch directly?
@rschuchmann would there be any issues with the Times-Miro app if the cache folder is moved to ~/.cache/xl2times/
?
pickle.dump(tables, f) | ||
logger.info(f"Saved cache for {filename} to {cached_file}") | ||
|
||
return tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
~/.cache/xl2times/
That should be fine. Personally, I would probably use $HOME (and on Windows %HOMEPATH%), but that's up to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the regression tests pass with main being much slower than this PR branch! This is because now the cache is restoring ~/.cache/xl2times/
but not xl2times/.cache/
so the main
code doesn't have access to its cache anymore. :)
@rschuchmann yep, the code internally uses Path.home()
which should work on Windows too.
@SamRWest I took the liberty to merge you PR, since all the tests are passing now. Hope you don't mind! |
All good @olejandro :) And thanks for sorting out the cache stuff for me @siddharth-krishna! |
Just a minor bugfix and some housekeeping.