Lockfile redux #41
base: master
Conversation
huh, forgot there were tests for this. Well we can use those to see if the template works. @Zaharid do you think this is approximately what you had in mind? I also updated docs to try and be a bit clearer. |
hmm I was wondering if we could completely hide the lockfile handling behind a decorator, something like:

```python
def record_from_defaults(f):
    lockkey = trim_token(f.__name__) + "_recorded_spec_"

    @functools.wraps(f)
    def f_(self, *args, **kwargs):
        # either returns pre-existing <key>_recorded_spec_ or loaded defaults
        try:
            _, res = self.parse_from_(None, lockkey, write=False)
        except ConfigError as e:
            log.debug(e)
            res = f(self, *args, **kwargs)
            # save to lockfile if not present.
            if lockkey not in self.lockfile:
                self.lockfile[lockkey] = res
        return res

    return f_
```

then you just decorate a function which loads the defaults |
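For example, a minimal usage sketch (the `produce_default_grouping` name and the grouping value are borrowed from later in this thread; treat this as illustrative rather than the PR's actual code):

```python
# illustrative only: the decorated production rule just loads the defaults;
# recording to and replaying from the lockfile is handled by the wrapper
@record_from_defaults
def produce_default_grouping(self):
    return {"standard_report": "experiment"}
```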
the thing I dislike about that is that if the parse_from_ call fails for an obscure reason then that gets hidden. Also, should we add an identity parse func to the class? |
So upon initial test this roughly does what we want and I think the user facing interface is fairly intuitive. I still don't like catching the error. I wonder if we could just explicitly take the input and check if the recorded spec key is in it? After all we don't want fancy parsing here, we just want to load whatever was dumped, so like:

```python
input_params = self._curr_input
if lockkey in input_params:
    # maybe log info that we're loading defaults here?
    return input_params[lockkey]
```
|
I'll need some time to look into this. But @siranipour's opinion would be useful here. |
Yep looking into it literally rn, it's looking more intuitive on first glance, just playing with the VP rules implementation |
FWIW I am thinking that one aspect not obvious to me, and possibly the thing that motivated the previous design in the first place, is the question of how to handle defaults that would apply to a specific dataset (e.g. default cfactors). With this update we would only have the option of writing all the defaults for all the datasets to the lockfile. But maybe that's ok. |
Yeah also one thing that's not clear to me is how we can create new defaults. E.g. say in NNPDF5.0 we have a new grouping like:

```yaml
standard_report: experiment
theory_covariance_report: nnpdf31_process
NNPDF5_report: NNPDF5_value
```

would we just have to update the defaults file? sorry to add entropy to the discussion btw |
Another thing is that it doesn't like dumping custom classes. As a minimal example:

```python
from reportengine.compat import yaml

class Foo(yaml.YAMLObject):
    yaml_tag = u"Foo"

    def __init__(self, a):
        self.a = a

    def __repr__(self):
        return str({"a": self.a})

if __name__ == '__main__':
    bar = Foo(5)
    with open("/tmp/foo.yaml", 'w') as stream:
        yaml.dump(bar, stream=stream, Dumper=yaml.RoundTripDumper)
```

gives `RepresenterError: cannot represent an object: {'a': 5}` |
Yes I think the original idea was that we'd only have to dump what we used. I wonder if there is a halfway house where the production rule which was decorated would return a nested mapping of the defaults. We'd be locking in the exact structure of the defaults but I think a nested mapping is not so crazy. Then the wrapper would check which entries of the mapping were actually used before dumping. |
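Something like the following, purely as a sketch (the rule name, dataset names, and cfactor values are made up; only the nested-mapping shape matters):

```python
# purely illustrative: defaults held as a nested mapping keyed by dataset,
# so the wrapper could in principle record only the entries actually used
@record_from_defaults
def produce_default_cfactors(self):
    return {
        "DATASET_A": {"cfactors": ["QCD"]},
        "DATASET_B": {"cfactors": ["QCD", "EWK"]},
    }
```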
I wouldn't put too much effort into integrating that until this is more stable. But thanks for looking |
well the point is that the mapping which your production rule returns must have your new key in it, so you can either append to the file as you say or you could put the default in a new file but make sure the result was added to the returned mapping. So for example you could have (this is stupid but if the default values were more complicated it might make more sense):

`standard.yaml`:
```yaml
experiment
```

`theorycov.yaml`:
```yaml
nnpdf31_process
```

then something like:

```python
@record_from_defaults
def produce_default_grouping(self):
    loaded_defaults = dict(
        standard_report=yaml.safe_load(
            read_text(validphys.defaults, "standard.yaml")),
        theory_covariance_report=yaml.safe_load(
            read_text(validphys.defaults, "theorycov.yaml")),
    )
    return loaded_defaults
```

which might be relevant for the filter rules because they're quite long, so you could load separate files. In this case I'm not sure a single string is a valid yaml file, but imagine they were lists like the filter rules. TL;DR: the only fixed thing here is that the production rule returns a mapping with all the defaults in it; it doesn't matter how they were loaded. |
I think for now it's safe to assume that defaults are things we would write in a runcard and therefore are not custom objects. |
Furthermore, the issue with your example is with your implementation: https://yaml.readthedocs.io/en/latest/dumpcls.html#dumping-python-classes

EDIT: i.e.

```python
from reportengine.compat import yaml

class Foo(yaml.YAMLObject):
    yaml_tag = u"Foo"

    def __init__(self, a):
        self.a = a

    def __repr__(self):
        return str({"a": self.a})

if __name__ == '__main__':
    bar = Foo(5)
    _yaml = yaml.YAML()
    # register the class so the round-trip dumper knows how to represent it
    _yaml.register_class(Foo)
    with open("/tmp/foo.yaml", 'w') as stream:
        _yaml.dump(bar, stream)
```

seems to work just fine |
Do we ever want to do anything with custom python classes? It seems to me defaults should be simple things that typically live in yaml files in the first place. |
That was my first comment |
It's because the way rules are implemented in VP currently is that it returns a list of custom objects. I guess I can add an intermediary production rule that parses the yaml file and a separate one that returns the rule objects |
That already exists: |
Yeh sure, but I think it'd be best to scrap some of the old lock file machinery |
Ok I really like this redux. Works a lot better than the old one, and I've got a very reasonable lockfile out that saves the filter rules for now. I still get that weird clashing of dataset namespaces when I run VP on the lock file, but we had a hacky fix to this last time anyway |
well that's not lockfile machinery; the procedure of turning a list of rules into classes is something else. The lockfile mechanism concerns the list of rules which are parsed, not the end result. |
So one thing that we've been using the lock files system for is to use our own personal cuts for separate projects. We create a new defaults file, say personal_project_cuts.yaml |
Minimally I suppose one can introduce that hacky approach; alternatively we can set the parse function inside this wrapper so the hacky fix is hidden from view. There is something suboptimal about the current processing of rules, and I think that holding rules in a mapping would make certain operations easier. Then if I have user defined rules for some subset of datasets, and the defaults for all datasets, then I could have:

```python
def produce_processed_rules_dataset(
        self, dataset, default_rules, explicit_rules, theoryid, defaults):
    ...
    if str(dataset) in explicit_rules:
        return Rule(
            initial_data=explicit_rules[str(dataset)],
            defaults=defaults,  # these should be renamed considering we will have many defaults
            theory_parameters=theory_parameters,
            loader=self.loader,
        )
    else:
        return Rule(
            initial_data=default_rules[str(dataset)],
            defaults=defaults,
            theory_parameters=theory_parameters,
            loader=self.loader,
        )
```

this would mean there wasn't a conflict I think, would simplify extracting the rule for a single dataset and also allow you to explicitly override the defaults for a subset of datasets without having to copy and paste all rules. Whilst we're altering this, it's clear that calling something `defaults` is ambiguous when we will have many kinds of defaults. |
Well if personal_project_cuts.yaml is dynamic enough then I'd argue it should have been defined in the runcard using the explicit rules system. If it's static enough then you can add it as a proper default and update the available defaults, which requires updating the config once. Even if they are relatively dynamic you only need to add to the config once, and you can modify the personal_project_cuts.yaml and each lockfile along the way will allow you to reproduce old results. |
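For instance, registering the personal cuts as a proper default could mirror the produce_default_grouping example above; a rough sketch (the rule name, package path, and the defaults.yaml pairing are assumptions, not code from the PR):

```python
# assumed names throughout; this just mirrors the decorated-rule pattern above
@record_from_defaults
def produce_default_filter_rules(self):
    return dict(
        standard=yaml.safe_load(
            read_text(validphys.defaults, "defaults.yaml")),
        personal_project=yaml.safe_load(
            read_text(validphys.defaults, "personal_project_cuts.yaml")),
    )
```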
Sure, I guess I'm saying that whilst the defaults can be a mapping, a list, or even a single string, whatever it is shouldn't conflict with production rules, which the rules currently do. We can introduce a way of working around this, but perhaps shouldn't rule out changing the format of the rules. I think that issue is project specific and nothing to do with reportengine though, so I'll open an issue in NNPDF/nnpdf |
Yes, I don't believe the hacky approach is going to cut it. So it seems to me we can either: |
On the contrary, I think that was to an extent the mistake the first time around: it seems to me we should try to simultaneously work on support for filters and ideally dataset defaults to inform the implementation (and certainly have it before we all forget how the thing was supposed to be used...). |
This current system is supposed to be:
My comment on "I wouldn't put too much effort into integrating that until this is more stable" was more because the way in which I take the defaults from the config, if possible, is possibly part of what is causing the key conflict, since I'm using |
I have opened NNPDF/nnpdf#1105 to slightly expand my thoughts on how the cuts could be processed. |
I am starting to think that the current system (as in the one in this PR) has the better ratio of complexity to usefulness. It is slightly inconvenient to get a large blob of settings every time you use a dataset, but on the other hand the situations where it matters are mostly theoretical as far as I can tell. Besides we can always (re-)add some of the complexity later. |
Yeah I think I agree. I was being stupid yesterday and thought there was basically 1 cut for each dataset and 1 cut for each process type but I realise that's not true and that dumping the full list of rules probably works as well as anything in this case. As you say if we start with this we can always expand: I think it's compatible with the concept of later using args as keys in some nested mapping, which I think becomes more desirable with the dataset defaults. Although again maybe we simply don't care about just dumping all the defaults. In terms of the conflict between the production rule and the key in the filter rules I wondered in NNPDF/nnpdf#1105 if the easiest option might be to just change the name of the key? It seems more robust than the hacky workaround and AFAICT the key name in the filter rules is not particularly special - provided the |
…than using try/except
Ok I'm not sure what you think of that, but I wasn't a huge fan of using try/except |
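To make that concrete, a sketch of the wrapper with the explicit check in place of try/except (assembled from the snippets earlier in this thread, so the details are assumptions rather than the actual diff):

```python
# assembled from earlier snippets in this thread; not the actual PR diff
def record_from_defaults(f):
    lockkey = trim_token(f.__name__) + "_recorded_spec_"

    @functools.wraps(f)
    def f_(self, *args, **kwargs):
        input_params = self._curr_input
        if lockkey in input_params:
            # replay whatever was dumped to the lockfile, no fancy parsing
            return input_params[lockkey]
        res = f(self, *args, **kwargs)
        # record freshly loaded defaults so the run can be reproduced
        if lockkey not in self.lockfile:
            self.lockfile[lockkey] = res
        return res

    return f_
```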
Hey, the tests of this repo seem to catch actual issues! And I am fine with this approach. |
Yeah! I will fix the tests. Then I don't mind doing the corresponding PR that fixes the data grouping lockfile stuff. I guess we can keep the cuts PR separate, but that will also need to be changed, and perhaps we should at least start a dataset defaults draft to see how this works with that before merging this. @siranipour is the issue with |
Yep works a treat now!! thanks |
great, I will let you try and update NNPDF/nnpdf#818 - remember that you will have to decorate the rule which produces all defaults (both the nnpdf31 ones in the lockfile location and the "current" ones, which can probably be called 40 or whatever) and the function which produces the key for the default set of defaults to use if the user doesn't provide one. I'll let you come up with the names for these 😆 but then schematically you'll have a signature with
Then similarly with the things that are currently referred to as |
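Schematically, the two decorated rules might look something like the sketch below (every name is invented for illustration, since the comment above deliberately leaves the naming open; the values echo the grouping example earlier in the thread):

```python
# all names invented for illustration; not the actual implementation
@record_from_defaults
def produce_grouping_defaults(self):
    # mapping containing every known set of grouping defaults
    return {
        "nnpdf31": {"theory_covariance_report": "nnpdf31_process"},
        "current": {"standard_report": "experiment"},
    }

@record_from_defaults
def produce_default_grouping_key(self):
    # the set of defaults to fall back on if the user doesn't provide one
    return "current"
```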
stupid question but do we actually need a mapping called
EDIT: we can discuss this on the respective PR, just a thought though |
Ok lovely many thanks for this PR Mikey. I think I'll make a fresh PR for a nice clean commit history |
Think the idea was to have the defaults.yaml file in case we wanted to add more global parameters for the cuts in the future |
attempting to simplify lockfiles now that we understand the use case better. This will clearly break any pre-existing lockfile usage and will require a corresponding PR in NNPDF/nnpdf to update any functions which use this machinery.