Unclear error messages when misconfiguring data #131

Open
lemairecarl opened this issue Nov 28, 2022 · 2 comments
@lemairecarl
Contributor

lemairecarl commented Nov 28, 2022

In my project, the following config works:
runner.py task=segmentation task/model=enet data=radrec data.dataset_path=<DATASET_PATH>

Here is the error I get if I omit the data.* args:
runner.py task=segmentation task/model=enet

omegaconf.errors.MissingMandatoryValue: Missing mandatory value: data._target_
    full_key: data._target_
    object_type=dict

This message does not point to the real problem, which is that data is not set. I think it might be better to make data mandatory. That way, the error message would clearly say that data is missing, instead of one of its keys.

The following is another unclear error message that is related.

I have created a new task in my project by copy-pasting segmentation.yaml. I only changed the _target_. Here is what happens when I omit data.* like above, but with my own task:
runner.py task=radrec task/model=enet

task=radrec task/model=enet
omegaconf.errors.InterpolationKeyError: Interpolation key 'data.image_tag' not found
    full_key: task.image_tag
    object_type=dict

It's weird that I don't get the same message. But also, it seems like making data mandatory would help. (By the way, after setting data and data.dataset_path, it works.)

Another error that I find unclear is when I omit just data.dataset_path; the error does not mention the actual argument name:

hydra.errors.InstantiationException: Error in call to target 'radrec.dataset.datamodule.RadrecDataModule':
TypeError('expected str, bytes or os.PathLike object, not NoneType')
full_key: data

Note that data/radrec.yaml contains dataset_path: null.

Any thoughts?

@nathanpainchaud
Member

I'll start by addressing the specific errors (especially where I can explain them), before we take a step back to properly discuss the design of the data config.

> I have created a new task in my project by copy-pasting segmentation.yaml. I only changed the _target_. Here is what happens when I omit data.* like above, but with my own task:
> runner.py task=radrec task/model=enet
>
> task=radrec task/model=enet
> omegaconf.errors.InterpolationKeyError: Interpolation key 'data.image_tag' not found
>     full_key: task.image_tag
>     object_type=dict
>
> It's weird that I don't get the same message. But also, it seems like making data mandatory would help. (By the way, after setting data and data.dataset_path, it works.)

I'm not able to properly explain why the error message changes between the 2 commands. Out of curiosity, did you implement your project inside vital (as its own git branch), or did you create a separate project that depends on vital (in the same vein as my CASTOR project)?

The only explanation that comes to mind is a difference in the order in which the configs are resolved (possibly across projects, hence my previous question), which would cause one error or the other to trigger first.

> Another error that I find unclear is when I omit just data.dataset_path; the error does not mention the actual argument name:
>
> hydra.errors.InstantiationException: Error in call to target 'radrec.dataset.datamodule.RadrecDataModule':
> TypeError('expected str, bytes or os.PathLike object, not NoneType')
> full_key: data
>
> Note that data/radrec.yaml contains dataset_path: null.

I admit that it's not especially intuitive, but in my experience it's how Hydra outputs errors, especially in multirun mode where each individual run is a subprocess (or thread? I'm not sure), which can obfuscate a bit where the error originally came from. The error that is printed last is not especially clear, but if you scroll up the stack trace you'll eventually find the original "native python" error, which ought to be clearer.

In your case, what the error is telling you line-by-line is this:

  1. An error happened when trying to instantiate a config node whose target is radrec.dataset.datamodule.RadrecDataModule;
  2. This is the very end of whatever error message was produced by the failed instantiation of the class. In your case, the message makes sense because dataset_path: null does not mean that no dataset_path is provided; it sets None as its value (but a string or path is expected);
  3. This is telling you the "name" of the config node whose instantiation failed with the above error. The target provided earlier is the target of this config node.
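To show where a message like the one in point 2 can come from, here is a small standard-library sketch. `ExampleDataModule` and `CheckedDataModule` are hypothetical stand-ins, not the real RadrecDataModule, and the assumption is that the path ends up being converted with something like `os.fspath`. The second class shows one way a constructor could name the offending argument itself.

```python
import os


class ExampleDataModule:
    """Hypothetical stand-in for a data module that takes a dataset path."""

    def __init__(self, dataset_path):
        # Converting None to a filesystem path raises the generic TypeError.
        self.dataset_path = os.fspath(dataset_path)


class CheckedDataModule(ExampleDataModule):
    """Same stand-in, but it validates the argument and names it in the error."""

    def __init__(self, dataset_path):
        if dataset_path is None:
            raise ValueError("dataset_path must be set (got None)")
        super().__init__(dataset_path)


def error_text(factory, **kwargs):
    """Call `factory` and return the formatted error, or None on success."""
    try:
        factory(**kwargs)
    except (TypeError, ValueError) as err:
        return f"{type(err).__name__}: {err}"
    return None


generic = error_text(ExampleDataModule, dataset_path=None)
explicit = error_text(CheckedDataModule, dataset_path=None)

print(generic)
print(explicit)
```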

So, in your case, if you want a clear error message, you should set dataset_path: ??? so that Hydra would not try to instantiate the object with dataset_path=None, but would rather give a clear error message.

@lemairecarl
Contributor Author

Thanks a lot! I appreciate that you took the time to answer, since this is clearly not a priority.

To answer your question, my project depends on vital as a git submodule.

Thanks for the suggestion about setting dataset_path: ???, that's what I should have used in the first place.

Now, a question remains unanswered: why isn't data mandatory? Maybe because in some cases the info is contained in a checkpoint that is loaded?
