Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Default to unknown=EXCLUDE? #507

Closed
lafrech opened this issue Apr 29, 2020 · 10 comments
Closed

[RFC] Default to unknown=EXCLUDE? #507

lafrech opened this issue Apr 29, 2020 · 10 comments

Comments

@lafrech
Copy link
Member

lafrech commented Apr 29, 2020

Question raised in #480.

unknown=RAISE may be a nice default in marshmallow but it can be cumbersome to change in webargs.

In practice it is mostly useful in body payload.

I wouldn't use it in query args, for instance, because by construction of the framewok, there may be several layers of decorators adding different fields to parse.

In flask-smorest, I added a doc section to explain it but it is still surprising for the user (see marshmallow-code/flask-smorest#145).

While the base schema is the way to go in marshmallow and is convenient once you are familiar with this logic, it is not practical to with arguments defined with dict2schema.

I guess in practice, dict2schema is used mostly for arguments with a few fields (typically query args) while the schema form is adapted to huge schemas (typically json body). Using EXCLUDE as default would make the dict2field form work while the user could easily opt-in to RAISE when using the schema form.

Should we change the default to EXCLUDE in a major release and let the user opt-in to RAISE?

@lafrech
Copy link
Member Author

lafrech commented Apr 29, 2020

I'm realizing that my question above is based on the wrong assumption that the unknown parameter of the schema is set in webargs, whish is incorrect. When using a schema, webargs does not affect this parameter.

The question in #480 was about changing the default for dict2schema only.

I don't see a nice way to change the default in marshmallow schemas. Changing it inside user-provided schema sounds bad. Providing a base schema with EXCLUDE as default would not be of much help: it wouldn't be less cumbersome than defining one's own schema.

But having the default in dict2schema differ from Schema sounds bad, doesn't it?

@zenglanmu
Copy link

at least there should be an option to set EXCLUDE when using dict_arg not user defined schema class. User defined schema classes is fine because unknown=EXCLUDE can be easily passed through class Meta. We can do some thing like:
dict2schema(dct, schema_class)(unknown=unknown)
where parameter unknown can be passed by user

@lafrech
Copy link
Member Author

lafrech commented Apr 29, 2020

That's the point of #480 (comment).

I wouldn't object to that.

Well, the thing is dict2schema is imported from marshmallow 3, which adds inertia. I'm not sure we want to change it there, or if we do, it might be part of a larger change because if we add unknown, why not add other parameters as well?

@zenglanmu
Copy link

I think the problem comes becuase dict object and schema class are basically two different things. I agreed with your idea that if unknown is added, what about other parameters.
Currently we can solve this problem temporary by using syntax such as:
@use_kwargs(ma.Schema.from_dict(some_args)(unknown=EXCLUDE))
or maybe we can put unknown together with other parameters into arg_dicts. So somthing like
dict_args= { 'field1' : fields.Str(), 'field2' : fields.Int(), 'unknown': EXCLUDE, 'strict': True }

that any attrs in dict_args is not an instance of marshmallow.fields, then treated it as **kwargs passed to Schema()(**kwargs)

@sirosen
Copy link
Collaborator

sirosen commented Apr 29, 2020

I think the simplest solution is to just add unknown=None (as the default) to Parser.parse, and then pass any non-None value when we do the schema load.

result = schema.load(location_data)

becomes

load_kwargs = {}
if unknown is not None:
    load_kwargs["unknown"] = unknown
result = schema.load(location_data, **load_kwargs)

With no change in defaults for dict2schema.

That solves it because you can specify unknown and we don't have to worry about what kind of schema was used. It's only important for dict2schema, but it would work fine for full schemas as well.

If webargs itself adds a default for unknown in dict2schema built-in, that's fine -- we could default to EXCLUDE and I'd be okay with that -- but I think we then need to be able to specify non-default values somehow. I think the above might be the best way to do it because it's easy on us (to maintain) and provides a pretty good experience for developers.


I keep on thinking up "clever" ideas, like having a location_to_default_unknown_value_map to tell us headers => EXCLUDE, body => RAISE, .... But then we run into trouble trying to make a nice interface for applying any fancy logic like that. I'd rather just skip it and make dict2schema always do the same thing and let users pass unknown in their parse and use_args calls.

@lafrech
Copy link
Member Author

lafrech commented Apr 29, 2020

I like the fact that it acts on both schema and dict.

It might solve the bad UX of having RAISE as default while most of the time EXCLUDE is better (totally arguable) if the user can define another default value. As a matter of fact, he can by overriding parse in a subclass, but we could add a DEFAULT_UNKNOWN class attribute. We could be tempted to change the default to EXCLUDE in a major release if we get feedback in favour of this but using this for a default value only allows to override it in the use_args call, as the default passed here will override Schema unknown, so I don't think it is a good idea after all.

I'd rather have a custom default value (EXCLUDE) at framework level that I would override in Schema (RAISE in Schemas used in body), and see nothing in my API resources. With your solution, I can leave Schemas to default RAISE and override to EXCLUDE in each use_args query args call, but that's cumbersome and I'd rather stick to overriding unknown in the Schema (I don't use dict2schema) overall.

I guess I'm not really concerned by this change since I only use Schemas and the meta attribute is well suited for my case.

And therefore the good points I pointed if not this whole comment are not that relevant...


What about propagation? Currently, unknown does not propagate. It may change in the near future.

marshmallow-code/marshmallow#1575

I guess propagation affects mostly body payloads for which people mostly use Schemas.

@sirosen
Copy link
Collaborator

sirosen commented Apr 30, 2020

or maybe we can put unknown together with other parameters into arg_dicts ... any attrs in dict_args is not an instance of marshmallow.fields, then treated it as **kwargs passed to Schema()(**kwargs)

I don't really like this solution because it's mixing things together in a way that I believe is likely to be a source of confusion for users. It also poses a problem when, for example, "unknown" is a field name.

We have to draw a line somewhere on how many features will be packed into dict2schema usage. unknown might be a special case because the marshmallow default (RAISE) is not the sensible default for several webargs data locations (headers for sure, and it's arguable for query -- I like RAISE behavior there, but am probably an outlier).

It might solve the bad UX of having RAISE as default while most of the time EXCLUDE is better (totally arguable) if the user can define another default value.

Yeah, many people will want EXCLUDE as the default. I want to have the freedom to set it to RAISE, but I don't want to force my preference on all users.

I agree with your point about having to specify unknown=EXCLUDE on every call being an annoyance. If I were forced to do that, I'd probably wrap use_args in a custom decorator just to pass that! Which is maybe not a sign of a great UX. 😂

So what about adding both of these things?

  • Parser.DEFAULT_UNKNOWN = EXCLUDE, which can then be set on class instances and
  • unknown=... in parse and use_args

with logic like schema.load(location_data, unknown=unknown or self.DEFAULT_UNKNOWN)?

We could add DEFAULT_UNKNOWN = None in 6.1.0 to be backwards compatible, and make DEFAULT_UNKNOWN = EXCLUDE the plan for 7.0.

@sirosen
Copy link
Collaborator

sirosen commented Apr 30, 2020

Oh, I forgot to answer about propagation. I'm not sure what, if anything, we would do differently to handle it in webargs? It might fall on the other side of the "if you want that, use a Schema class" divide.

The way that marshmallow PR is written, it looks like the proposal would be to propagate the unknown value passed by webargs. I think that's fine, especially if we allow Parser.DEFAULT_UNKNOWN = None (or similar) to specify that no value should be passed.

@zenglanmu
Copy link

zenglanmu commented Apr 30, 2020

I'm fine with whatever DEFAULT_UNKNOWN . But i would like able to globally set the unknown behavior. Revised every dict_arg across multiple slocs in order to keep compatible with latest webarg release is nothing fun.
As for if other schema opts were needed when dealing dict_arg, how about allowing user to define something like __schema_opts__ attribute, then webargs can create class Meta from this attribute when create schema class.

@sirosen
Copy link
Collaborator

sirosen commented Sep 11, 2020

This is done in dev and will be part of the 7.0.0 release. It's available in the current beta.

For anyone reading this thread: we went with a per-location approach after all.
Parsers have DEFAULT_UNKNOWN_BY_LOCATION with per-location defaults. You can read about it in the changelog or this new section of the advanced usage docs.

@sirosen sirosen closed this as completed Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants