Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to pass unknown in parse calls #514

Closed
wants to merge 1 commit into from

Conversation

sirosen
Copy link
Collaborator

@sirosen sirosen commented May 13, 2020

This is half of #507 . For the other half, I think we need to wait for 7.0, so we can make a breaking change to the default behavior.

This adds support for passing the unknown parameter in two major locations: Parser instantiation, and Parser.parse calls. use_args and use_kwargs are just parse wrappers, and they need to pass it through as well.

It also adds support for a class-level default for unknown, Parser.DEFAULT_UNKNOWN, which sets unknown for any future parser instances.

Explicit tweaks to handle this were necessary in asyncparser and PyramidParser, due to odd method signatures.

Support is tested in the core tests, but not the various framework tests. (I wanted to add this, but couldn't think of an easy/good way to do it. Suggestions welcome.)

Add a 6.2.0 (Unreleased) changelog entry with detail on this change. The changelog states that we will change the DEFAULT_UNKNOWN default in a future major release. Presumably we'll make it EXCLUDE, but I'd like to make it location-dependent if feasible, so I didn't commit to anything in the phrasing.


This obsoletes #506 if merged. I like to incorporate people's work if possible, but I don't see a way in this case. I don't think we should make this behavior special for dict2schema. I'd rather add a general feature which happens to play nice with dict2schema. That way, you get fewer chances for surprising behavioral changes if you replace a dict with a full schema object.

My first attempt at this was much more complex and didn't come together well.
I wanted to have a mapping like

{
  "query": EXCLUDE,
  "json": RAISE,
  ...
}

but putting this together and making it user-settable was just a lot of work. I'd rather do this, which is simpler, and get it into people's hands if they're having issues moving to 6.x.

@sirosen sirosen requested a review from lafrech May 13, 2020 22:47
Copy link
Member

@lafrech lafrech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation is perfect.

Thanks for this.

Yes, testing this in all frameworks looks cumbersome and I don't think it adds much more coverage once the principle is tested in core tests.

As I wrote in #507, ideally, I'd like to set EXCLUDE as default and override to RAISE for body schemas, but we can't do that (unless perhaps we inspect the Meta attribute of the schema and I'd rather not do that). My use case should be achievable if we add a location to unknown mapping. This mapping could be a class attribute:

    #: Default value to use for 'unknown' on schema load
    DEFAULT_UNKNOWN = None
    #: Default values to use for 'unknown' on schema load depending on location (overrides DEFAULT_UNKNOWN)
    DEFAULT_UNKNOWN_BY_LOCATION = {}

Then in user code:

    DEFAULT_UNKNOWN = ma.EXCLUDE
    DEFAULT_UNKNOWN_BY_LOCATION = {'json': ma.RAISE}

User wouldn't need to define all locations. Only those that differ from default.

        self.unknown = unknown or self.DEFAULT_UNKNOWN_BY_LOCATION.get(self.location, self.DEFAULT_UNKNOWN)

This can even be done in a minor version.


Next major may override defaults to

    DEFAULT_UNKNOWN = ma.EXCLUDE
    DEFAULT_UNKNOWN_BY_LOCATION = {}

CHANGELOG.rst Outdated Show resolved Hide resolved
This adds support for passing the `unknown` parameter in two major
locations: Parser instantiation, and Parser.parse calls.
use_args and use_kwargs are just parse wrappers, and they need to pass
it through as well.

It also adds support for a class-level default for unknown,
`Parser.DEFAULT_UNKNOWN`, which sets `unknown` for any future parser
instances.

Explicit tweaks to handle this were necessary in asyncparser and
PyramidParser, due to odd method signatures.

Support is tested in the core tests, but not the various framework
tests.

Add a 6.2.0 (Unreleased) changelog entry with detail on this change.
The changelog states that we will change the DEFAULT_UNKNOWN default
in a future major release. Presumably we'll make it `EXCLUDE`, but I'd
like to make it location-dependent if feasible, so I didn't commit to
anything in the phrasing.
@lafrech
Copy link
Member

lafrech commented May 15, 2020

We should add a note in the docs. It can be after the "location" section of the quickstart: https://webargs.readthedocs.io/en/latest/quickstart.html#request-locations.

@lafrech
Copy link
Member

lafrech commented May 15, 2020

Things might change in marshmallow regarding propagation but currently, unknown doesn't propagate. There's a PR to propagate unknown when passed as load kwarg, but not as Schema param (marshmallow-code/marshmallow#1575). I'm not sure this PR will make it because it is not consistent (load != schema init).

Let's assume unknown does not propagate. Passing it only makes sense for schemas without nesting. This is generally the case for headers and query args. But not for body args.

My concern is that we might be relying on something that will work in most cases but fail sometimes:

  • If people use nesting in headers, query (achievable with custom logic),...
  • If people declare body schemas and rely on this new feature to make them exclude.

Shouldn't we settle on something in marshmallow before making a decision here?

I do realize this PR fixes the from_dict case and I like the fact the it works for any schema (from_dict or classic Schema). Except it does not really work for Schema since nesting is broken.

@sirosen
Copy link
Collaborator Author

sirosen commented May 15, 2020

First up: I'll add docs to this PR later, hopefully today.

Shouldn't we settle on something in marshmallow before making a decision here?

I'm not convinced that we should wait. Maybe we want marshmallow settled before working on something more significant, like LOCATION_DEFAULT_UNKNOWN. But right now, I'm thinking about users who want to write

@use_args(
    {"myheader": fields.Str(data_key="X-My-Header")},
    location="headers",
    unknown=ma.EXCLUDE
)

webargs isn't used in this case for marshmallow features, but merely to extract data from a location like headers or query params.

I've said before that users can just write a schema class, but seeing people's usage more is making me rethink that. Supporting unknown with dict2schema is valuable because it removes some of the temptation to write views like

@use_args(SomeDataSchema, location="json")
def foo(request, args):
    myheader = request.headers.get("X-My-Header")
    ...

where basic parsing is mixed both inside and outside of the view code.


On the other hand, we have users who will want to write

@use_args(SomeSchema, unknown=ma.EXCLUDE)

and will be surprised by the results (not propagating to nested schemas correctly).

However, that's not really "webargs' problem" to solve. The same thing happens with

@use_args(SomeSchema(unknown=ma.EXCLUDE))

already. I don't think by exposing an additional way to pass unknown around that we're making the situation worse.


I can see another solution to all of this. But I might have to gear up and do (my first!) marshmallow PR to make it happen.

If marshmallow is not going to propagate unknown by default, but adds a new parameter next to unknown to control it, e.g. propagate_unknown=True/False, then we should wait. Because I think we would want parse(..., unknown=INCLUDE) to translate down into schema.load(..., unknown=INCLUDE, propagate_unknown=True).

I'll post on that PR https://github.com/marshmallow-code/marshmallow/issues/1490 to suggest this. Maybe I can manage the PR this weekend.

@sirosen
Copy link
Collaborator Author

sirosen commented Aug 11, 2020

I think I'd like to revisit merging this in its current state and doing a release.

Shouldn't we settle on something in marshmallow before making a decision here?

I think marshmallow-code/marshmallow#1634 is probably the best approach because it's backwards compatible and explicit. Maybe that's a bit egotistical, but hey, I wouldn't have submitted the PR if I didn't think it was a good idea! 😄

But even if that PR were taken, I'm not sure how it impacts the behavior here. Would we really want to pass propagate_unknown=True by default? It feels a little bit like we're doing a bit too much if we do so.

But the biggest argument against it: it would mean that webargs behavior varies between different versions of marshmallow 3.
I don't like the idea of having behavior change subtly between marshmallow 3.7 and 3.8 if we can avoid it.

Should I consider going ahead and merging this as-is? And when/if marshmallow 3 changes behavior, we can support that too, in the same way?

@sirosen
Copy link
Collaborator Author

sirosen commented Sep 11, 2020

This is already included in the 7.0 work, so I'm closing this as a duplicate.

I don't think marshmallow is going to change this behavior, and I think we've accepted that. But taking care of this feature in a major version bump lets us solve the problem better and more thoroughly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants