Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SchemaBuilder incorrecly marking field as required for copied schema #78

Open
rsteiger opened this issue Jul 21, 2024 · 4 comments
Open

Comments

@rsteiger
Copy link

Here's a reproduction of the issue:

builder_1 = genson.SchemaBuilder()
builder_1.add_object({"a": 0})
builder_1.add_object({"b": 0})

# Make a copy of builder_1.
builder_2 = genson.SchemaBuilder()
builder_2.add_schema(builder_1)

# This passes.
assert builder_1.to_schema() == builder_2.to_schema()

another_schema = {
    '$schema': 'http://json-schema.org/schema#',
    'properties': dict(c={'type': 'integer'}),
    'required': ['c'],
    'type': 'object'
}
builder_1.add_schema(another_schema)
builder_2.add_schema(another_schema)

# This fails, builder_2 marks 'c' as required.
assert builder_1.to_schema() == builder_2.to_schema()
@wolverdude
Copy link
Owner

Issue #25 seems to be rearing its ugly head again. You can find my explanation of the basic reason for why this is happening here. I apparently failed to add docs the README as I had said I would, but that said, I thought I had fixed the problem. Let me look a little deeper here.

@wolverdude
Copy link
Owner

Okay, so it looks like I fixed the problem that was specifically described in issue #25, but while yours is the same underlying problem, it wasn't fixed by that particular solution. Yay for treating symptoms, not causes! Unfortunately, I think we're stuck with this particular cause, so I'll have to stay focused on the symptoms.

What's happening is that GenSON implicitly converts a SchemaBuilder into a dict schema if it gets passed to add_schema(). This is logically simple, but it means that any information not serialized by to_schema() gets lost in the transition. This is actually good, because it forces GenSON not to stay very close to the bare JSON-Schema spec, and thus (hopefully) be more intuitive. But for the arcane reasons outlined in issue #25, the presence of an empty required field is a nonintuitive thing that gets lost.

The fix for issue #25 solved this by ensuring output schemas would always contain the required field if any input schema contained it, even if empty. But that doesn't work in this case because the input schema (builder_1) is itself a SchemaBuilder, and when serialized it "helpfully" leaves out the empty required field because no one passed it an input schema that directed it to do otherwise.

@wolverdude
Copy link
Owner

wolverdude commented Jul 21, 2024

One way around this is just to use deepcopy to create an actual recursive duplicate of the builder object. But that will only work for fresh objects, not for adding a schema to another schema that already has its own data. In theory, SchemaBuilder itself could do something like this instead of the implicit dict conversion when another SchemaBuilder object is passed to add_schema(), but that would require adding an extra internal-only API all the way down, and I would rather not do that.

Outside of that, I unfortunately don't think this is fixable without seriously messing up the seed schema functionality. If this is a serious problem for your use-case, and you can do without seed schemas, I would suggest creating a custom object SchemaStrategy that inherits from the canonical one and sets this ivar to True instead of False. That will tell it to always include the required key in the output, so then the info that no keys are required won't be lost between builder_1 and builder_2.

@rsteiger
Copy link
Author

Thank you for investigating. In my case I have a distributed workload where I'm loading a shared schema, adding objects, then updating the shared schema. Now, within a worker, instead of passing a schema around I am passing a builder around and only serializing it when updating the shared schema. It would be nice to not rely on this behavior, but it sounds like a difficult fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants