JSON schema nodes without the type keyword are valid, but I think we currently misinterpret the semantics.
We compile
{
  "properties": {
    "a": {
      "const": "a"
    }
  }
}
to
\{("a":"a")?\}
However, according to this thread, numbers, strings, and other JSON types are also valid here. The properties keyword doesn't restrict the type of the node; it only says that if the node is an object, then its fields should be validated according to the properties field of the schema.
Frankly, the thread is a little confusing, and the JSON Schema spec itself seems ambiguous, but there does seem to be consensus among those concerned and across validators.
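To make the intended semantics concrete, here is a minimal stdlib-only sketch of how I understand validators to treat the schema above. It is an illustration, not our code: `check` is a hypothetical helper, and it only handles the `const` and `properties` keywords.

```python
def check(schema, instance):
    """Validate `instance` against a schema, handling only `const` and `properties`."""
    if "const" in schema and instance != schema["const"]:
        return False
    props = schema.get("properties")
    # `properties` only constrains objects; any other JSON type passes untouched.
    if props is not None and isinstance(instance, dict):
        for key, subschema in props.items():
            # A listed property is only checked when it is actually present.
            if key in instance and not check(subschema, instance[key]):
                return False
    return True


schema = {"properties": {"a": {"const": "a"}}}
```

With this reading, `{"a": "a"}`, `{}`, `42` and `"x"` are all accepted, and only an object whose `a` field is something other than `"a"` is rejected. The regex we currently emit covers just the first two cases.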
I want to highlight a couple of quirks/features of the JSON Schema spec, starting from the fact that keywords add constraints independently of one another, so it's possible to write a schema that puts "type", "minLength", "maxLength", and "pattern" on the same node.
As it stands, we seem to support either the pattern constraint or the min/max length constraints, but not both, and only in the presence of the "type" field, which isn't required for these constraints to be valid.
In reality, this JSON schema represents the conjunction of all four keyword constraints: "type", "minLength", "maxLength", and "pattern". An instance has to satisfy every one of them at once.
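A rough sketch of that "every keyword applies" reading, using only the standard library. The concrete values in `example` are my assumption, pieced together from the rest of this comment (a max length of 10, the pattern "hello world", and a min length of 5), and `check_string_keywords` is a hypothetical helper that only knows about `"type": "string"`.

```python
import re


def check_string_keywords(schema, value):
    """Return True only if `value` satisfies every string keyword present in `schema`."""
    if schema.get("type") == "string" and not isinstance(value, str):
        return False
    # The string keywords ignore non-string instances, so they pass without `type`.
    if not isinstance(value, str):
        return True
    if "minLength" in schema and len(value) < schema["minLength"]:
        return False
    if "maxLength" in schema and len(value) > schema["maxLength"]:
        return False
    # `pattern` is not implicitly anchored in JSON Schema, hence re.search.
    if "pattern" in schema and re.search(schema["pattern"], value) is None:
        return False
    return True


# Assumed example values, not taken verbatim from the original schema.
example = {"type": "string", "minLength": 5, "maxLength": 10, "pattern": "hello world"}
```

Note that each keyword is checked on its own: dropping `type` or `maxLength` from the schema just removes one check, it doesn't make the others invalid.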
Now, there is no JSON value which matches all of these constraints, because "hello world" is longer than 10 chars. Ideally, the way we'd model this is with a regex which can never succeed, such as (?!x)x, although since the aim is really to power LLM inference, I guess throwing an error at schema compile time would be more apt.
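Both options are easy to try out in Python. This is only a sketch: `NEVER` and `check_length_bounds` are hypothetical names, and the compile-time check only catches the simplest contradiction (minLength greater than maxLength). Catching a clash between a pattern and a length bound, as in the case above, would need a real emptiness check on the combined regex.

```python
import re

# A negative lookahead for "x" followed by a literal "x" can never match.
NEVER = re.compile(r"(?!x)x")


def check_length_bounds(schema):
    """Raise at schema compile time when the length bounds alone are contradictory."""
    lo = schema.get("minLength", 0)
    hi = schema.get("maxLength")
    if hi is not None and lo > hi:
        raise ValueError(f"unsatisfiable schema: minLength {lo} > maxLength {hi}")
```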
In any case, it would be nice to support more combinations of constraints (e.g. minLength + pattern), as well as having fewer dependencies between constraints (e.g. maxLength without type).
Obviously we'll never be able to support all of the JSON Schema spec (e.g. the multipleOf constraint), but maybe we can support the conjunction of multiple constraints on the syntactic level, with lookahead groups (of which we can write multiple)?
As an example, the constraints for the above schema could be represented as:
(?="[^"]*hello world)"[^"]{5,10}"
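Here's a quick way to play with the lookahead idea using Python's `re` module, which supports lookahead assertions. `string_regex` is a hypothetical helper, and it makes simplifying assumptions: no escape sequences inside the string, and a pattern that contains no anchors or quote characters. Whether the regex engine we compile to an FSM can handle lookaheads is a separate question this doesn't answer.

```python
import re


def string_regex(min_length=0, max_length=None, pattern=None):
    """Build a regex for a JSON string literal from optional, independent constraints."""
    upper = "" if max_length is None else str(max_length)
    # Length bounds apply to the characters between the quotes.
    body = f'"[^"]{{{min_length},{upper}}}"'
    if pattern is None:
        return body
    # The lookahead asserts that `pattern` occurs somewhere inside the string,
    # without consuming input, so the length check still sees the whole literal.
    return f'(?="[^"]*(?:{pattern}))' + body


combined = string_regex(min_length=5, max_length=10, pattern="hello")
print(bool(re.fullmatch(combined, '"hello you"')))
print(bool(re.fullmatch(combined, '"goodbye"')))
```

Each constraint is optional here, so `string_regex(max_length=3)` works without a pattern, and more lookaheads could be stacked in front of the body for further constraints.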
It seems like a pretty big design space to explore.
@torymur @dpsimpson