Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize output of non-required properties #130

Merged
merged 1 commit into from
Dec 20, 2024

Conversation

414owen
Copy link
Contributor

@414owen 414owen commented Dec 20, 2024

For this schema:

{
  "properties": {
    "a": {
      "const": "a"
    },
    "b": {
      "const": "b"
    },
    "c": {
      "const": "c"
    }
  }
}

We currently produce the following regex (simplified, and line-broken)

\{("a":"a"(,"b":"b")?(,"c":"c")?
 |("a":"a",)?"b":"b"(,"c":"c")?
 |("a":"a",)?("b":"b",)?"c":"c")?\}

This works perfectly well, but contains redundancy. This is seen by the
fact that all three alternatives would match JSON with all three fields.

The different alternatives currently model which field is mandatory.

I propose that we make the alternatives model the choice of last field.
This will produce a regex like this:

\{("a":"a"
 |("a":"a",)?"b":"b"
 |("a":"a",)?("b":"b",)?"c":"c")?\}

This will give us a shorter, but 100% equivalent regex.


Both of the test cases I've changed have been run through this regex equivalence tester

Copy link
Contributor

@torymur torymur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great @414owen and thanks for sharing regex equivalence checker tool too! 🌟

For this schema:

```
{
  "properties": {
    "a": {
      "const": "a"
    },
    "b": {
      "const": "b"
    },
    "c": {
      "const": "c"
    }
  }
}
```

We currently produce the following regex (spacing added around
alternatives for clarity)

```
\{("a":"a"(,"b":"b")?(,"c":"c")?
 |("a":"a",)?"b":"b"(,"c":"c")?
 |("a":"a",)?("b":"b",)?"c":"c")?\}
```

This works perfectly well, but contains redundancy. This is seen by the
fact that all three alternatives would match JSON with all three fields.

The difference between cases at the moment, is which field is mandatory.

I propose that we make the alternatives model the choice of last field.
This will produce a regex like this:

```
\{("a":"a"
 |("a":"a",)?"b":"b"
 |("a":"a",)?("b":"b",)?"c":"c")?\}
```

This will give us a shorter, but 100% equivalent regex.
@torymur torymur force-pushed the os/simpler-json-schema-properties branch from f47c419 to 5613a46 Compare December 20, 2024 14:11
@torymur torymur merged commit a3c300c into dottxt-ai:main Dec 20, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request json schema
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants