atproto/data: package for working with generic schema-less atproto data in JSON or CBOR #407
Merged
bnewbold added a commit that referenced this pull request on Oct 4, 2024:

…e data validation (#420)

This is currently a branch on top of #407

- [x] parse lexicon schema JSON
- [x] load entire directories of schema JSON files from disk as a catalog
- [x] check lexicon schema semantics (eg, can't have min greater than max)
- [x] validate runtime data (`map[string]any`) against lexicons
- [x] whole bunch of corner-case tests
- [x] CLI tool for some live-network testing
- [x] add support for `tid` and `record-key` lex formats (not in specs yet)
- [x] configurable flexibility for legacy blobs and lenient datetime parsing (?)
- [x] comments and example code

probably in a later iteration:

- [ ] ensure empty body works (bluesky-social/atproto#2746)
- [ ] validate rkey type against lexicon
- [ ] CLI tool to validate prod firehose
- [ ] CLI tool to validate CAR files
- [x] clarify specs around unions: only `object` and `token` types?
- [x] clarify specs around `unknown`: only `object` type?
- [ ] validate other "primary" lexicon types: subscription, HTTP body, HTTP URL params, etc
So far most golang atproto code has either worked with data in a known schema, or entirely ignored the contents. We will at times need to convert to and from JSON and CBOR when a schema isn't available. Or even when the schema is available, a record might have unexpected additional fields, and conversion might need to be done in a way that preserves all fields.
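To illustrate the field-preservation concern, here is a minimal sketch using only Go's standard `encoding/json`. It is not this package's API: `knownPost`, `roundTripTyped`, `roundTripGeneric`, and `decodeGeneric` are hypothetical names, and the sketch ignores CBOR and atproto-specific types such as `$bytes` and `$link`. It only shows that a round trip through a typed struct drops undeclared fields, while a round trip through `map[string]any` keeps them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// knownPost is a hypothetical typed view of a record that declares one field.
type knownPost struct {
	Text string `json:"text"`
}

// decodeGeneric parses JSON into a schema-less map.
func decodeGeneric(raw []byte) (map[string]any, error) {
	var m map[string]any
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// roundTripTyped decodes into the typed struct and re-encodes.
// Fields the struct does not declare are silently lost.
func roundTripTyped(raw []byte) (map[string]any, error) {
	var p knownPost
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	out, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	return decodeGeneric(out)
}

// roundTripGeneric decodes into map[string]any and re-encodes.
// Every top-level field survives the round trip.
func roundTripGeneric(raw []byte) (map[string]any, error) {
	m, err := decodeGeneric(raw)
	if err != nil {
		return nil, err
	}
	out, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	return decodeGeneric(out)
}

func main() {
	raw := []byte(`{"text":"hello","extraField":"unexpected"}`)
	typed, err := roundTripTyped(raw)
	if err != nil {
		panic(err)
	}
	generic, err := roundTripGeneric(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(typed), len(generic)) // prints: 1 2
}
```

Note that plain `encoding/json` decodes every number as `float64`, so a real implementation needs extra care with integers; the sketch does not attempt that.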
Reused some existing test fixtures to ensure interop and consistent behavior with the TypeScript implementation.
The `Bytes`, `Blob`, and `CIDLink` structs are copied from `lex/util`, and could replace those implementations.

Expect to implement `atproto/lexicon` on top of this package to do run-time schema validation.

"Extract all blobs from unknown data" is a thing that the distributor or appview v2 may need to do in the near future, to copy blobs into a CDN and run image auto-moderation.
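A rough sketch of what "extract all blobs from unknown data" could look like, using only the standard library. This is not the package's implementation: `blobRef`, `extractBlobs`, and `extractBlobsFromJSON` are hypothetical names, the CID strings are placeholders that are not validated, and only the JSON blob form with `"$type": "blob"` is handled (legacy blobs are ignored).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blobRef is a hypothetical, simplified stand-in for a blob reference.
type blobRef struct {
	CID      string
	MimeType string
	Size     int64
}

// extractBlobs walks arbitrary decoded JSON and collects every object
// whose "$type" is "blob". Map iteration order is random in Go, so the
// order of the returned slice is not stable.
func extractBlobs(v any) []blobRef {
	var out []blobRef
	switch t := v.(type) {
	case map[string]any:
		if t["$type"] == "blob" {
			var b blobRef
			if ref, ok := t["ref"].(map[string]any); ok {
				b.CID, _ = ref["$link"].(string)
			}
			b.MimeType, _ = t["mimeType"].(string)
			// encoding/json decodes numbers as float64 by default.
			if n, ok := t["size"].(float64); ok {
				b.Size = int64(n)
			}
			return append(out, b)
		}
		for _, child := range t {
			out = append(out, extractBlobs(child)...)
		}
	case []any:
		for _, child := range t {
			out = append(out, extractBlobs(child)...)
		}
	}
	return out
}

// extractBlobsFromJSON decodes raw JSON without a schema and extracts blobs.
func extractBlobsFromJSON(raw []byte) ([]blobRef, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return extractBlobs(v), nil
}

func main() {
	// Placeholder record; the CID value is illustrative, not a real CID.
	raw := []byte(`{
		"text": "hello",
		"embed": {"images": [{"alt": "a cat", "image": {
			"$type": "blob",
			"ref": {"$link": "placeholder-cid"},
			"mimeType": "image/jpeg",
			"size": 1234
		}}]}
	}`)
	blobs, err := extractBlobsFromJSON(raw)
	if err != nil {
		panic(err)
	}
	for _, b := range blobs {
		fmt.Printf("%s %s %d\n", b.CID, b.MimeType, b.Size)
	}
}
```

Because the walk never consults a schema, it finds blobs nested at any depth, including inside fields the record's lexicon does not declare.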
I bet there is a much more efficient way to implement the "parsing" functions. The goal with this initial version of the package is to get a reasonable API and test coverage of weird corner-cases. Should be able to optimize the implementation later.