atproto/data: package for working with generic schema-less atproto data in JSON or CBOR #407

Merged
bnewbold merged 18 commits into main from bnewbold/sdk-data
Jan 2, 2024

bnewbold
Collaborator

@bnewbold bnewbold commented Oct 31, 2023

So far, most golang atproto code has either worked with data in a known schema or ignored the contents entirely. At times we will need to convert to and from JSON and CBOR when a schema isn't available. Even when a schema is available, a record might have unexpected additional fields, and the conversion might need to preserve all of them.
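
To illustrate the field-preservation point, here is a minimal sketch using only the Go standard library (it is not this package's API; the `post` struct and both function names are made up for the example). Decoding into a known struct silently drops fields the struct doesn't declare, while decoding into a generic `map[string]any` keeps them:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// post is a hypothetical "known schema" type with a single declared field.
type post struct {
	Text string `json:"text"`
}

// typedRoundTrip decodes into the known struct and re-encodes.
// Fields that the struct does not declare are lost.
func typedRoundTrip(raw []byte) ([]byte, error) {
	var p post
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	return json.Marshal(p)
}

// genericRoundTrip decodes into a schema-less map and re-encodes,
// so unexpected fields survive the trip.
func genericRoundTrip(raw []byte) ([]byte, error) {
	var rec map[string]any
	if err := json.Unmarshal(raw, &rec); err != nil {
		return nil, err
	}
	return json.Marshal(rec)
}

func main() {
	in := []byte(`{"text":"hi","extraField":{"n":1}}`)

	typed, err := typedRoundTrip(in)
	if err != nil {
		panic(err)
	}
	generic, err := genericRoundTrip(in)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(typed))   // {"text":"hi"}
	fmt.Println(string(generic)) // {"extraField":{"n":1},"text":"hi"}
}
```

Note that plain `encoding/json` decodes every number as `float64`, which is too loose for the atproto data model (integers only), so a real implementation needs stricter number handling than this sketch.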

Reused some existing test fixtures to ensure interop and consistent behavior with the TypeScript implementation.

The Bytes, Blob, and CIDLink structs are copied from lex/util, and could replace those implementations.

Expect to implement atproto/lexicon on top of this package to do run-time schema validation.

"Extract all blobs from unknown data" is a thing that the distributor or appview v2 may need to do in the near future, to copy blobs into a CDN and run image auto-moderation.

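A rough sketch of what "extract all blobs from unknown data" could look like, assuming the data has already been decoded from JSON into `map[string]any` / `[]any` values and that blobs use the `{"$type": "blob", "ref": {"$link": ...}, "mimeType": ...}` shape. `BlobRef` and `extractBlobs` are hypothetical names for this example, not this package's API, and legacy blob formats are ignored:

```go
package main

import "fmt"

// BlobRef is a simplified, hypothetical blob reference for this sketch.
type BlobRef struct {
	CID      string
	MimeType string
}

// extractBlobs walks arbitrary decoded data and collects every object
// whose "$type" is "blob". Map iteration order is random in Go, so the
// order of the returned slice is not stable across runs.
func extractBlobs(v any) []BlobRef {
	var out []BlobRef
	switch t := v.(type) {
	case map[string]any:
		if typ, _ := t["$type"].(string); typ == "blob" {
			ref, _ := t["ref"].(map[string]any)
			cid, _ := ref["$link"].(string) // reading a nil map is safe
			mimeType, _ := t["mimeType"].(string)
			return append(out, BlobRef{CID: cid, MimeType: mimeType})
		}
		for _, child := range t {
			out = append(out, extractBlobs(child)...)
		}
	case []any:
		for _, child := range t {
			out = append(out, extractBlobs(child)...)
		}
	}
	return out
}

func main() {
	// "bafyexample" is a placeholder, not a real CID.
	rec := map[string]any{
		"text": "hello",
		"embed": map[string]any{
			"images": []any{
				map[string]any{
					"alt": "a picture",
					"image": map[string]any{
						"$type":    "blob",
						"ref":      map[string]any{"$link": "bafyexample"},
						"mimeType": "image/png",
					},
				},
			},
		},
	}
	for _, b := range extractBlobs(rec) {
		fmt.Println(b.CID, b.MimeType)
	}
}
```

The walk needs no schema at all, which is the point: it works on records with unknown or extra fields.
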
I bet there is a much more efficient way to implement the "parsing" functions. The goal with this initial version of the package is to get a reasonable API and test coverage of weird corner-cases. Should be able to optimize the implementation later.

@bnewbold bnewbold requested a review from ericvolp12 October 31, 2023 06:13
@bnewbold bnewbold self-assigned this Dec 25, 2023
@bnewbold bnewbold merged commit e500a62 into main Jan 2, 2024
6 checks passed
@bnewbold bnewbold deleted the bnewbold/sdk-data branch January 2, 2024 23:24
bnewbold added a commit that referenced this pull request Oct 4, 2024
…e data validation (#420)

This is currently a branch on top of
#407

- [x] parse lexicon schema JSON
- [x] load entire directories of schema JSON files from disk as a
catalog
- [x] check lexicon schema semantics (eg, can't have min greater than
max)
- [x] validate runtime data (`map[string]any`) against lexicons
- [x] whole bunch of corner-case tests
- [x] CLI tool for some live-network testing
- [x] add support for `tid` and `record-key` lex formats (not in specs
yet)
- [x] configurable flexibility for legacy blobs and lenient datetime parsing
(?)
- [x] comments and example code
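
As an illustration of the difference between checking schema semantics and validating runtime data, here is a small self-contained sketch. All names (`StringSchema`, `CheckSchema`, `Validate`, `intPtr`) are hypothetical and far simpler than real lexicon handling; only string length limits are modeled:

```go
package main

import (
	"errors"
	"fmt"
)

// StringSchema is a hypothetical, stripped-down string definition with
// optional length limits measured in bytes.
type StringSchema struct {
	MinLength *int
	MaxLength *int
}

// CheckSchema checks the schema itself, independent of any data
// (eg, can't have min greater than max).
func (s StringSchema) CheckSchema() error {
	if s.MinLength != nil && s.MaxLength != nil && *s.MinLength > *s.MaxLength {
		return errors.New("schema: minLength is greater than maxLength")
	}
	return nil
}

// Validate checks a runtime value, as found in a map[string]any record,
// against the schema.
func (s StringSchema) Validate(v any) error {
	str, ok := v.(string)
	if !ok {
		return fmt.Errorf("expected string, got %T", v)
	}
	if s.MinLength != nil && len(str) < *s.MinLength {
		return fmt.Errorf("string too short: %d < %d", len(str), *s.MinLength)
	}
	if s.MaxLength != nil && len(str) > *s.MaxLength {
		return fmt.Errorf("string too long: %d > %d", len(str), *s.MaxLength)
	}
	return nil
}

// intPtr is a small helper for building optional limits.
func intPtr(n int) *int { return &n }

func main() {
	bad := StringSchema{MinLength: intPtr(10), MaxLength: intPtr(5)}
	fmt.Println(bad.CheckSchema()) // schema: minLength is greater than maxLength

	ok := StringSchema{MaxLength: intPtr(5)}
	rec := map[string]any{"text": "hello", "count": 42}
	fmt.Println(ok.Validate(rec["text"]))  // <nil>
	fmt.Println(ok.Validate(rec["count"])) // expected string, got int
}
```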

probably in a later iteration:

- [ ] ensure empty body works
(bluesky-social/atproto#2746)
- [ ] validate rkey type against lexicon
- [ ] CLI tool to validate prod firehose
- [ ] CLI tool to validate CAR files
- [x] clarify specs around unions: only `object` and `token` types?
- [x] clarify specs around `unknown`: only `object` type?
- [ ] validate other "primary" lexicon types: subscription, HTTP body,
HTTP URL params, etc