[pipes] JsonSchema for externals protocol #16009

Open
wants to merge 3 commits into base: master

smackesey
Collaborator

@smackesey smackesey commented Aug 22, 2023

Summary & Motivation

Add a script that generates a JSON schema for the externals protocol.

The script uses pydantic and lives in top-level scripts. It writes the JSON schema to python_modules/dagster-ext/json_schema/{context,message}.json. The script requires pydantic v2, so it must be run through tox -e jsonschema (from dagster-externals) until core is updated.

I wasn't sure how to represent a combined schema for context and message, so I put them in separate schema files.

Also adds a BK step that generates the schema and diffs it against the checked-in version, ensuring nothing has changed.
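
A minimal sketch of what such a generate-and-diff check could look like, using only the standard library. The helper names `render_schema` and `diff_against_checked_in` are hypothetical and are not taken from this PR; the generated schema is assumed to be available as a plain dict.

```python
import difflib
import json
from pathlib import Path
from typing import Any, Dict, List


def render_schema(schema: Dict[str, Any]) -> str:
    """Serialize deterministically so that a diff reflects real changes only."""
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def diff_against_checked_in(schema: Dict[str, Any], path: Path) -> List[str]:
    """Return unified-diff lines; an empty list means the checked-in file is current."""
    # A missing file is treated as empty, so it always produces a non-empty diff.
    expected = path.read_text(encoding="utf-8") if path.exists() else ""
    actual = render_schema(schema)
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile=str(path),
            tofile="generated",
        )
    )
```

A CI step built on this would print the returned lines and exit non-zero whenever the list is non-empty.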

The schema files are also included in the built dagster-pipes package.

How I Tested These Changes

New unit tests to ensure JSON schema is valid and that context/message objects satisfy it.

@smackesey
Collaborator Author

smackesey commented Aug 22, 2023

This stack of pull requests is managed by Graphite. Learn more about stacking.

@smackesey smackesey force-pushed the sean/externals-remove-io-modes branch from d4956a1 to d8c427e Compare August 22, 2023 13:09
Base automatically changed from sean/externals-remove-io-modes to master August 22, 2023 13:26
@smackesey smackesey changed the base branch from master to sean/externals-rename-external August 22, 2023 21:10
@smackesey smackesey force-pushed the sean/externals-rename-external branch 4 times, most recently from d69716b to 2acc4d3 Compare August 22, 2023 21:44
@smackesey smackesey force-pushed the sean/externals-rename-external branch 4 times, most recently from 1d09de7 to 9ad449d Compare August 22, 2023 22:24
@smackesey smackesey force-pushed the sean/externals-rename-external branch 2 times, most recently from 508436f to 6118c00 Compare August 22, 2023 22:40
Base automatically changed from sean/externals-rename-external to master August 23, 2023 01:07
@smackesey smackesey force-pushed the sean/json-schema branch 5 times, most recently from 5aa6da3 to 11d4e91 Compare August 23, 2023 14:18
Member

@schrockn schrockn left a comment

To review this properly, I think we need to see how this JSON schema would be consumed in other programming languages.

E.g. if I'm writing ext in Scala and I want to use the json schema, what does that look like?

Comment on lines 119 to 126
_schema_root = os.path.join(os.path.dirname(__file__), "../json_schema")

CONTEXT_JSON_SCHEMA_PATH = os.path.join(_schema_root, "context.json")
with open(CONTEXT_JSON_SCHEMA_PATH) as f:
    CONTEXT_JSON_SCHEMA = json.load(f)
MESSAGE_JSON_SCHEMA_PATH = os.path.join(_schema_root, "message.json")
with open(MESSAGE_JSON_SCHEMA_PATH) as f:
    MESSAGE_JSON_SCHEMA = json.load(f)
Member

Synchronously doing I/O on module import is very non-ideal. Can we restructure this to load on demand and cache the result?

Collaborator Author

done
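
For reference, on-demand loading with caching could look roughly like the sketch below. It is not the code from the updated diff: `DEFAULT_SCHEMA_ROOT`, `_load_schema`, and the `schema_root` parameter are assumptions made for illustration, and only the name `get_pipes_json_schema` appears elsewhere in this PR (in the tests).

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Any, Dict

# Hypothetical location of the checked-in schema files; adjust to the real package layout.
DEFAULT_SCHEMA_ROOT = Path("json_schema")


@lru_cache(maxsize=None)
def _load_schema(path: str) -> Dict[str, Any]:
    """Read and parse one schema file; runs at most once per distinct path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_pipes_json_schema(
    name: str, schema_root: Path = DEFAULT_SCHEMA_ROOT
) -> Dict[str, Any]:
    """Return the 'context' or 'message' schema, reading the file on first use."""
    if name not in ("context", "message"):
        raise ValueError(f"Unknown schema name: {name!r}")
    return _load_schema(str(Path(schema_root) / f"{name}.json"))
```

Importing the module no longer touches the filesystem; the first call for each schema pays the I/O cost and later calls return the same cached dict, so callers should treat it as read-only.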

@smackesey
Collaborator Author

if I'm writing ext in Scala and I want to use the json schema, what does that look like?

Depends on what our Scala ext integration story looks like. If we have a dedicated lib, we would include this schema in that lib. If we don't, we could provide a CLI method to access it. Either way, it falls to whatever JSON schema libs are available in Scala to actually perform validation.

Alternatively we could publish the schema to a public URL and just expose that.

@github-actions

Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-41f7eq2x0-elementl.vercel.app
https://sean-json-schema.dagster-university.dagster-docs.io

Built with commit 824b830.
This pull request is being automatically deployed with vercel-action

@schrockn
Member

Depends on what our Scala ext integration story looks like. If we have a dedicated lib, we would include this schema in that lib. If we don't, we could provide a CLI method to access it. Either way, it falls to whatever JSON schema libs are available in Scala to actually perform validation.

Right. I don't think we need to commit to this schema right now, and I think it makes sense to do this when we actually build our first non-Python integration. So my proposal is that we resurrect this diff when we write our first prototype in another language.

@smackesey smackesey changed the base branch from master to sean/ext-protocol-version September 22, 2023 15:19
@smackesey smackesey force-pushed the sean/ext-protocol-version branch from 04451f6 to fddc445 Compare September 22, 2023 15:23
@smackesey smackesey force-pushed the sean/ext-protocol-version branch from fddc445 to ceceeb3 Compare September 22, 2023 15:26
@smackesey smackesey force-pushed the sean/ext-protocol-version branch from ceceeb3 to 39cf32a Compare September 22, 2023 19:27
@smackesey smackesey force-pushed the sean/ext-protocol-version branch 2 times, most recently from d68530d to a6599e2 Compare September 22, 2023 20:48
Base automatically changed from sean/ext-protocol-version to master September 22, 2023 21:14
@smackesey smackesey force-pushed the sean/json-schema branch 4 times, most recently from 6ffdd09 to 107f826 Compare October 10, 2023 16:03
@@ -0,0 +1,113 @@
import json
Member

What is this script used for in this PR?

Member

One sec, still processing the PR summary.

Member

@schrockn schrockn left a comment

Makes sense. I just think we need some more testing.

Comment on lines 257 to 287
def test_message_json_schema_validation():
    message = {
        PIPES_PROTOCOL_VERSION_FIELD: PIPES_PROTOCOL_VERSION,
        "method": "foo",
        "params": {"bar": "baz"},
    }
    jsonschema.validate(message, get_pipes_json_schema("message"))


def test_json_schema_rejects_invalid():
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate({"foo": "bar"}, get_pipes_json_schema("context"))
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate({"foo": "bar"}, get_pipes_json_schema("message"))
Member

We need more tests here. We should be building payloads using the APIs used by the framework and validating them against the JSON schema.

Collaborator Author

I added a test to explicitly validate the output of build_external_execution_context_data. This is also already tested a bunch of times above in any test that uses the helper _make_external_execution_context, which creates various flavors of PipesContextData (e.g. with and without partition key).

In the external -> orchestration direction, the existing protocol defines only the high-level PipesMessage rather than all the individual messages, and that is tested in test_message_json_schema_validation.

class PipesMessage(TypedDict):
    """A message sent from the external process to the orchestration process."""

    __dagster_pipes_version: str
    method: str
    params: Optional[Mapping[str, Any]]

@smackesey smackesey changed the title [ext] JsonSchema for externals protocol [pipes] JsonSchema for externals protocol Oct 10, 2023

2 participants