[pipes] JsonSchema for externals protocol #16009
base: master
Conversation
This stack of pull requests is managed by Graphite.
In order to properly review this, I do think we need to see how this JSON schema would be consumed in other programming languages.
E.g. if I'm writing an ext in Scala and I want to use the JSON schema, what does that look like?
import json
import os

_schema_root = os.path.join(os.path.dirname(__file__), "../json_schema")

CONTEXT_JSON_SCHEMA_PATH = os.path.join(_schema_root, "context.json")
with open(CONTEXT_JSON_SCHEMA_PATH) as f:
    CONTEXT_JSON_SCHEMA = json.load(f)

MESSAGE_JSON_SCHEMA_PATH = os.path.join(_schema_root, "message.json")
with open(MESSAGE_JSON_SCHEMA_PATH) as f:
    MESSAGE_JSON_SCHEMA = json.load(f)
Synchronously doing I/O on module import is very non-ideal. Can we restructure this to load on demand and cache?
done
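For illustration, an on-demand loader with caching might look roughly like the sketch below. The name get_pipes_json_schema matches the accessor used in the tests later in this thread; the lru_cache approach is just an assumption about how "on demand and cache" could be done, not necessarily what the commit does.

import json
import os
from functools import lru_cache

_schema_root = os.path.join(os.path.dirname(__file__), "../json_schema")


@lru_cache(maxsize=None)
def get_pipes_json_schema(name: str) -> dict:
    # Load "context.json" or "message.json" on first use and cache the parsed
    # schema, so importing the module no longer performs any file I/O.
    with open(os.path.join(_schema_root, f"{name}.json")) as f:
        return json.load(f)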
Depends on what our Scala ext integration story looks like. If we have a dedicated lib then we would include this schema in that lib. If we don't, then we could provide a CLI method to access it. Either way it falls to whatever JSON schema libs are available in Scala to actually perform validation. Alternatively we could publish the schema to a public URL and just expose that.
Right. I don't think we need to commit to this schema right now, and I think it makes sense to do this when we actually build our first non-Python integration. So my proposal is that we resurrect this diff when we write our first prototype in another language.
@@ -0,0 +1,113 @@
import json
what is this script used for in this PR?
One sec, processing the PR summary.
Makes sense. I just think we need some more testing.
def test_message_json_schema_validation():
    message = {
        PIPES_PROTOCOL_VERSION_FIELD: PIPES_PROTOCOL_VERSION,
        "method": "foo",
        "params": {"bar": "baz"},
    }
    jsonschema.validate(message, get_pipes_json_schema("message"))


def test_json_schema_rejects_invalid():
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate({"foo": "bar"}, get_pipes_json_schema("context"))
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate({"foo": "bar"}, get_pipes_json_schema("message"))
We need more tests here. We should be building a payload using the APIs used by the framework and validating them against the json schema.
I added a test to explicitly validate the output of build_external_execution_context_data. This is also already tested a bunch of times above, in any test that uses the helper _make_external_execution_context, which creates various flavors of PipesContextData (e.g. with and without partition key).

In the external -> orchestration direction, the existing protocol only defines the high-level PipesMessage rather than all the individual messages, and that is tested in test_message_json_schema_validation.
class PipesMessage(TypedDict):
    """A message sent from the external process to the orchestration process."""

    __dagster_pipes_version: str
    method: str
    params: Optional[Mapping[str, Any]]
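As an illustration of that point, any well-formed envelope should validate against the message schema regardless of which method it carries. In the sketch below, the method name and params are made up; the version constants are the same ones used in test_message_json_schema_validation above.

import jsonschema

# Hypothetical message from an external process; the schema only checks the
# envelope shape (version field, method string, optional params mapping).
message = {
    PIPES_PROTOCOL_VERSION_FIELD: PIPES_PROTOCOL_VERSION,
    "method": "report_asset_materialization",  # illustrative method name
    "params": {"asset_key": "my_asset"},  # illustrative params
}
jsonschema.validate(message, get_pipes_json_schema("message"))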
Summary & Motivation

Add a script that generates a JSON schema for the externals protocol.

The script uses pydantic and lives in the top-level scripts directory. It writes the JSON schema to python_modules/dagster-ext/json_schema/{context,message}.json. The script requires pydantic v2, so it must be run through tox -e jsonschema (from dagster-externals) until core is updated. I wasn't sure how to represent a combined schema for context and message, so I put them in separate schema files.
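For reference, a minimal sketch of how a pydantic v2 script can emit a JSON schema for the protocol's TypedDicts is shown below. The import of PipesContextData and PipesMessage from dagster_pipes, and the loop structure, are assumptions for illustration; the actual script in this PR may be organized differently.

import json
import os

from pydantic import TypeAdapter  # pydantic v2 API

# Assumption: the protocol TypedDicts are importable from dagster_pipes.
# (pydantic may require typing_extensions.TypedDict on older Python versions.)
from dagster_pipes import PipesContextData, PipesMessage

OUTPUT_DIR = "python_modules/dagster-ext/json_schema"

for name, protocol_type in [("context", PipesContextData), ("message", PipesMessage)]:
    # TypeAdapter derives a JSON schema directly from the TypedDict definition.
    schema = TypeAdapter(protocol_type).json_schema()
    with open(os.path.join(OUTPUT_DIR, f"{name}.json"), "w") as f:
        json.dump(schema, f, indent=2, sort_keys=True)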
Also adds a BK step that generates the schema and diffs it against the checked-in version, ensuring nothing has changed.

The schema files are also included in the built dagster-pipes package.

How I Tested These Changes

New unit tests to ensure the JSON schema is valid and that context/message objects satisfy it.