[dagster-aws] add PipesEMRClient
#23998
Can we separate the new client from the addition of a base class? The base class is a pretty big decision and we'd want to refactor the other clients.
For example, I would want to see what a version looks like that extracts common functionality into functions that we compose, rather than inherit.
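To make the composition suggestion concrete, here is a minimal sketch of what "functions that we compose" could look like. Every name in it (build_pipes_env, wait_until_done, SketchEMRClient) is hypothetical and invented for illustration; none of it is taken from dagster-aws or from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical shared helpers: plain functions that any client can call,
# instead of methods inherited from a common base class.


def build_pipes_env(session_params: Dict[str, str]) -> Dict[str, str]:
    """Turn bootstrap params into environment variables for the remote job."""
    return {key.upper(): value for key, value in session_params.items()}


def wait_until_done(
    poll: Callable[[], str],
    terminal_states: FrozenSet[str],
    max_polls: int = 100,
) -> str:
    """Call `poll` until it returns a terminal state, then return that state."""
    for _ in range(max_polls):
        state = poll()
        if state in terminal_states:
            return state
    raise TimeoutError("job did not reach a terminal state")


@dataclass
class SketchEMRClient:
    """Hypothetical client that composes the helpers above; no base class."""

    describe_state: Callable[[], str]

    def run(self, session_params: Dict[str, str]) -> Dict[str, object]:
        env = build_pipes_env(session_params)
        final_state = wait_until_done(
            self.describe_state,
            frozenset({"TERMINATED", "TERMINATED_WITH_ERRORS"}),
        )
        return {"env": env, "state": final_state}
```

With this shape, another AWS client would import the same two functions and supply its own polling callable, so no shared parent class is needed.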
Yeah sure, I just wanted to know if this was a good idea at all.
To be clear, I don't want to use this base class across absolutely all pipes clients. It's just for AWS (at least for now).
Done here: #24042
Oh maybe there was a miscommunication here. What I was thinking is that you would add |
Inline comments and needs tests. I think this can be merged independently of the downstack one where you add the key argument to PipesS3MessageReader, so you should reverse the order of the PRs. PipesS3LogReader can be included in this one or a separate PR if necessary.
python_modules/libraries/dagster-aws/dagster_aws/pipes/clients/emr.py
To your queue (re-request review when this is ready)
It's the other way: in order to merge this PR we only need The current PR graph is correct.
I think the comment you're responding to was made before you reordered the PRs. Yes, looks like the current stack order is correct.
@smackesey I think we can't test anything valuable right now (the best we can do is check Pipes params injection), because the testing setup for EMR is quite complicated and requires real AWS infra. I have some manual tests in this PR. I just had a call with @gibsondan and we agreed on adding infra-related tests to the internal repo. I will start working on this later this week. This isn't just EMR's problem: other Pipes clients like Glue, EMR Serverless, ECS, and probably Databricks also require this. For now I suggest we merge this PR without tests.
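For readers wondering what a "Pipes params injection" check might look like without real AWS infra, here is a rough sketch. It assumes, without confirmation from this PR, that the client passes Pipes environment variables to Spark through the spark-defaults classification using spark.yarn.appMasterEnv.* properties. The function inject_pipes_env is hypothetical and is not the helper used in dagster-aws.

```python
import copy
from typing import Any, Dict


def inject_pipes_env(params: Dict[str, Any], env: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of run_job_flow params with `env`
    exposed to the Spark application master via spark-defaults."""
    enriched = copy.deepcopy(params)  # never mutate the caller's params
    configurations = enriched.setdefault("Configurations", [])
    for config in configurations:
        if config.get("Classification") == "spark-defaults":
            target = config
            break
    else:
        # No spark-defaults entry yet, so add one.
        target = {"Classification": "spark-defaults", "Properties": {}}
        configurations.append(target)
    properties = target.setdefault("Properties", {})
    for name, value in env.items():
        properties[f"spark.yarn.appMasterEnv.{name}"] = value
    return enriched


def test_inject_pipes_env_is_non_destructive() -> None:
    original = {
        "Name": "demo",
        "Configurations": [
            {
                "Classification": "spark-defaults",
                "Properties": {"spark.executor.memory": "2g"},
            }
        ],
    }
    enriched = inject_pipes_env(original, {"DAGSTER_PIPES_CONTEXT": "ctx"})
    props = enriched["Configurations"][0]["Properties"]
    # The injected variable is present and user settings are preserved.
    assert props["spark.yarn.appMasterEnv.DAGSTER_PIPES_CONTEXT"] == "ctx"
    assert props["spark.executor.memory"] == "2g"
    # The input dict is left untouched.
    assert len(original["Configurations"][0]["Properties"]) == 1
```

A test of this kind only covers parameter construction; it says nothing about whether the cluster actually starts or reports back, which is why the infra-backed tests mentioned above are still needed.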
Please invoke on_launched, then feel free to merge.
stale and now approved by @smackesey
Summary & Motivation
This PR adds PipesEMRClient to dagster-aws. It allows running Spark workloads in ephemeral EMR (EC2 flavor) clusters.
There is no support for submitting steps to existing EMR clusters. It turned out to be pretty hard to support properly since there is no native way to distinguish between steps submitted by our client and other workloads running in the same cluster. We can try to implement this in the future.
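As an illustration of what "ephemeral" means in practice, the sketch below builds parameters for boto3's EMR run_job_flow call so that the cluster runs one Spark step and then shuts itself down. The function name, cluster name, instance types, release label, and role names are placeholder assumptions, and how the new client consumes such a dict is not shown on this page.

```python
from typing import Any, Dict


def ephemeral_spark_cluster_params(script_s3_uri: str, log_uri: str) -> Dict[str, Any]:
    """Hypothetical helper: params for an EMR cluster that runs one Spark
    step and terminates once no steps remain."""
    return {
        "Name": "dagster-pipes-example",  # placeholder name
        "ReleaseLabel": "emr-7.2.0",  # placeholder release
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # False makes the cluster ephemeral: it shuts down after its steps.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "run-spark-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names; adjust as needed
        "ServiceRole": "EMR_DefaultRole",
    }
```

Because the step is declared at cluster creation and the cluster exists only for that step, everything running in it belongs to the launching client, which sidesteps the attribution problem described above for shared, long-lived clusters.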
Tasks:
PipesEMRClient implementation
Showcase:
How I Tested These Changes
Changelog [New | Bug | Docs]
[dagster-aws] new AWS EMR Dagster Pipes client (dagster_aws.pipes.PipesEMRClient) for running and monitoring AWS EMR jobs from Dagster.