Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-aws] add PipesEMRClient #23998

Merged
merged 1 commit into from
Oct 14, 2024
Merged

Conversation

danielgafni
Copy link
Contributor

@danielgafni danielgafni commented Aug 28, 2024

Summary & Motivation

This PR adds PipesEMRCLient to dagster-aws.

It allows running Spark workloads in ephemeral EMR (EC2 flavor) clusters.

There is no support for submitting steps to existing EMR clusters. It turned out to be pretty hard to support properly since there is no native way to distinguish between steps submitted by our client and other workloads running in the same cluster. We can try to implement this in the future.

Tasks:

  • PipesEMRClient implementation
  • local testing

Showcase:

image.png

How I Tested These Changes

Changelog [New | Bug | Docs]

[dagster-aws] new AWS EMR Dagster Pipes client (dagster_aws.pipes.PipesEMRCLient ) for running and monitoring AWS EMR jobs from Dagster.

Copy link
Contributor Author

danielgafni commented Aug 28, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @danielgafni and the rest of your teammates on Graphite Graphite

@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from e5f4cb8 to fe8c586 Compare August 28, 2024 11:36
@danielgafni danielgafni changed the title add PipesEMRClient [dagster-aws] add PipesEMRClient Aug 28, 2024
@danielgafni danielgafni requested a review from schrockn August 28, 2024 11:37
Copy link

github-actions bot commented Aug 28, 2024

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-6xg33sq92-elementl.vercel.app
https://08-27-add-pipesemrclient.dagster.dagster-docs.io

Direct link to changed pages:

@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from fe8c586 to 115e545 Compare August 28, 2024 11:45
@danielgafni danielgafni marked this pull request as ready for review August 28, 2024 11:46
@graphite-app graphite-app bot added the area: docs Related to documentation in general label Aug 28, 2024
@graphite-app graphite-app bot requested a review from erinkcochran87 August 28, 2024 11:47
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from 115e545 to c91d896 Compare August 28, 2024 11:50
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we separate the new client from the addition of a base class? The base class is a pretty big decision and we'd want to refactor the other clients.

@schrockn
Copy link
Member

For example I would want to see what a version looks like that extracts common functionality into functions and we compose, rather than inherit.

@danielgafni danielgafni force-pushed the 08-09-_dagster-aws_add_ecs_pipes branch 2 times, most recently from a0f68ca to 25fb13f Compare August 28, 2024 16:15
@danielgafni
Copy link
Contributor Author

Yeah sure, I just wanted to know if this was a good idea at all.
I'll make a separate PR if you think it's worth trying out.

@danielgafni danielgafni changed the base branch from 08-09-_dagster-aws_add_ecs_pipes to graphite-base/23998 August 28, 2024 17:33
@danielgafni
Copy link
Contributor Author

To be clear, I don't want to use this base class across absolutely all pipes clients. It's just for AWS (at least for now).

@danielgafni
Copy link
Contributor Author

Done here: #24042

@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from c91d896 to 813f817 Compare August 29, 2024 07:53
@graphite-app graphite-app bot requested review from cmpadden and PedramNavid August 29, 2024 07:54
@danielgafni danielgafni changed the base branch from graphite-base/23998 to master August 29, 2024 07:54
@danielgafni danielgafni changed the base branch from master to add-BaseAWSPipesClient August 29, 2024 07:54
Copy link

graphite-app bot commented Aug 29, 2024

Graphite Automations

"docs-beta - Assign Reviewers" took an action on this PR • (08/29/24)

2 reviewers were added and 1 label was added to this PR based on Pedram Navid's automation.

@schrockn
Copy link
Member

Oh maybe there was a miscommunication here. What I was thinking is that you would add PipesEMRClient without the base class so that we get the integration done, but then we can refactor it to share code with the other clients. I think the refactor should be down with at minimum 2 examples applied so we aren't overfitting it to a particular use case.

@danielgafni danielgafni force-pushed the 09-30-_dagster-pipes_add_key_param_to_pipess3messagereader branch 2 times, most recently from d949d88 to 57055af Compare October 10, 2024 14:44
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from 4381e86 to e9b2966 Compare October 10, 2024 14:44
Copy link
Collaborator

@smackesey smackesey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comments and needs tests. I think this can be merged independently of the downstack one where you add the key argument to PipesS3MessageReader, so you should reverse the order of the PRs. PipesS3LogReader can be included in this one or a separate PR if necessary.

@danielgafni danielgafni force-pushed the 09-30-_dagster-pipes_add_key_param_to_pipess3messagereader branch from 57055af to b99096f Compare October 11, 2024 15:01
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from e9b2966 to 9f11500 Compare October 11, 2024 15:01
@danielgafni danielgafni changed the base branch from 09-30-_dagster-pipes_add_key_param_to_pipess3messagereader to dagster-aws-PipesS3LogReader October 11, 2024 15:01
@danielgafni danielgafni force-pushed the dagster-aws-PipesS3LogReader branch from 47b81e1 to 8f170d7 Compare October 11, 2024 15:10
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from 9f11500 to 75ac46d Compare October 11, 2024 15:11
@danielgafni danielgafni force-pushed the dagster-aws-PipesS3LogReader branch from 8f170d7 to fafc04b Compare October 11, 2024 15:23
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from 75ac46d to 14d2bea Compare October 11, 2024 15:25
Copy link
Collaborator

@smackesey smackesey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To your queue (re-request review when this is ready)

@danielgafni
Copy link
Contributor Author

danielgafni commented Oct 11, 2024

downstack one where you add the key argument to PipesS3MessageReader

It's the other way: in order to merge this PR we only need PipesS3LogReader; PipesS3MessageReader's key argument didn't end up being useful for EMR; but it can be useful for EMR Serverless. I'll make a PR using it in EMR Serverless.

The current PR graph is correct.

@smackesey
Copy link
Collaborator

It's the other way: in order to merge this PR we only need PipesS3LogReader; PipesS3MessageReader's key argument didn't end up being useful for EMR; but it can be useful for EMR Serverless. I'll make a PR using it in EMR Serverless.

I think the comment you're responding to was made before you reordered the PRs. Yes, looks like the current stack order is correct.

@danielgafni
Copy link
Contributor Author

needs tests

@smackesey I think we can't test anything valuable right now (best we can do is to check Pipes params inkection), because testing setup for EMR is quite complicated and requires real AWS infra.

I have some manual tests in this PR. I just had a call with @gibsondan and we agreed on adding infra related tests to the internal repo.

I will start working on this later this week. This isn't just EMR's problem, other Pipes clients like Glue, EMR Serverless, ECS, and probably Databricks also require this.

For now I suggest we merge this PR without tests.

@danielgafni danielgafni force-pushed the dagster-aws-PipesS3LogReader branch from fafc04b to b9f8fc6 Compare October 14, 2024 11:52
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch 2 times, most recently from 9ab25dd to 56fa3fe Compare October 14, 2024 11:55
Base automatically changed from dagster-aws-PipesS3LogReader to master October 14, 2024 12:16
@danielgafni danielgafni requested a review from smackesey October 14, 2024 14:58
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from 56fa3fe to be89f8e Compare October 14, 2024 15:15
Copy link
Collaborator

@smackesey smackesey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please invoke on_launched then feel free to merge

@smackesey smackesey dismissed schrockn’s stale review October 14, 2024 15:23

stale and now approved by @smackesey

@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch 2 times, most recently from cf98371 to e7bda0b Compare October 14, 2024 17:47
@danielgafni danielgafni force-pushed the 08-27-add_pipesemrclient branch from e7bda0b to 2e05446 Compare October 14, 2024 20:05
@danielgafni danielgafni merged commit 6fb82d9 into master Oct 14, 2024
1 check failed
@danielgafni danielgafni deleted the 08-27-add_pipesemrclient branch October 14, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: docs Related to documentation in general docathon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants