Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): split out temp wheel builds #12157

Merged
merged 15 commits into from
Dec 19, 2024
Merged

feat(python): split out temp wheel builds #12157

merged 15 commits into from
Dec 19, 2024

Conversation

hsheth2
Copy link
Collaborator

@hsheth2 hsheth2 commented Dec 18, 2024

This should enable wheels to be available within ~2 minutes of the commit, which is significantly faster than the ~18 minutes we currently get with Vercel builds.

The new site also comes with a nicer index page, which gives you a brief tutorial on how to use it.
image

The new dedicated Python wheels site is deployed with Cloudflare pages, which means it will not be generated for PRs from forks. We can look into using pull_request_target for those in the future.

I intend to remove Python wheels from the docs site in the future, once this change has baked for a while.

This PR also updates our CI pipelines across our Python builds - upgrading versions, fixing various bugs with flags, etc

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata devops PR or Issue related to DataHub backend & deployment labels Dec 18, 2024
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Dec 18, 2024
@sgomezvillamor
Copy link
Contributor

which means it will not be generated for PRs from forks

isn't that the usual way of working?
should we consider adapting to work directly on the original GitHub project instead?

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Dec 18, 2024
@hsheth2
Copy link
Collaborator Author

hsheth2 commented Dec 18, 2024

@sgomezvillamor I primarily create branches on the datahub-project/datahub repo, and other folks with committer access can do the same. This has other benefits e.g. cypress cloud will show logs of exactly what tests failed and will show screenshots.

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Dec 18, 2024
@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Dec 19, 2024
Copy link
Contributor

@sgomezvillamor sgomezvillamor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Just minor suggestions

':metadata-ingestion-modules:prefect-plugin:buildWheel',
':metadata-ingestion-modules:gx-plugin:buildWheel',
]) {
commandLine python_executable, "copy_wheels.py"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copy_wheels.py script could be replaced with an inline rsync command. This change would bring the list of directories to be synced closer to the gradle dependencies of the task, reducing the risk of misalignment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup - will note that down for a follow up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit hash, commit message, branch name... all this info is available in the github actions context.
We can pass these details as arguments to the gradle build, which would streamline this script

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we pull some of these from the github env variable context. however, I wanted it to work similarly across local dev and CI, so we have both variants

@hsheth2 hsheth2 merged commit 08605a9 into master Dec 19, 2024
104 of 106 checks passed
@hsheth2 hsheth2 deleted the fast-wheel-builds branch December 19, 2024 16:02
@hsheth2 hsheth2 mentioned this pull request Dec 20, 2024
5 tasks
yoonhyejin pushed a commit to yoonhyejin/datahub-oss-forked that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops PR or Issue related to DataHub backend & deployment ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants