Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task/update fork method preserve owner parameter #473

Conversation

shaikh-ma
Copy link
Contributor

@shaikh-ma shaikh-ma commented Nov 26, 2024

As informed by Crunch team, the special properties of the "Personal" folder are about to be replaced with a regular project folder that behaves like any other Crunch folder, at some point approaching the end of CY 2024.

Recently added functionality to create a fork in a target location will still allow to use preserve_owner and hence might lead to creating the fork in either the "Personal" folder or keeping the source dataset's location.

This MR proactively attempts to push towards the best-practice approach here by updating the fork method.
The behaviour of the preserve_owner parameter has been updated as described below:

  • If preserve_owner=False and a project is not provided, we want to raise and indicate that an explicit project location is expected.
  • If preserve_owner=False and a project is provided, this is consistent with what we want to acheive and the code can be executed as is.
  • If preserve_owner=True and a project is provided, raise and indicate that this combination is ambiguous, only one of the two can be used AND that preserve_owner will be deprecated soon.
  • Regardless of the value given to preserve_owner, log a DeprecationWarning that the preserve_owner parameter will be removed soon in favour of providing a project or not (in which case the behavior will be as it is currently if preserve_owner=True and project=None).
Project passed Preserve_owner = True
Yes image
No image
Project passed Preserve_owner = False
Yes image
No image

@alexbuchhammer
Copy link

Hey @jjdelc - what's your opinion on the suggested changes? We do not yet want to get rid of the preserve_owner parameter to protect users from breaking scripts. Let us know!

Cheers
Alex

@jjdelc
Copy link
Contributor

jjdelc commented Nov 28, 2024

This looks in the right direction, but I worry that the change may come sooner than expected. I will check with @rchacon1 on what the status of this is in the backend.

from scrunch.mutable_dataset import MutableDataset

description = description or self.resource.body.description

LOG.warning(
"The preserve_owner parameter will be removed soon"
" in favour of providing a project or not "
Copy link
Contributor

@rchacon1 rchacon1 Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A project will always be required in the future state as far the API is concerned.

Copy link

@jamesrkg jamesrkg Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is, as an abstraction over the API, the user can either give a project or not, and if they don't, then the fork will be created in the same location as the dataset being forked. What happens below the signature of the method isn't something we want users to think about. In this way, it will be possible for a user to say ds.fork(..., project=None), even though that's not what is implemented internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to keep the preserve_owner abstraction you can future proof it by sending resource.body.owner as the project in the payload.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yep that's what happening here:

scrunch/scrunch/datasets.py

Lines 2420 to 2427 in 930905a

if preserve_owner:
if project:
raise ValueError(
"Cannot pass 'project' or 'owner' when preserve_owner=True."
)
else:
# Create fork in source dataset path.
body["project"] = self.resource.body.owner

scrunch/datasets.py Outdated Show resolved Hide resolved
)
else:
# Create fork in source dataset path.
body["owner"] = self.resource.body.owner
Copy link
Contributor

@rchacon1 rchacon1 Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can replace body["owner"] with body["project"] to future proof this. same goes for any other place in this function where owner is added to payload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@rchacon1 rchacon1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@rchacon1 rchacon1 merged commit 8bb8a79 into Crunch-io:master Dec 18, 2024
5 checks passed
@shaikh-ma
Copy link
Contributor Author

Hi @rchacon1 @jjdelc,

Please could you push these changes in a tag release. 🙇

@rchacon1
Copy link
Contributor

rchacon1 commented Jan 9, 2025

version 0.18.6 published to pypi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants