Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unify data access #226

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
Open

feat: unify data access #226

wants to merge 9 commits into from

Conversation

QuCMGisaia
Copy link
Contributor

No description provided.

@QuCMGisaia QuCMGisaia force-pushed the feat/unifiedDataAccess branch 3 times, most recently from bbacfb7 to 21fb963 Compare January 8, 2025 17:16
@QuCMGisaia QuCMGisaia force-pushed the feat/unifiedDataAccess branch from 21fb963 to a29cf9f Compare January 9, 2025 09:16
@QuCMGisaia QuCMGisaia force-pushed the feat/unifiedDataAccess branch from 66696bc to 209f9f3 Compare January 9, 2025 15:28
@QuCMGisaia QuCMGisaia force-pushed the feat/unifiedDataAccess branch 2 times, most recently from bfe7d36 to e43997f Compare January 10, 2025 11:40
@QuCMGisaia QuCMGisaia force-pushed the feat/unifiedDataAccess branch from e43997f to b4ea875 Compare January 10, 2025 13:25
Copy link
Member

@sylvaingaudan sylvaingaudan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great work. One aspect that is not covered for now is the error handling. The interfaces do not expose exceptions. This could be added once we get more experience.
I have some comments, mainly about class responsibilities. Sometimes it does things that are usually done by the client and that makes the AccessManager less generic.

Pulls a file from a storage to write it in the local storage.
If the input storage is local, then it is a copy. Otherwise it is a download.
"""
storage = AccessManager.resolve_storage(href)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make sure that dst is in the File storage prefixes (see todo add authorised paths).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need to force the AccessManager to be initialized with exactly one FileStorage system also, don't we? I'll add the authorized paths in a new commit. There is already a text that is done to check that the file is in local storage that is performed by the AbstractStorage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we do not need to have exactly one FileStorage. We could have several, each of them could have a list of prefix. A copy to a dst is allowed if the dst has a common prefix with one of the prefix of the FileStorages that is writable (writable could be a property of the Storage).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just an idea.

return storage.get_storage_parameters()

@staticmethod
def pull(href: str, dst: str, is_dst_dir: bool):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is is_dst_dir for? Is it really up to the client to tell whether the dst is a dir or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to factorize the code for the creation of the filename and i need to know if the dst is a directory or not to to properly create the right filename. I will create a utilitary function to create the right filename, and will remove that argument in the method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

extensions/aproc/proc/access/manager.py Show resolved Hide resolved
if scheme != "" and scheme != "file":
raise ValueError("Destination must be on the local filesystem")

def prepare_for_local_process(self, href: str) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the difference with pull, except the tempfile.gettempdir() that could be done by the client. This method taints the AbstractStorage interface with processing matters that are not necessary for what I see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would see instead of that, a utility function that would more transparently retrieve the data from a distant storage if needed for a processing? Or just call storage.pull and maybe have a check in the FileStorage that if src==dst there is nothing done? So users could just pull the file if needed to be transparent

file_name = os.path.basename(href)
# Get direct parent folder of href_file to zip
dir_name = os.path.dirname(href)
target_file_name = os.path.splitext(file_name)[0] + datetime.now().strftime("%d-%m-%Y-%H-%M-%S")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that it is up to the AccessManager to decide how the file should be named. I believe that this is up to the client. It might create code duplication but it makes responsabilities clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will add target_file_name as an input to the method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do you also agree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes more sense like that !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants