Transition API design for Phase II #11

Open
lliming opened this issue Sep 6, 2024 · 1 comment

lliming commented Sep 6, 2024

In Phase II, the consolidated index will NOT include file-level entries in the Globus Search index. Clients will not need to inspect additional index entries to learn about individual files in a dataset. Instead, each dataset entry will contain a new file manifest field: a file list including pathnames and checksums for each file in the dataset. Clients can use that manifest to request individual files via HTTP/S.
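As a rough illustration of the manifest idea described above, here is a minimal sketch of how a client might turn a dataset entry into per-file HTTP/S URLs and check a downloaded file against its manifest checksum. The field names (`base_url`, `file_manifest`, `path`, `checksum`), the example host, and the choice of SHA-256 are all assumptions made for illustration; the issue does not specify the actual schema.

```python
import hashlib
from urllib.parse import quote

# Stand-in file contents so the example checksum is real.
SAMPLE_BYTES = b"example file contents"

# Hypothetical dataset entry. Field names and URL are assumptions,
# not the actual Phase II schema.
dataset_entry = {
    "dataset_id": "example.dataset.v1",
    "base_url": "https://data.example.org/example/dataset/v1/",
    "file_manifest": [
        {
            "path": "tas_day_0001.nc",
            "checksum": hashlib.sha256(SAMPLE_BYTES).hexdigest(),
        },
    ],
}


def file_urls(entry):
    """Build one HTTP/S URL per manifest item from the dataset's base URL."""
    base = entry["base_url"].rstrip("/") + "/"
    return [base + quote(item["path"]) for item in entry["file_manifest"]]


def matches_manifest(data, manifest_item):
    """Return True if the downloaded bytes match the manifest checksum (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest() == manifest_item["checksum"]
```

Under these assumptions, the client needs only the single dataset entry to locate and verify every file, with no further index lookups.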

How will the transition API manage this change? Will the transition API mimic file-level entries when they're removed from the Globus Search indices? Or...?

@bstrdsmkr
Collaborator

@lliming what's the driver for not including file-level entries? I think this has significant downsides that will be hard to overcome. For example, if a dataset has 4 million files, doesn't that yield a 4-million-element array in the single dataset document? That's a pretty big transfer if you just want some metadata about the dataset. Or am I misunderstanding?
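To put a rough number on the size concern, the following back-of-the-envelope sketch estimates the serialized size of an embedded manifest. The manifest item layout (`path` plus a hex `checksum`), the 64-character checksum length, and the sample path length are assumptions for illustration only; real entries could be larger or smaller.

```python
import json


def estimate_manifest_bytes(n_files, sample_path, checksum_hex_len=64):
    """Estimate the compact-JSON size of a manifest with n_files items.

    Assumes every item looks like {"path": ..., "checksum": ...} and that
    all paths are about as long as sample_path (a hypothetical layout).
    """
    item = {"path": sample_path, "checksum": "0" * checksum_hex_len}
    # +1 accounts for the comma separating items in the JSON array.
    per_item = len(json.dumps(item, separators=(",", ":"))) + 1
    return n_files * per_item


# Example: 4 million files with 30-character paths.
size = estimate_manifest_bytes(4_000_000, "a" * 30)
print(f"{size / 1e6:.0f} MB (uncompressed, approximate)")
```

With a 30-character path and a 64-character hex checksum, each item comes to about 120 bytes, so 4 million files would be on the order of 480 MB uncompressed in a single dataset document under these assumptions.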
