-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Switch file reading to use concrete endpoints #48
Conversation
@nsmith- is there anything else this one needs? The use case is becoming ever more pressing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is good, however I think this should be a configurable behavior. We should add a boolean flag locate_all_sources
to fs.storage_options to enable it, with a docstring that explains this locates all sources of the file rather than letting the redirector pick the first one to respond. We can set locate_all_sources=True
as a default, though I would encourage some performance studies: we're essentially trading opening speed for robustness.
One puzzle I have is: the redirector returns the first host to respond, whereas I am not sure what the order of locate()
's response list is. Will we be getting suboptimal sources more often with this method? Without speculatively requesting data from the other sources during the process, we may be hurting ourselves.
It might be useful, since this is a bit the "manual transmission" version of the complete fix, to allow a user to specify a list of sites/servers that they will accept as valid endpoint responses. This way we can manage sites that respond immediately which are known to have subpar QoS. |
@rpsimeon34 could you please follow up on the review comments. |
Sorry for the delay. I'll try to implement the |
for more information, see https://pre-commit.ci
I think this should be ready for another round of review when maintainers have time. I'm not sure how to faithfully replicate an XRootD redirector, so I haven't written a proper unit test for the locate_all_sources or valid_sources behaviors. But, some manual testing with files I'm familiar with suggests they work as intended. P.S. Sorry @lgray for removing and reinstating your logging.debug commit - I thought pre-commit had somehow switched it for me! |
No worries, I figured you wanted those print statements in but obviously it's there just for information if you want to see it, hence debug logging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay in review.
You'll also need to resolve the conflicts since there were a few upstream PRs merged. |
I believe requests have been addressed. If I misunderstood/mis-implemented something, please let me know and I'm happy to make more changes. Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay (again)
This PR tries only to find concrete endpoints only after failing to open with the initial URL
Addressing Issue #36
This is not really a multi-source reading commit, but it uses the Pepper-inspired algorithm to find concrete endpoints and read from one of those, rather than through the redirector.
The XRootDFile class gets a list of hosts now, so that could become the basis for a more multi-source approach.