Content-app allows the use of other users' remote credentials when retrieving on-demand content. #3212
This is a security concern in case User B happens to lose their credentials for the remote source (i.e., they become no longer valid).
I was thinking we could add some RBAC filtering here https://github.com/pulp/pulpcore/blob/main/pulpcore/content/handler.py#L690 and filter to only those remotes the user can see. The problem is that User B does not necessarily need to be the same user who synced the repo. It could also be User C, an end user who was simply given content to download and has nothing to do with the User B credentials that were used during the repo sync :/
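The suggested queryset scoping could look roughly like this. This is a minimal, self-contained sketch in plain Python, not pulpcore's actual RBAC machinery; `RemoteArtifact` and the visibility set here are illustrative stand-ins (and, as noted above, it does not by itself solve the User C case):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteArtifact:
    url: str
    remote: str  # name of the Remote that owns the credentials


def visible_remote_artifacts(remote_artifacts, remotes_visible_to_user):
    """Drop any RemoteArtifact whose Remote the requesting user cannot see."""
    return [ra for ra in remote_artifacts if ra.remote in remotes_visible_to_user]


artifacts = [
    RemoteArtifact("https://a.example/pkg.rpm", remote="user-a-remote"),
    RemoteArtifact("https://b.example/pkg.rpm", remote="user-b-remote"),
]

# User B can only see their own remote, so User A's credentials are skipped.
print(visible_remote_artifacts(artifacts, {"user-b-remote"}))
```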
I think this allows one user to steal artifacts from another if you happen to know the digest.
@mdellweg you can steal content this way if it also happens to have already been downloaded, via the query-existing-content stage.
Good observation. Do we need to add queryset scoping to that stage?
Or maybe we just say: "Some level of trust is needed for people acting in the same domain." (Once we introduce that feature.)
One of the ideas that came to my mind was to create a new relation/object between remote artifacts and content artifacts/content/repository content (one of those; I will decide later). Right now, a content unit is deduplicated and shared across repositories (see pulpcore/app/models/repository.py line 541 in 2801c04, and pulpcore/app/models/content.py lines 595 and 684 in 241ca85).
If we could isolate and create a set of related remote artifacts, content artifacts, and repository content units, so that after every sync operation there is a unique set of them, then only the particular remote that was associated with the sync task, and exactly those remote artifacts and repository contents, would be used to download on-demand data.

Benefits:
- Users will not be able to use other users' remote configurations (because we will keep track of the (remote, remoteartifact, contentremoteartifact) tuples per repository version, and other, irrelevant remotes will not be considered at all).
- The content will always be downloaded from the same source as declared (right now, we iterate seemingly randomly through remote artifacts and record the first downloaded result).

Questions:
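The proposed relation could be sketched as a ledger keyed by repository version. Names like `SyncLedger` and `SyncRecord` are hypothetical illustrations, not pulpcore models, and the string keys stand in for real foreign keys:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRecord:
    repo_version: str
    remote: str  # the remote used by the sync task
    remote_artifact_url: str


class SyncLedger:
    """Track which (remote, remote artifact) pair each repo version synced."""

    def __init__(self):
        self._records = {}

    def record_sync(self, repo_version, remote, url):
        self._records[(repo_version, url)] = SyncRecord(repo_version, remote, url)

    def artifact_for(self, repo_version, url):
        # On-demand downloads consult only the pair recorded at sync time;
        # remote artifacts created by other users' syncs are never considered.
        return self._records.get((repo_version, url))


ledger = SyncLedger()
ledger.record_sync("repo1-v1", "remote-a", "https://a.example/pkg.rpm")
hit = ledger.artifact_for("repo1-v1", "https://a.example/pkg.rpm")
```

A lookup for a repository version that never recorded the URL simply returns nothing, which is the point: there is no fallback to somebody else's remote.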
Actually, why do we need to deduplicate content units? What really needs to be deduplicated are artifacts, because those can be huge. If we allow duplicate content units to be attached to repository versions, we should be fine, shouldn't we?
I made this by first installing some software in oci_env:
Then make the visualizations:
Then I moved them from the oci_env to my host for posting with:
Here is the full model graph for pulpcore, from which I manually picked the models of interest (have a look in case I missed any).
Sadly, Idea 2 may not be sufficient for pull-through cache workflows. There might be a case where one user syncs content with the on-demand policy and another user adds a remote to the distribution to enable the pull-through cache. Then, for pull-through cache workflows, there are no repository content objects involved at all, so we cannot benefit from the relation, and the second user can still end up using another user's credentials. Is that correct, @gerrod3? Or is there no notion of remote artifacts?
Pull-through caching uses the remote on the distribution to find the remote artifact with which to gather the content. There is an assumption in the sync pipeline that each remote artifact has one URL per content artifact, so the pull-through cache uses the URL of the request to the content-app, plus the associated remote, to find the one remote artifact with the "matching" URL. In this case we are not looping through remote artifacts, just the one for that remote if it exists; otherwise we create a new one at request time.
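The lookup described above can be modeled in a few lines. This is a rough, hypothetical sketch (`PullThroughCache` and its dict-based records are illustrative, not the actual handler implementation):

```python
class PullThroughCache:
    """One remote artifact per URL: look it up for the distribution's remote,
    or create it at request time. No iteration over other remotes happens."""

    def __init__(self, remote_base_url):
        self.remote_base_url = remote_base_url.rstrip("/")
        self._remote_artifacts = {}  # url -> remote-artifact-like record

    def remote_artifact_for(self, request_path):
        url = f"{self.remote_base_url}/{request_path.lstrip('/')}"
        if url not in self._remote_artifacts:
            # Mirrors "else we create a new one at request time".
            self._remote_artifacts[url] = {"url": url, "created_on_demand": True}
        return self._remote_artifacts[url]


cache = PullThroughCache("https://remote.example")
ra = cache.remote_artifact_for("/pool/pkg.rpm")
```

Because the distribution's remote plus the request path identify at most one remote artifact, the "first available" loop from the original report never runs in this workflow.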
I have a problem with Idea 1. Losing Content's uniqueness is going to have a ripple effect in various places that currently can assume "if I ask for a piece of Content by its 'natural keys', I am only ever going to get one answer". Idea 1 changes that assumption in ways that will require code changes to adapt. I'm a much bigger fan of Idea 2. It places the burden of "which remoteartifact to use" on the repository being sync'd (which does know/care about what remote is being used to sync), instead of requiring every user of Content objects to be aware that they are potentially non-unique.
OK, I am going to attempt to summarize the outcome of a meeting to discuss these proposals. Attendees included @ggainey @bmbouter @dkliban @gerrod3 @dralley @mdellweg . To sum up:
Some questions that came up:
I am certain that I have missed or misstated some of the meeting output, calling on the other attendees to chime in and straighten me out here :)
My understanding is,
I do not think you can cascade delete objects through a many-to-many relationship. And we should not do that, because the same remote artifact may be used in a different repository. (If there were a "remove if many2many relation is empty" we could use that.)
See comments above.
Easy. Orphaned content will not stop remotes from being deleted, while on-demand content in a repository will (by protecting the
Open questions for me: How does this play with ACS?
Is there any reason why we can't update the selection logic of the
On the content handler context we can know the right remote. The main content-handler dispatch is done in pulpcore/content/handler.py lines 596 to 598 in 93f241f.

So if we passed that along to the `_stream_content_artifact` method:

```python
# on _stream_content_artifact method
remote_artifacts = content_artifact.remoteartifact_set.filter(
    remote=associated_remote
).order_by_acs()
```

Does that make sense?
I really like the fresh-minds approach here. The one thing that comes to my mind is that (on-demand) content can be moved from one repository to another, so you cannot be sure that the remote the repository points to really is the one the content was created with. There could even be two content units requiring different remotes in the same repository version.
This approach could fix some cases, but not all of them, right? @mdellweg, having a repo with a remote plus on-demand content copied into it from a different remote sounds really messy, even if we seem to allow this as of today.
I kept wondering if using the same objects for persisted and on-demand content is a good idea.
What you are describing is more closely related to #5012. On this issue, RemoteB (associated with RepoB) replaces the Sum1 pkg with Sum2, as you said, but that is actually synced on-demand for RepoB. Then, if we request that nevra pkg from the DistA URL (associated with RepoA; RemoteA was not changed), we sometimes use RemoteB and get the wrong package. This happens because: (a) they shared the same ContentArtifact when the remotes contained the same nevra Sum1 pkg, and this ContentArtifact has two RemoteArtifacts.
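The failure mode can be condensed into a few lines. Everything here is illustrative (the dicts stand in for RemoteArtifact rows; `Sum1`/`Sum2` follow the naming above):

```python
# One shared ContentArtifact (same nevra) with two RemoteArtifacts, one per
# remote. After remote-b republishes different bytes under the same nevra,
# whichever RemoteArtifact is picked first decides what DistA's user receives.
content_artifact = "pkg-1.0-1.noarch"  # the shared CA
remote_artifacts = [
    {"remote": "remote-b", "payload": "Sum2"},  # re-synced, different content
    {"remote": "remote-a", "payload": "Sum1"},  # what DistA should serve
]


def first_available(ras):
    # Mirrors the "first one that works" selection: iteration order, not the
    # distribution's remote, decides which payload is streamed.
    return ras[0]


served = first_available(remote_artifacts)
```

Here the request came through DistA, yet `served` carries remote-b's payload, which is exactly the wrong-package outcome described.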
I am afraid we cannot take issues with remote artifacts and solve them separately. We should rather collect all of them and come up with a design change that covers all of them. Your proposal in isolation fixes this particular issue, so the correct RemoteArtifact will be picked based on the remote associated with the repo. But what about repos that do not have a remote permanently associated with them? One could just create a repo and a remote, and then pass that remote in the sync call.
That's fair, we really need a broader approach here.
We may encourage users to use domains if they worry about other users' credentials being used during on-demand downloads.
If User A and User B both sync the same on-demand content from their separate remotes, the content is only created once, but two remote artifacts are created. When User B goes to download the content, the credentials used for the download could be User A's, since the content-app loops through the remote artifacts for the content and chooses the first available one: https://github.com/pulp/pulpcore/blob/main/pulpcore/content/handler.py#L689-L696.
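The reported loop can be sketched as follows. This is a simplified stand-in for the linked handler code, not the actual implementation; `is_reachable` abstracts away the real download attempt:

```python
def first_available_remote_artifact(remote_artifacts, is_reachable):
    """Return the first remote artifact whose remote responds.

    Mirrors the content-app loop: ownership of the credentials behind each
    remote is never checked, only availability."""
    for ra in remote_artifacts:
        if is_reachable(ra):
            return ra
    return None


remote_artifacts = [
    {"remote": "user-a-remote", "url": "https://a.example/pkg.rpm"},
    {"remote": "user-b-remote", "url": "https://b.example/pkg.rpm"},
]

# User B requests the content, but User A's remote artifact comes first and
# is reachable, so the download happens with User A's credentials.
chosen = first_available_remote_artifact(remote_artifacts, lambda ra: True)
```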