Remove orphan tasks failing on Katello 4.9 #865
Comments
We had a very similar issue (but with a different model) in pulp_deb: #690 Just thought I would drop a link in case anyone wants to have a look.
Another possible scenario is that orphaned content was added to a repository while the orphan cleanup task was running and thus was no longer orphaned, but still on the list of content to try to delete.
If this is the case, how could I go about finding the repo in question so that I can delete and re-add it?
@gerrod3 should we then just try catching the ProtectedError, log it, and proceed to the next "orphaned content"?
Not sure it is possible through the API. If you can access the shell you could look up the RepositoryContent object referenced in the traceback. That object will have a reference to the repository the content is a part of.
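# Run in a Django shell on the Pulp host (e.g. via pulpcore-manager shell).
from pulpcore.app.models import RepositoryContent
# Use the primary key of the RepositoryContent object shown in your traceback.
rc = RepositoryContent.objects.get(pk="ff32c510-ad1c-4288-918a-442fba668125")
# The RepositoryContent row links the content to the repository it belongs to.
repo = rc.repository
print(repo.name)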
Probably, since the task can be run concurrently with others we should expect that sometimes orphans will no longer be orphans.
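A minimal sketch of the catch, log, and continue approach discussed above. It is not the actual pulpcore patch: `ProtectedError` here is a stand-in for `django.db.models.ProtectedError`, and `FakeContent` and `remove_orphans` are hypothetical names used only for illustration.

```python
import logging

log = logging.getLogger("orphan_cleanup_sketch")


class ProtectedError(Exception):
    """Stand-in for django.db.models.ProtectedError (assumption: same role)."""


class FakeContent:
    """Hypothetical content unit; delete() fails if something still protects it."""

    def __init__(self, pk, protected=False):
        self.pk = pk
        self.protected = protected

    def delete(self):
        if self.protected:
            raise ProtectedError(f"content {self.pk} is still referenced")


def remove_orphans(candidates):
    """Delete each candidate; log and skip those that can no longer be deleted."""
    removed, skipped = [], []
    for content in candidates:
        try:
            content.delete()
        except ProtectedError as exc:
            # Do not abort the whole task: record the problem and move on.
            log.warning("Skipping content %s: %s", content.pk, exc)
            skipped.append(content.pk)
        else:
            removed.append(content.pk)
    return removed, skipped
```

Content skipped this way is not removed, so under this approach it would be reported again on each subsequent cleanup run.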
Thanks @gerrod3
Hmm, that's interesting. @quba42 Have you seen this before? I have once, but not in the Ubuntu context, and not in a way that is easily repeatable.
@dralley In the pulp_deb case we had a foreign key relation between two types of metadata content, and under certain (reproducible) conditions, it was possible for one of the metadata to be orphaned, while the other was not orphaned and still referencing the orphaned one. The solution was to get rid of that foreign key relation entirely (we weren't actually using it for anything). This was a case of pulp_deb having a foreign key relation it should not have had. This case is different, since it simply makes no sense for a
Looks like more users are running into this: https://community.theforeman.org/t/remove-orphan-tasks-failing-on-katello-4-9/34456/4 Could this be some effect where something is an orphan when the remove orphans task is started, but is then re-added to a repository by the time the orphan cleanup actually tries to delete it? If so, then it should be possible to simply re-run the orphan task without running into the same error again. Can anyone with the problem confirm or deny?
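The race hypothesis above can be illustrated with a toy model. This is a sketch, not Pulp code: `ToyStore`, `find_orphans`, and `delete` are hypothetical names, and a plain `RuntimeError` stands in for the database-level error.

```python
class ToyStore:
    """Hypothetical in-memory model: content pk -> set of repositories using it."""

    def __init__(self, usage):
        self.usage = {pk: set(repos) for pk, repos in usage.items()}

    def find_orphans(self):
        # Snapshot of content that is in no repository right now.
        return sorted(pk for pk, repos in self.usage.items() if not repos)

    def delete(self, pk):
        # Deleting content that is in use fails, like a protected foreign key.
        if self.usage[pk]:
            raise RuntimeError(f"{pk} is in use by {sorted(self.usage[pk])}")
        del self.usage[pk]


store = ToyStore({"pkg-1": [], "pkg-2": [], "pkg-3": ["repo-a"]})

# First run: the orphan list is computed up front ...
snapshot = store.find_orphans()
# ... and a concurrent sync re-adds pkg-2 before the deletions happen.
store.usage["pkg-2"].add("repo-b")

first_run_failed = False
try:
    for pk in snapshot:
        store.delete(pk)
except RuntimeError:
    first_run_failed = True

# Second run: a fresh snapshot no longer lists pkg-2, so nothing fails.
second_snapshot = store.find_orphans()
for pk in second_snapshot:
    store.delete(pk)
```

In this model the second run succeeds because the stale entry is gone from the new snapshot. If a real system keeps failing on every run, the cause is more likely a lasting reference than a race.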
…rallel. closes #4209 Orphan clean up can fail when other tasks like sync or content upload might be running in parallel.
The root cause of this bug report (even though an rpm plugin version is provided, the rpm plugin has nothing to do with it; yes, Debian also has a Package model) and of the one described in the Foreman blogpost is a reincarnation of the Debian issue #690, just this time it is here: https://github.com/pulp/pulp_deb/blob/main/pulp_deb/app/models/content/structure_content.py#L91. @quba42, more eyes are needed here, please; I might be wrong, since I did not look in depth into the Debian content structure. I don't think the referenced Bugzilla is related either. Note that the user has pulp-deb 2.20
Notice that in both the bug report and the Foreman blogpost the error is:
So re-running orphan clean-up might not help if this issue is of the same nature as the one with ReleaseFile. I did notice that the fix for the ReleaseFile issue, which contained a migration, got backported into 2.20.2 (390e7ad). We should not backport migrations.
The proposed fix should unblock the orphan clean-up task, but these Content instances will always remain in a limbo state and won't be removed (i.e. every orphan clean-up task will log the non-removable content).
@ipanova You are right. Given that @lravelo confirmed the affected content came from an Ubuntu repo, and a It follows this is not a pulpcore, but a pulp_deb issue. Unlike with #690 the solution won't be to get rid of the foreign key relation in question. In fact, a pulp_deb repo version should never contain a 1.) Preventive: Identify any pulp_deb actions that can result in this inconsistent state and modify them so they cannot produce such inconsistent states. (I don't know what caused this state in this current case, but I think I could force this result myself on a test system, so it is certainly possible). It is worth noting that pulp_deb has had this relation pretty much unchanged since the beginning of Pulp 3, so it would be interesting to understand how we were able to avoid this issue up to now. (What workflow suddenly started producing these cases?)
Is it possible to move this issue to pulp_deb, or do I just open a new one and link to this one?
In pulp-container we have quite a similar concept, where it does not make sense to keep a Blob without its corresponding Manifest in the repo version, hence we have implemented a 'recursive removal/add logic'. Maybe Debian could use similar concepts? https://docs.pulpproject.org/pulp_container/workflows/manage-content.html#add-content-to-a-repository
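As a rough illustration of the 'recursive add/remove' idea, here is a toy sketch. It is not pulp_container's implementation: `DEPENDS_ON`, `add_recursive`, and `remove_recursive` are hypothetical, and content units are modelled as plain strings.

```python
# Hypothetical dependency map: a parent unit and the units it needs.
DEPENDS_ON = {
    "manifest-1": {"blob-a", "blob-b"},
    "manifest-2": {"blob-b", "blob-c"},
}


def add_recursive(repo_content, unit):
    """Add a unit together with everything it depends on."""
    return repo_content | {unit} | DEPENDS_ON.get(unit, set())


def remove_recursive(repo_content, unit):
    """Remove a unit, plus dependencies that no remaining unit still needs."""
    remaining = repo_content - {unit}
    still_needed = set()
    for other in remaining:
        still_needed |= DEPENDS_ON.get(other, set())
    droppable = DEPENDS_ON.get(unit, set()) - still_needed
    return remaining - droppable


content = add_recursive(set(), "manifest-1")
content = add_recursive(content, "manifest-2")
content = remove_recursive(content, "manifest-1")
```

The point of the design is that a repository version never holds a dependent unit without its parent, while units shared with another parent (here `blob-b`) are kept.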
That's a good question. I don't think we have changed the orphan removal logic, except for disallowing two or more orphan cleanup tasks from running in parallel. I agree that starting with a reproducer is step number 1.
I have a PR attached to it, but I can create a new issue since the fix has a slightly different scope from the original root cause
@ipanova Do you think it would make sense to mitigate this (and similar issues) by adding something like a
That's basically what my PR is doing, if you look at it.
I have a tiny bit of doubt about this because there was a BZ filed, against Satellite, presumably without the Debian plugin. But we have no actionable information from that report and it's one single report in a long long time. Whatever the problem is does seem to occur mostly with the Debian plugin? So I have no issue with moving it to the Debian plugin. We can do that. About the PR, it seems the discussion is along the lines of:
I think we should see what we can do in terms of repair before we necessarily make a decision either way in terms of the PR.
@dralley idk about the repair part in both of your bullet points. The script would have to be plugin-aware; otherwise, how would it know what to do with e.g. the Debian Package or PackageReleaseComponent? Blindly removing the FK does not seem like a good idea.
Let me open a separate issue for my PR and move the discussion there. I will copy over the relevant comments
Looks like we already have an issue for 1: #785 Which means this issue can be used for 2.
Version
katello 4.9
Describe the bug
remove orphan tasks in foreman are failing with the following type of message:
To Reproduce
Steps to reproduce the behavior: running any remove orphan task will create this error
Expected behavior
I would expect the task to remove orphans successfully
Additional context
https://bugzilla.redhat.com/show_bug.cgi?id=2164551
journalctl -u pulpcore-worker@*
PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager showmigrations rpm
Task Output: