Inspired by the thread about running Pulp in Production, I figured I’d share a problem that we’ve run into with cleaning up orphans.
Say you’ve been running Pulp in production for quite some time and you have a repo that’s collected hundreds of versions. Eventually you want to clean up these versions and their content. Pulp provides an automated way to do this, so you don’t have to do it for each individual repo: “retain_repo_versions”.
The problem is that the retain_repo_versions feature is unsafe. Let me give an example: let’s say you have a repo whose id is set on a distribution, and its current version, 98, is published. You have a bunch of versions (and their content) to clean up, so you set retain_repo_versions to 3. However, if you then add 4 packages to your repo one-by-one, the current version becomes 102 (unpublished), only versions 100 through 102 are retained, and version 98 (published) gets deleted. In turn, your repo becomes un-distributed.
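To make the arithmetic in that example concrete, here is a rough sketch in Python. It is a simplified model of "keep only the newest N versions", not Pulp's actual implementation; the `retained_versions` helper and the variable names are made up for illustration, and it assumes version numbers are contiguous.

```python
def retained_versions(latest: int, retain: int) -> list:
    """Return the version numbers that survive if only the newest
    `retain` versions are kept (hypothetical helper, simplified model)."""
    first_kept = max(0, latest - retain + 1)
    return list(range(first_kept, latest + 1))


published = 98            # version currently served by the distribution
retain_repo_versions = 3  # retention setting from the example

# Adding 4 packages one at a time creates 4 new repo versions.
latest = published
for _ in range(4):
    latest += 1

kept = retained_versions(latest, retain_repo_versions)
print(kept)               # [100, 101, 102]
print(published in kept)  # False: the published version was dropped
```

In this model, the published version falls out of the retained window as soon as the number of new versions reaches the retention count, no matter how large that count is.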
We could set retain_repo_versions to a very high number (e.g. 100), but users often do crazy things (like adding 101 packages one at a time with a script or something). So for the time being, we’ve disabled retain_repo_versions, and content is beginning to accumulate in our system.