Unable to synchronize the apt-archive.postgresql.org repository

Hello everyone,

I managed to synchronize quite a few Debian apt repositories, but the PostgreSQL repository (https://apt-archive.postgresql.org/pub/repos/apt), which contains all the archived packages, is not working. I found online that they use an AWS S3 bucket to host their packages, which explains why there is no directory or file listing. You need to add index.html at the end of the URL to display the content.

With our old Nexus solution, this worked fine.
Pulp is installed in a Docker container on a machine running Ubuntu 22.04.

Does anyone have any ideas?
Thanks in advance!


Have you tried adding a deb remote and a deb repository, and then synchronizing them in Pulp? What error did you receive?


I have indeed created a deb remote and a deb repository. The error occurs during synchronization and is as follows:
"description": "403, message='Forbidden', url='https://apt-archive.postgresql.org/pub/repos/apt/dists/jammy-pgdg-archive/Release'"
You actually need to add an ‘index.html’ at the end of the URL for the content to be displayed. But how can you make Pulp automatically add it at each step of the synchronization?


I can download a release file at https://apt-archive.postgresql.org/pub/repos/apt/dists/jammy-pgdg-archive/InRelease.

This suggests that the problem has nothing to do with appending index.html to folder URLs for browsing.

What is meant to happen is that pulp_deb attempts to download each of Release, InRelease, and Release.gpg from https://apt-archive.postgresql.org/pub/repos/apt/dists/jammy-pgdg-archive/index.html. On this server, only InRelease exists. That alone would not be a problem, since this is expected and pulp_deb should just ignore the artifacts it cannot download (so long as at least one of Release or InRelease exists). However, the files that do not exist return 403 instead of 404, and I suspect this causes the sync to fail (instead of simply continuing). I would need to look at the download code to be sure, but for now this is the theory I am going with.
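To make the theory concrete, here is a minimal Python sketch. It is not pulp_deb's actual download code: the names `probe`, `sync_outcome`, and the `ignorable` parameter are hypothetical and only model the behaviour I am guessing at. `probe` checks which status each of the three release files returns (it needs network access), and `sync_outcome` shows how a 403 would abort a sync that a 404 would not.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Distribution folder from the failing sync task in this thread.
BASE = "https://apt-archive.postgresql.org/pub/repos/apt/dists/jammy-pgdg-archive/"
RELEASE_FILES = ("Release", "InRelease", "Release.gpg")


def probe(base_url=BASE, names=RELEASE_FILES):
    """Return {file name: HTTP status} for each release file (needs network)."""
    statuses = {}
    for name in names:
        request = Request(base_url + name, method="HEAD")
        try:
            with urlopen(request, timeout=30) as response:
                statuses[name] = response.status
        except HTTPError as err:
            # urllib raises on 4xx/5xx; record the status instead of failing.
            statuses[name] = err.code
    return statuses


def sync_outcome(statuses, ignorable=(404,)):
    """Hypothetical model of the theory, NOT pulp_deb's real logic.

    A missing file is skipped only if its status is in `ignorable`;
    any other non-200 status aborts the sync.
    """
    for name, status in statuses.items():
        if status != 200 and status not in ignorable:
            return f"fail: {status} for {name}"
    if statuses.get("Release") == 200 or statuses.get("InRelease") == 200:
        return "ok"
    return "fail: no Release or InRelease"


# What this server appears to return, per the error and my manual download.
observed = {"Release": 403, "InRelease": 200, "Release.gpg": 403}
print(sync_outcome(observed))
print(sync_outcome(observed, ignorable=(403, 404)))
```

If the theory holds, the first call reports a failure on Release and the second succeeds, which is why treating 403 like 404 for these optional files (or not requesting them at all) would be expected to let the sync continue. Running `probe()` against the real server is the quickest way to confirm which statuses are actually returned.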

@lgasperment Feel free to open an issue at https://github.com/pulp/pulp_deb/issues
Ideally, add the commands you used to create your remote and also the full output of the failed sync task. And be sure to link to this thread.

I cannot think of a workaround that does not require patching code. For example, one might drop the Release and Release.gpg from this list: pulp_deb/pulp_deb/app/models/content/verbatim_metadata.py at bc9612dba880e3efd2f5b2c93b9f025dec68ad15 · pulp/pulp_deb · GitHub

That should fix synchronization of this repo, but will break synchronization of any repos that do need Release and/or Release.gpg to be downloaded.


If you do not need to synchronize this repository regularly, you can try installing apt-mirror. It has a simple configuration and is quick to install. Most likely, apt-mirror will ignore the 403 error and download the repository. You can then use Pulp to synchronize from the apt-mirror copy.
