Help enabling APT by-hash with Pulp via Docker Compose

Problem:

I am trying to properly set up by-hash for my apt mirrors of the Ubuntu archive (and other repos that support it). I am setting up the Pulp “server” using the docker-compose setup from the Pulp OCI images repository. The feature overview page says that the option has to be enabled, which I did in settings/settings.py (base file here) by appending APT_BY_HASH = True, and that a reverse proxy and cache must be set up for it.

At first, I took the reverse proxy and cache for granted, since the docker-compose setup comes with the nginx configuration already in place and redis for caching. However, upon closer inspection of the logs, all by-hash requests made via apt failed, and in fact the by-hash directory is not created in the repository either. So, I would appreciate more guidance on how to modify the setup, in particular the nginx.conf.template from the official repo’s docker-compose setup.

Here is an example of a request and answer:

Answer for: http://***/pulp/content/archive-focal/prod/dists/focal/main/binary-amd64/by-hash/SHA256/f25bb719a900d962a4df25cbb20f0a54a23d9f16c3fcdc4f4872ead131f5a604
HTTP/1.1 404 Not Found
Server: nginx/1.16.1
Date: Mon, 13 May 2024 11:29:50 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 14
Connection: keep-alive
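
A request like the one above can be reproduced without apt, which helps separate the server's behavior from the client's. The following is a minimal sketch using only the Python standard library; the `probe` helper name is hypothetical, and the URL passed to it has to be a real by-hash URL from your own mirror.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Send a HEAD request to `url` and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        return error.code
```

Calling `probe(...)` on a by-hash URL that the content app does not serve should return 404, matching the response shown above; a working by-hash setup should return 200.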

Expected outcome:

Apt working normally, with by-hash requests returning an OK status (200), the by-hash directory created in the repository, and the corresponding mirrors consumable with only rare occurrences of the Hash Sum mismatch error.

Pulpcore version:
3.13.0 (for the application managing the repositories)

Pulp plugins installed and their versions:

Management scripts:

  • pulp-cli==0.25.1
  • pulp-cli-deb==0.1.0
  • pulp-glue==0.25.1
  • pulp-glue-deb==0.1.0

Server:

  • deb: 3.2.0
  • rpm: 3.25.3
  • core: 3.52.0
  • file: 3.52.0
  • maven: 0.8.0
  • ostree: 2.3.0
  • python: 3.11.1
  • ansible: 0.21.3
  • certguard: 3.52.0
  • container: 2.19.3

Operating system - distribution and version:
Server:

  • Ubuntu Server 22.04 (jammy)
  • Pulp-minimal: 3.52

Clients (consumers):

  • Ubuntu 20.04 (focal) with apt 2.0.10
  • Ubuntu 22.04 (jammy) with apt 2.4.12

Other relevant data:

No error information, just 404 codes. The mirror can still be used, but is very prone to the Hash Sum mismatch issue.

If I read the changelog correctly, pulp_deb 3.2.0 starts to serve old publications (unless prematurely deleted) for up to 3 days after being replaced by a newer one. So I’d suspect that the (http-)cache mentioned (no, that’s not the pulp response cache backed by redis, I’m sorry) isn’t even needed anymore.

However, the issue that the by-hash path does not get created persists. Are you using auto publish? Do you have a repo-version-retention policy?

I am not using auto-publishing. The procedure I have is closer to the Maximum Flexibility workflow, using Verbatim publications, and the “on demand” policy.

As far as I could tell, there was no difference in the mirror repository definition (creation of Pulp remotes, repositories, etc.) with and without by-hash, right?

I left retain_repo_versions unset (i.e. it is using the default, null). I am currently not manually deleting any data or packages that were created or downloaded. By the way, I tested this when setting up everything from scratch (no data at all).

I looked for the “by-hash” directory before and after packages are requested, and it is indeed missing.

Hmm. Verbatim publications are meant to mirror all the metadata (at least the metadata pulp_deb is interested in) without modification, and that may well be the root of the issue. So pulp_deb will not add by-hash files to that (I guess).

I thought that Verbatim publications would actually help me in this case. In my mirror of Ubuntu focal, I can see these files

InRelease                                                                                           10-May-2024 10:12  264.9 kB
Release                                                                                             10-May-2024 10:12  263.3 kB
Release.gpg                                                                                         10-May-2024 10:13  1.6 kB
main/                                                                                               10-May-2024 10:13  5.8 MB
multiverse/                                                                                         10-May-2024 10:13  176.9 kB
restricted/                                                                                         10-May-2024 10:13  33.4 kB
universe/

The Release file includes in the last line

Acquire-By-Hash: yes

When I look into the upstream repository, I see this:

Contents-amd64.gz
Contents-i386.gz
InRelease
Release
Release.gpg
by-hash/
main/
multiverse/
restricted/
universe/

And the Release file also ends with

Acquire-By-Hash: yes

I am also wondering if there might be some interaction between verbatim metadata and the apt by hash feature that might cause it to not work. That being said, I feel like if the upstream metadata advertises apt by hash, then verbatim publications should work with apt by hash in pulp_deb (but I could be overlooking some devil in the details).

That being said, I think there are a few more straightforward things to check before going down the verbatim rabbit hole:

Apt by hash should be used for new publications once pulp_deb is running at least version 3.2.0 (which you are), and once APT_BY_HASH = True is set (changing the setting may require restarting workers for it to take effect). Any publications that were created before both those things were true are not using APT by hash.

If you are sure all your publications are using APT by hash, and the clients are sufficiently new to know apt by hash, and you are still getting the error on your clients, then “something to do with verbatim” is the only idea I have left.
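
One way to check what a given publication actually advertises is to read its Release file and derive the by-hash paths a client would request. The sketch below assumes the standard Release layout (a top-level `Acquire-By-Hash` field, and checksum sections such as `SHA256:` followed by indented `digest size path` lines); the helper names `parse_release` and `by_hash_path` are made up for illustration.

```python
import posixpath

CHECKSUM_FIELDS = ("MD5Sum", "SHA1", "SHA256", "SHA512")


def parse_release(text):
    """Return (acquire_by_hash, {field: [(digest, size, path), ...]})."""
    acquire_by_hash = False
    checksums = {}
    current = None  # checksum section we are currently inside, if any
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Continuation line: only meaningful inside a checksum section.
            parts = line.split()
            if current is not None and len(parts) == 3:
                digest, size, path = parts
                checksums[current].append((digest, int(size), path))
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        current = None
        if key.lower() == "acquire-by-hash":
            acquire_by_hash = value.lower() == "yes"
        elif key in CHECKSUM_FIELDS:
            current = key
            checksums[current] = []
    return acquire_by_hash, checksums


def by_hash_path(index_path, field, digest):
    """Path (relative to dists/<suite>/) where a client looks for the index by hash."""
    return posixpath.join(posixpath.dirname(index_path), "by-hash", field, digest)
```

If `parse_release` reports `Acquire-By-Hash: yes` but the paths produced by `by_hash_path` return 404 from the content app, the metadata is advertising by-hash files that the publication does not serve.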

All the publications and workers were created after I set APT_BY_HASH = True.
Also, I know that the clients can use by-hash because they use it properly with the upstream repositories.
So, I guess that leaves the Verbatim publications as the next potential culprit.

If I recall correctly, the main reason why I am using Verbatim publications is to be able to use the same signing keys as the remotes and to have everything as close as possible to them. I think that is the reason why I could get around not having a signing service. But I will try to check my notes on that to be sure that was the only reason.

That is one reason to use verbatim; another is that creating verbatim publications is much, much faster. For the normal APT publication pulp_deb needs to generate new metadata; for a verbatim publication pulp_deb does not need to do much of anything, since it will simply serve the upstream metadata that was synchronized. The flip side is that not generating new metadata means that pulp_deb has no control over the content of that metadata, so there is a potential for a mismatch between the metadata and the repo content.
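
To make the two publication types concrete, here is a rough sketch of how the corresponding REST requests differ. The endpoint paths (`publications/deb/apt/` and `publications/deb/verbatim/`) and the `structured` field reflect my understanding of the pulp_deb REST API and should be checked against the API docs of your own server; the `publication_request` helper is hypothetical and only builds the URL and JSON payload, without sending anything or handling authentication.

```python
def publication_request(base_url, repository_href, verbatim=False):
    """Build the (url, payload) pair for creating a pulp_deb publication.

    Assumption: endpoint paths and field names follow the pulp_deb REST API
    as I understand it; verify them against your server before use.
    """
    kind = "verbatim" if verbatim else "apt"
    url = f"{base_url.rstrip('/')}/pulp/api/v3/publications/deb/{kind}/"
    payload = {"repository": repository_href}
    if not verbatim:
        # Normal APT publications generate new metadata; "structured"
        # requests the usual release/component/architecture layout.
        payload["structured"] = True
    return url, payload
```

The resulting pair can be sent as a POST with any HTTP client. Only the non-verbatim variant has pulp_deb generate its own metadata.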

I see, thank you for the insight. The performance is not really an issue in my use case. I just wanted to avoid setting up the signing server because it is more work. Still, I will make a small test telling apt to “trust” the mirrors even with the lack of keys and switch to the non-verbatim publications and get back to you.

Really I think pulp_deb should publish the by-hash files from upstream regardless of how APT_BY_HASH is configured since pulp_deb is mirroring Acquire-By-Hash from the synced/upstream repository. It’s not doing that today though so I’ve filed a feature request:

Your idea was correct. I switched to using the non-verbatim publications and by-hash for one of the mirrors and it started working without any changes to the docker-compose setup.
I just had to add trusted=yes to the sources lists, which is a temporary measure until I can get the signing service to work.

Thank you all for the help! Now I have better chances of getting rid of that pesky Hash Sum mismatch error.

Below, I just added some information that can be useful for those that find the same issue or when implementing @davidd’s feature request.


First, an excerpt of the apt logs.

Answer for: ***/pulp/content/archive-focal/dev/dists/focal/universe/binary-amd64/by-hash/SHA512/18aa44bbba7bd9ffc451c4868b086361b7b59b599ed93183813825ac4c44b9470080c1a11e940d2d7dc5131565e82323eee11bfcd7442d9ce20768efed0d0edb
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Mon, 13 May 2024 14:44:11 GMT
Content-Type: application/octet-stream
Content-Length: 5956827
Connection: keep-alive
X-PULP-CACHE: HIT
Etag: "17cf124830917c4e-5ae4db"
Last-Modified: Mon, 13 May 2024 14:21:25 GMT
Accept-Ranges: bytes

Also, it seems that the location of the by-hash files is different from that in the upstream repository. I still get only this in my local mirror (URL /pulp/content/archive-focal/dev/dists/focal). It looks slightly different from before.

../
Release
main/
multiverse/
restricted/
universe/

This is the view in the upstream repository (same as before):

Contents-amd64.gz
Contents-i386.gz
InRelease
Release
Release.gpg
by-hash/
main/
multiverse/
restricted/
universe/

The big change is in the inner directories and Release file.
/pulp/content/archive-focal/dev/dists/focal/universe/binary-amd64/ with “normal” publications:

Packages
Packages.gz
by-hash/

/pulp/content/archive-jammy/dev/dists/jammy/universe/binary-amd64/ with verbatim publications, for comparison:

Packages
Packages.gz
Packages.xz
Release

You can find a Release file with the “normal” publications (/pulp/content/archive-focal/dev/dists/focal/Release) and another with the “verbatim” publications (/pulp/content/archive-jammy/dev/dists/jammy/Release) here. They are too long to inline in the post reply.
