Problem replicating rpm repositories, checksum error on metadata files

While attempting an rpm repository sync, I receive an error stating that a checksum failure occurred while processing some of the remote metadata files. See the log below: the downloaded .gz file has the expected checksum, and the “actual” checksum Pulp reports is that of the uncompressed file!

Expected outcome:
To understand why this happens and how to fix it

Pulpcore version:

Pulp plugins installed and their versions:

"component": "core",
"version": "3.22.0",
"package": "pulpcore"
"component": "rpm",
"version": "3.18.9",
"package": "pulp-rpm"
"component": "container",
"version": "2.14.3",
"package": "pulp-container"
"component": "file",
"version": "1.11.2",
"package": "pulp-file"
"component": "ansible",
"version": "0.16.0",
"package": "pulp-ansible"

Operating system - distribution and version:
RHEL 7.0

Other relevant data:

Actions log:

start of replication

(venv) ~ 24$ pulp --config $CONFIG rpm repository sync --name rpm_x86_64_8_tndist

Started background task /pulp/api/v3/tasks/aac20fae-ff9f-4786-bd79-7fc3b40a9ad0/ …
Error: Task /pulp/api/v3/tasks/aac20fae-ff9f-4786-bd79-7fc3b40a9ad0/ failed: ‘A file located at the url http://xxxxx/repodata/7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz failed validation due to checksum.
Expected ‘7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269’,
Actual ‘8d7539bfee2558185fcfe486d0a2ef80ab8e27377f52eec5c87d3e1dcc3f3a99’’


1 Download the file

(venv) $ wget http://xxxx/repodata/7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz

--2023-01-27 00:11:55-- …

Saving to: ‘7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz’

100%[==========================================================>] 4,360 --.-K/s in 0s

2023-01-27 00:11:55 (30.5 MB/s) - ‘7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz’ saved [4360/4360]

Calculate the checksum of the downloaded (compressed) file

(venv) $ sha256sum 7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz
7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269 7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz


(venv) $ gunzip 7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml.gz

Calculate the checksum of the uncompressed file

(venv) $ sha256sum 7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml

8d7539bfee2558185fcfe486d0a2ef80ab8e27377f52eec5c87d3e1dcc3f3a99 7155cbc2899de75445b33246d3da5b15c4171af025db628a135d5e05cb942269-other.xml

Is pulp-rpm wrongly checking the sha256sum of the uncompressed file when it should be checking the sha256sum of the compressed file?
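The manual comparison above can be scripted. Below is a minimal Python sketch that reports whether a given digest corresponds to the compressed or the decompressed form of a gzip payload. The helper names (sha256_hex, classify_digest) and the stand-in payload are hypothetical and only for illustration; for the real case, read the bytes of the downloaded other.xml.gz and pass in the “Actual” value from the task error.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def classify_digest(compressed: bytes, reported: str) -> str:
    """Say which form of a gzip payload the reported digest belongs to."""
    if reported == sha256_hex(compressed):
        return "compressed"
    if reported == sha256_hex(gzip.decompress(compressed)):
        return "decompressed"
    return "neither"


# Stand-in payload (an assumption); for the real file use something like
# open("...-other.xml.gz", "rb").read().
payload = gzip.compress(b"<otherdata/>")

# Digest of the bytes as stored on the server.
print(classify_digest(payload, sha256_hex(payload)))
# Digest of the bytes after gunzip, as in the "Actual" value of the error.
print(classify_digest(payload, sha256_hex(gzip.decompress(payload))))
```

If the “Actual” digest classifies as "decompressed", something between the server and the checksum validation is unpacking the file before it is hashed.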

I have a vague memory that at one point we hit this as a result of a misconfigured remote web server. The server was “helpfully” uncompressing the .gz in transit, so that by the time Pulp received it, it was uncompressed, but Pulp didn’t know that.

@dralley - does this ring a bell to you?

Yes, it does ring a bell

Our remote web server is Twisted ( twistd -n web --path … ).
I just verified that the content is not being uncompressed in flight.

However, a colleague noticed that the server’s Content-Type header is incorrectly set to text/xml. I’m now looking into whether correcting this header fixes the problem …
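For anyone wanting to check their own server, here is a rough Python sketch of a header check. It flags two things for URLs that point at already-compressed files: a Content-Encoding header (which invites a client to decompress the body) and a text/* Content-Type. The function names, the suffix list and the example.invalid URL are assumptions made for illustration and are not part of Pulp or Twisted.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Suffixes of files that are already compressed on disk (illustrative list).
COMPRESSED_SUFFIXES = (".gz", ".bz2", ".xz", ".zst")


def fetch_headers(url: str) -> dict:
    """Issue a HEAD request and return the response headers as a dict."""
    with urlopen(Request(url, method="HEAD")) as response:
        return dict(response.headers.items())


def header_warnings(url: str, headers: dict) -> list:
    """Flag header combinations that may cause a client to alter the payload."""
    normalized = {name.lower(): value for name, value in headers.items()}
    warnings = []
    if not urlparse(url).path.endswith(COMPRESSED_SUFFIXES):
        return warnings
    encoding = normalized.get("content-encoding", "").strip().lower()
    if encoding not in ("", "identity"):
        warnings.append(
            f"Content-Encoding: {encoding} on an already-compressed file; "
            "a client honouring it will decompress the body"
        )
    content_type = normalized.get("content-type", "").split(";")[0].strip().lower()
    if content_type.startswith("text/"):
        warnings.append(f"Content-Type: {content_type} on a compressed file")
    return warnings


# Offline example with hand-written headers (hypothetical values).
example_url = "http://example.invalid/repodata/abc-other.xml.gz"
for line in header_warnings(
    example_url, {"Content-Type": "text/xml", "Content-Encoding": "gzip"}
):
    print(line)
```

Against a live server, header_warnings(url, fetch_headers(url)) would run the same check on the real response headers.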


Great - let us know how that goes!

This really sounds familiar to me. We had the very same issue, and I think I remember that the aiohttp library (used by Pulp to download artifacts) was trying to be helpful by decompressing the response. We found no way to tell it otherwise.
In the end, the server confused Content-Type with Content-Encoding.
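To illustrate why wget and Pulp can disagree about the same URL, here is a small standard-library model in Python. It does not use aiohttp; body_seen_by_client and its auto_decompress flag are hypothetical stand-ins for “a client that honours Content-Encoding” versus “a client that saves the bytes as sent”, and the headers are made-up examples.

```python
import gzip
import hashlib


def body_seen_by_client(raw_body: bytes, headers: dict, auto_decompress: bool) -> bytes:
    """Return the bytes a client hands to its caller.

    Models a client that, when auto_decompress is on, undoes a gzip
    Content-Encoding before returning the body.
    """
    encoding = headers.get("Content-Encoding", "").lower()
    if auto_decompress and encoding == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


# The file as stored on the server, and the checksum listed in the repo metadata.
on_disk = gzip.compress(b"<otherdata/>")
expected = hashlib.sha256(on_disk).hexdigest()

# A server that labels the .gz file with Content-Encoding: gzip (assumed headers).
bad_headers = {"Content-Type": "text/xml", "Content-Encoding": "gzip"}

for auto in (False, True):
    body = body_seen_by_client(on_disk, bad_headers, auto_decompress=auto)
    matches = hashlib.sha256(body).hexdigest() == expected
    print(f"auto_decompress={auto}: checksum matches expected: {matches}")
```

In this model the client that keeps the raw bytes sees the expected checksum, while the decompressing client hashes the unpacked XML instead, which is the same pattern as the Expected/Actual pair in the task error.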
