Invalid errata metadata (sum_type) in rhel7 repo?

Problem:
Since late September, our RHEL7 repos have been receiving errata updates whose pkglist contains packages with an empty sum_type attribute. This in turn makes Pulp raise an exception when it tries to publish the repo:

pulp [57a57e7b046d4ee78c52d32e6193df69]: pulpcore.tasking.tasks:INFO: Task 018b1e82-2f19-7bf4-944e-cb32e5833465 failed (Number expected!)
pulp [57a57e7b046d4ee78c52d32e6193df69]: pulpcore.tasking.tasks:INFO:   File "/usr/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 66, in _execute_task
    result = func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/tasks/publishing.py", line 386, in publish
    generate_repo_metadata(
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/tasks/publishing.py", line 590, in generate_repo_metadata
    upd_xml.add_chunk(cr.xml_dump_updaterecord(update_record.to_createrepo_c()))
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/models/advisory.py", line 187, in to_createrepo_c
    rec.append_collection(collection.to_createrepo_c())
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/models/advisory.py", line 322, in to_createrepo_c
    col.append(package.to_createrepo_c())
  File "/usr/lib/python3.9/site-packages/pulp_rpm/app/models/advisory.py", line 471, in to_createrepo_c
    pkg.sum_type = self.sum_type

So far I have seen this in the RHEL7 base repo and optional repo:
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/optional/os/
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/

One example of such an erratum is RHBA-2014:0906, but there are many more, and all of them are from 2014. Removing all of these errata from the repository makes publish work again, but the next sync brings the problem back.

Is there a way to exclude old errata from sync? Or any other workaround for this problem?

Expected outcome:
I would expect publish to work, either through better error handling when reading the sum_type attribute, or through a way to sync without pulling in old errata.

Pulpcore version:
python3-pulpcore-3.28.10-7.el9.noarch

Pulp plugins installed and their versions:
python3-pulp-rpm-3.22.3-1.el9.noarch

Operating system - distribution and version:
RHEL 9.2

Hey there @wiad ! I am looking into this, haven’t been able to reproduce so far.

The hash used for the package in the specific advisory you mention is MD5 (see below). Do you have MD5 turned on in your settings.ALLOWED_CONTENT_CHECKSUMS? (I am about to experiment w/ that, will report back on results)

I can’t explain how/why this “used to work” and now doesn’t; will keep digging and let you know what I find.

updateinfo.xml stanza for RHBA-2014:0906:

<update status="final" from="tkopecek@redhat.com" version="1" type="bugfix">
<id>RHBA-2014:0906</id>
<issued date="2014-07-21" />
<title>microcode_ctl enhancement update</title>
<release>0</release>
<rights>Copyright 2014 Red Hat Inc</rights>
<solution>Before applying this update, make sure all previously released errata relevant to your system have been applied.

This update is available via the Red Hat Network. Details on how to
use the Red Hat Network to apply this update are available at
https://access.redhat.com/site/articles/11258
</solution>
<summary>Updated microcode_ctl packages that add one enhancement are now available for Red Hat Enterprise Linux 7. </summary>
<pushcount>1</pushcount>
<description>The microcode_ctl packages provide microcode updates for Intel and AMD processors.

This update adds the following enhancement:

* The Intel CPU microcode file has been updated to version 20140624. This is the most recent version of the microcode available from Intel. (BZ#1120077)

Users of microcode_ctl are advised to upgrade to these updated packages, which
add this enhancement. Note that the system must be rebooted for this update to
take effect.
</description>
<updated date="2014-07-21" />
<references>
  <reference href="https://rhn.redhat.com/errata/RHBA-2014-0906.html" type="self" id="RHBA-2014:0906" title="RHBA-2014:0906" />
  <reference href="http://www.redhat.com/security/updates/classification/#none" type="other" id="classification" title="none" />
</references>
<pkglist>
  <collection short="rhel-7-server-rpms__7Server__x86_64_0_default">
    <name>rhel-7-server-rpms__7Server__x86_64_0_default</name>
    <package src="microcode_ctl-2.1-7.1.el7_0.2.src.rpm" name="microcode_ctl" epoch="2" version="2.1" release="7.1.el7_0.2" arch="x86_64">
      <filename>microcode_ctl-2.1-7.1.el7_0.2.x86_64.rpm</filename>
      <sum type="md5">245f7155e11deb5395319ca7fcff9afe</sum>
    </package>
  </collection>
</pkglist>
</update>
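To check upstream metadata for the same symptom, one option is to parse each <update> stanza and list packages whose <sum> element carries a digest but no type attribute. This is a minimal sketch using only the standard library; the function name package_sums and the trimmed STANZA copy are illustrative, not part of Pulp or createrepo_c.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stanza above (only the id and pkglist are kept).
STANZA = """
<update status="final" from="tkopecek@redhat.com" version="1" type="bugfix">
  <id>RHBA-2014:0906</id>
  <pkglist>
    <collection short="rhel-7-server-rpms__7Server__x86_64_0_default">
      <name>rhel-7-server-rpms__7Server__x86_64_0_default</name>
      <package src="microcode_ctl-2.1-7.1.el7_0.2.src.rpm" name="microcode_ctl" epoch="2" version="2.1" release="7.1.el7_0.2" arch="x86_64">
        <filename>microcode_ctl-2.1-7.1.el7_0.2.x86_64.rpm</filename>
        <sum type="md5">245f7155e11deb5395319ca7fcff9afe</sum>
      </package>
    </collection>
  </pkglist>
</update>
"""


def package_sums(update_xml):
    """Return (advisory_id, filename, sum_type, digest) for each package in one <update>."""
    update = ET.fromstring(update_xml.strip())
    advisory_id = update.findtext("id")
    rows = []
    for pkg in update.iter("package"):
        sum_el = pkg.find("sum")
        sum_type = sum_el.get("type") if sum_el is not None else None
        digest = sum_el.text if sum_el is not None else None
        rows.append((advisory_id, pkg.findtext("filename"), sum_type, digest))
    return rows


rows = package_sums(STANZA)
# Packages that have a digest but no declared type would be the suspicious ones.
missing = [r for r in rows if r[3] and not r[2]]
print(rows)
print(missing)
```

For this stanza the missing list is empty, which matches the observation that the upstream XML declares type="md5".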

I have done a number of experiments today and continue to have no luck reproducing the problem, alas. I’ve played w/ the checksums available to my system, with immediate and on_demand, and with both of the repositories you call out, and I’ve yet to get a publish to fail. (Note: I can get sync to fail if I don’t allow sha1…)

Now, I just noticed that the current repomd.xml in the 7/Server repodata is from 2023-10-10 16:22:23 GMT and the matching updateinfo.xml.gz is from 16:19:29 GMT - could some problem have been corrected? If you sync “now”, does the problem remain?

In any event, some prob-gathering questions:

  • what’s your ALLOWED_CONTENT_CHECKSUMS set to?
  • what policy is your Remote using?
  • what sync-policy is used on your sync-requests?

Also, FWIW - here’s the script I’m using during my (various) experiments. You’d need to use your own certs, obviously:

#!/bin/bash
# For each RHEL7 repo listed below: create a remote and a repository, sync, then publish.

#https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os
#https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/optional/os/
R7_REMOTES=(
  https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os
)

#rhel_7_x86_64_server
#rhel_7_x86_64_server_optional
R7_NAMES=(
  rhel_7_x86_64_server
)

# Directory holding the CDN certs; substitute your own.
BASE=/home/ggainey/github/Pulp3/

# Iterate over array indexes so remotes and names stay paired.
for r in "${!R7_REMOTES[@]}"; do
  echo ">>>>> [${R7_REMOTES[$r]}] INTO [${R7_NAMES[$r]}]"
  pulp rpm remote create --name "${R7_NAMES[$r]}" --url "${R7_REMOTES[$r]}" --policy on_demand \
      --client-cert="@${BASE}pulp_startup/CDN_cert/cdn_11012023.pem" \
      --client-key="@${BASE}pulp_startup/CDN_cert/cdn_11012023.pem" \
      --ca-cert="@${BASE}pulp_startup/CDN_cert/redhat-uep.pem"
  pulp rpm repository create --name "${R7_NAMES[$r]}" --remote "${R7_NAMES[$r]}"
  pulp rpm repository sync --name "${R7_NAMES[$r]}"
  pulp rpm publication create --repository "${R7_NAMES[$r]}"
done

Hm, interesting that you cannot reproduce it. When I think about it, I believe the problem appeared after upgrading from pulp_rpm 3.17 to 3.22, but I’m not sure. It took a while before it happened, but the rhel7 repos are not updated all that often. It could also be something that’s missing or corrupt from my migration.

However, the pulp_created date on the errata causing trouble is 2023-09-29, which led me to believe that the error was in the errata received from Red Hat.

> Now, I just noticed that the current repomd.xml in the 7/Server repodata is from 2023-10-10 16:22:23 GMT and the matching updateinfo.xml.gz is from 16:19:29 GMT

A new sync does not update anything and the problem remains.

> what’s your ALLOWED_CONTENT_CHECKSUMS set to?

It was set to ["sha1", "sha224", "sha256", "sha384", "sha512"]. I added md5 (and then ran pulpcore-manager handle-artifact-checksums), but publishing still fails with the same error.
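For anyone following along, the changed setting would look roughly like the fragment below. This is only a sketch: the location of the settings file depends on the installation, and the list simply repeats the values quoted above with "md5" added.

```python
# Pulp settings fragment (the file path varies by installation).
# Same list as quoted above, with "md5" added so MD5 digests are accepted.
ALLOWED_CONTENT_CHECKSUMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]
```

As described in the post, pulpcore-manager handle-artifact-checksums is run after changing this value.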

> what policy is your Remote using?

immediate

> what sync-policy is used on your sync-requests?

Not set, so additive. I have, however, tried them all: mirror_complete succeeded in creating a publication, but we want to use retain_package_versions, which does not work with that policy.

As I said, I added ‘md5’ to ALLOWED_CONTENT_CHECKSUMS. I then deleted all the faulty errata, ran orphan cleanup, and verified that the content no longer existed. At this point I could successfully publish my repo.
I then ran a new sync, which once more pulled in new errata with the empty sum_type attribute, and publish no longer works.
Worth noting is that I already have existing errata that are correct, i.e. with ‘sum_type’ set to md5. But the sync pulls in a new erratum where the only difference is the change in ‘sum_type’. Not all errata are affected; it is only a subset of errata from 2014.

I’m kind of stuck here, any advice on how to move forward?

md5 is enabled, pulpcore-manager handle-artifact-checksums --report says

Found 0 on-demand content units with forbidden checksums.
Found 0 downloaded content units with forbidden or missing checksums.

Removing an erratum with an empty sum_type attribute and then syncing results in a new erratum being downloaded, still with an empty sum_type.

Having checked all errata in my repo, I am fairly confident that the issue affects only errata containing packages with md5 checksums.

So how come my pulp sync suddenly overwrote the correct errata I already had (with ‘md5’ in ‘sum_type’) with new errata with an empty ‘sum_type’? And how can I fix it?
(I can copy the correct errata from another repo copy that we keep, but the next sync will overwrite them again.)

Hey @wiad,

Sorry for the delay, but I have been experimenting more on your problem (along with perhaps too many other things, alas :) )

I can’t recreate your issue “from scratch” (still). My current theory is that, at some point, the RHEL7 repo metadata was broken at CDN, or possibly was ingested incorrectly by Pulp, leading to the sum_type=None for some UpdateCollectionPackage records. When sync happens, and the sync finds advisories (UpdateRecord, in Pulp model parlance) with the same advisory-id but changed metadata, Pulp merges the new advisory and the existing one, leaving the new one with the existing one’s broken sum_type. I’d like to see if we can “fix in place” the broken objects.

On your pulp-instance, you can use pulpcore-manager shell to enter a python shell that “knows about” your pulp instance. Once there, you can get the number of ‘broken’ UpdateCollectionPackage entries like so:

In [26]: from pulp_rpm.app.models.advisory import UpdateCollectionPackage
In [27]: UpdateCollectionPackage.objects.filter(sum_type=None).count()
Out[27]: 2

This shows 2 such records in my system (I deliberately Broke Things here).

If we assume that the missing checksum is meant to be MD5, that is represented by a sum_type of 1 (as you can see in ADVISORY_SUM_TYPE_TO_NAME). We can then “fix” the broken sum_types with this command:

In [28]: UpdateCollectionPackage.objects.filter(sum_type=None).update(sum_type=1)
Out[28]: 2

Note that it tells us how many records were updated. Now, if we look for broken-sum_types again, we should see zero:

In [29]: UpdateCollectionPackage.objects.filter(sum_type=None).count()
Out[29]: 0
In [30]: 

At this point, publication should work, and resync should not break things again. He said hopefully.

Can you try this recipe out on your unhappy system, and let us know the results?

Pulp deduplicates entities; my working theory here is that there is a broken copy of the advisory(ies) “somewhere”, which Pulp finds when it is ingesting advisory-BLAH, and uses that as the base for the new sync. The recipe I suggested above will fix all broken models, no matter where they are in your inventory.

If a sync after that still results in sum_type=None, then we’ll go back to the drawing board.

Ah, very useful tool this pulpcore-manager shell. Tried this but got a bit hesitant at the count():

>>> UpdateCollectionPackage.objects.filter(sum_type=None).count()
522354

That’s a lot. Looping over it with

for x in UpdateCollectionPackage.objects.filter(sum_type=None):
    print(x.filename)

shows newer packages as well, for el8 and el9, which should not have md5 checksums. If I run the same loop but print x.sum instead, I get empty results, so all these entities seem to be missing both attributes. Judging by the filenames, most of them appear to be from EPEL.

Given the number of entities, I’m not comfortable with setting all of them to md5. Running

>>> UpdateCollectionPackage.objects.filter(sum="").count()
522045

and then

>>> UpdateCollectionPackage.objects.filter(sum_type=None).count()
522354

shows a discrepancy of about 300 entities (522354 - 522045 = 309) that do have a sum set but no sum_type. I managed to figure out how to filter these out with:

UpdateCollectionPackage.objects.filter(sum_type=None).exclude(sum="")

and looking at the filenames of these entities it looks like a match to my rhel7 packages.

So to update only these objects I did a

UpdateCollectionPackage.objects.filter(sum_type=None).exclude(sum="").update(sum_type=1)

and after that it works! I can publish my repos, and new syncs do not change anything.

Thank you so much for the time and effort you put into helping me with this. I still don’t know what caused it, but maybe some migration failure when updating the pulp version left some metadata in a bad state.
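To spell out what the filter(sum_type=None).exclude(sum="") chain selects, here is a plain-Python sketch over made-up records. The Row class, the repair function, and two of the three sample filenames are hypothetical stand-ins, not Pulp models or real data; only the value 1 for MD5 comes from this thread.

```python
from dataclasses import dataclass
from typing import Optional

MD5 = 1  # sum_type value for MD5, as stated earlier in the thread


@dataclass
class Row:
    """Hypothetical stand-in for an UpdateCollectionPackage row."""
    filename: str
    sum: str
    sum_type: Optional[int]


def repair(rows):
    """Set sum_type to MD5 only where a digest exists but its type is missing."""
    fixed = 0
    for row in rows:
        if row.sum_type is None and row.sum != "":
            row.sum_type = MD5
            fixed += 1
    return fixed


rows = [
    # Digest present, type missing: this is the case the narrowed filter targets.
    Row("microcode_ctl-2.1-7.1.el7_0.2.x86_64.rpm", "245f7155e11deb5395319ca7fcff9afe", None),
    # No digest at all (made-up filename): left untouched.
    Row("example-without-digest.rpm", "", None),
    # Already correct (made-up filename): left untouched.
    Row("example-already-correct.rpm", "d41d8cd98f00b204e9800998ecf8427e", MD5),
]

print(repair(rows))  # number of rows changed
```

The point of the extra exclude is the second row: records with no digest at all stay untouched instead of being labelled md5.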

Wow! OK, first - I am very pleased that you’re back up and running!

Two - you really took the ball and ran with it. Outstanding execution there, and a great writeup. Thanks for taking the time to come back with this, having your thought-process and steps documented here may help some future Us address a similar problem down the line.

This was a great way to start my day here - thanks again!

Hm, unfortunately this re-appeared when the rhel7 repos received updates. A new count() shows 147 objects with an empty checksum, so not as many as before, but still: something in the pulp sync process seems to have rewritten the existing objects.

The sync replaced almost 3000 existing errata, but that is not unusual (the one added was firefox).

"content_summary": {
    "added": {
        "rpm.advisory": {
            "count": 2857,
            "href": "/pulp/api/v3/content/rpm/advisories/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.package": {
            "count": 1,
            "href": "/pulp/api/v3/content/rpm/packages/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        }
    },
    "present": {
        "rpm.advisory": {
            "count": 5197,
            "href": "/pulp/api/v3/content/rpm/advisories/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.package": {
            "count": 12337,
            "href": "/pulp/api/v3/content/rpm/packages/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.packagecategory": {
            "count": 9,
            "href": "/pulp/api/v3/content/rpm/packagecategories/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.packageenvironment": {
            "count": 6,
            "href": "/pulp/api/v3/content/rpm/packageenvironments/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.packagegroup": {
            "count": 76,
            "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.packagelangpacks": {
            "count": 1,
            "href": "/pulp/api/v3/content/rpm/packagelangpacks/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.repo_metadata_file": {
            "count": 1,
            "href": "/pulp/api/v3/content/rpm/repo_metadata_files/?repository_version=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        }
    },
    "removed": {
        "rpm.advisory": {
            "count": 2856,
            "href": "/pulp/api/v3/content/rpm/advisories/?repository_version_removed=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        },
        "rpm.package": {
            "count": 1,
            "href": "/pulp/api/v3/content/rpm/packages/?repository_version_removed=/pulp/api/v3/repositories/rpm/rpm/6e5965c8-9b5b-400a-bc76-fe4a47e03460/versions/217/"
        }
    }
},
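As a quick reading aid for the summary above, the net change per content type is just added minus removed. A minimal sketch with the counts copied from the output (the net_change helper is illustrative, not a Pulp API):

```python
# Counts copied from the "added" and "removed" sections of the content summary above.
content_summary = {
    "added": {"rpm.advisory": 2857, "rpm.package": 1},
    "removed": {"rpm.advisory": 2856, "rpm.package": 1},
}


def net_change(summary):
    """Return added-minus-removed for every content type seen in either section."""
    types = set(summary["added"]) | set(summary["removed"])
    return {
        t: summary["added"].get(t, 0) - summary["removed"].get(t, 0)
        for t in sorted(types)
    }


print(net_change(content_summary))  # {'rpm.advisory': 1, 'rpm.package': 0}
```

So 2856 advisories were swapped for new copies and one advisory is net new, while the package count is unchanged.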

Publishing works again after doing update(sum_type=1).

Um…argh. I am now back to being puzzled. EPEL has always been Exciting, but this is def new (and we have no other reports, which makes it even more confusing).

You def should not have to be re-fixing every time EPEL updates their errata, but at least we have a workflow to get you fixed up after the failure. Will continue digging to see if I come up with anything brilliant…

Just to clarify, this is not EPEL but Red Hat’s own repos for RHEL7. I realize I mis-wrote in an earlier post.

Got some new RHEL7 updates tonight and the problem re-appeared. This time I took a closer look at the ‘corrupt’ objects and they are all newly created:

{'_initial_state': {'arch': 'i686',
                    'epoch': '0',
                    'filename': 'kdesdk-kmtrace-devel-4.10.5-6.el7.i686.rpm',
                    'name': 'kdesdk-kmtrace-devel',
                    'pulp_created': datetime.datetime(2023, 11, 9, 1, 38, 47, 527846, tzinfo=datetime.timezone.utc),
                    'pulp_id': UUID('018bb1bb-40a1-7b56-bdac-df4a70a584e2'),
                    'pulp_last_updated': datetime.datetime(2023, 11, 9, 1, 38, 47, 527852, tzinfo=datetime.timezone.utc),
                    'reboot_suggested': False,
                    'release': '6.el7',
                    'relogin_suggested': False,
                    'restart_suggested': False,
                    'src': 'kdesdk-4.10.5-6.el7.src.rpm',
                    'sum': '324fd95b0dff69165bb775abff864238',
                    'sum_type': None,
                    'update_collection_id': UUID('018bb1bb-408f-7efe-9b41-725556e34eef'),
                    'version': '4.10.5'},
 '_state': <django.db.models.base.ModelState object at 0x7ff8318e3a00>,
 'arch': 'i686',
 'epoch': '0',
 'filename': 'kdesdk-kmtrace-devel-4.10.5-6.el7.i686.rpm',
 'name': 'kdesdk-kmtrace-devel',
 'pulp_created': datetime.datetime(2023, 11, 9, 1, 38, 47, 527846, tzinfo=datetime.timezone.utc),
 'pulp_id': UUID('018bb1bb-40a1-7b56-bdac-df4a70a584e2'),
 'pulp_last_updated': datetime.datetime(2023, 11, 9, 1, 38, 47, 527852, tzinfo=datetime.timezone.utc),
 'reboot_suggested': False,
 'release': '6.el7',
 'relogin_suggested': False,
 'restart_suggested': False,
 'src': 'kdesdk-4.10.5-6.el7.src.rpm',
 'sum': '324fd95b0dff69165bb775abff864238',
 'sum_type': None,
 'update_collection_id': UUID('018bb1bb-408f-7efe-9b41-725556e34eef'),
 'version': '4.10.5'}

Does this output give any hint as to why sum_type is not set?
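One thing the dump does show: the sum value is 32 hex characters long, which is the length of an MD5 digest. A small sketch of guessing the algorithm name from the digest length follows. The helper name guess_sum_type is made up for illustration, and length alone is only a heuristic, since different algorithms can share an output size.

```python
# Hex digest lengths for the common hashlib algorithms.
HEX_LENGTH_TO_NAME = {
    32: "md5",
    40: "sha1",
    56: "sha224",
    64: "sha256",
    96: "sha384",
    128: "sha512",
}


def guess_sum_type(digest):
    """Guess a hash algorithm name from a hex digest's length; None if it does not fit."""
    digest = digest.strip().lower()
    if not digest or any(c not in "0123456789abcdef" for c in digest):
        return None
    return HEX_LENGTH_TO_NAME.get(len(digest))


# The sum from the object dump above.
print(guess_sum_type("324fd95b0dff69165bb775abff864238"))  # md5
```

A guess like this could be used to sanity-check records before running update(sum_type=1) on them.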

I can dig thru the primary.xml.gz of RHEL7 and see what it tells me, at least. We’ll see if it gives me an “ahha!” moment :)

Well, I have no idea where the sum_type=None comes from, because I can’t make createrepo-c spit one out, and I can’t reproduce the issue.

I can however fix the error pretty easily, such that it will just omit the checksum if it is an invalid value.
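The general shape of such a guard can be sketched as below. This is not the actual pulp_rpm patch: FakePackage is a hypothetical stand-in for the createrepo_c package object, to_package is an illustrative name, and the KNOWN_SUM_TYPES table holds only the one mapping (1 for md5) mentioned in this thread.

```python
# Numeric sum_type codes considered valid. Only 1 -> "md5" comes from this thread;
# any further entries would need to be checked against the real constants.
KNOWN_SUM_TYPES = {1: "md5"}


class FakePackage:
    """Hypothetical stand-in for the createrepo_c package object."""

    def __init__(self):
        self.filename = None
        self.sum = None
        self.sum_type = None


def to_package(filename, digest, sum_type):
    """Build a package entry, omitting the checksum when its type is missing or unknown."""
    pkg = FakePackage()
    pkg.filename = filename
    if digest and sum_type in KNOWN_SUM_TYPES:
        pkg.sum = digest
        pkg.sum_type = sum_type
    return pkg


good = to_package("microcode_ctl-2.1-7.1.el7_0.2.x86_64.rpm", "245f7155e11deb5395319ca7fcff9afe", 1)
bad = to_package("kdesdk-kmtrace-devel-4.10.5-6.el7.i686.rpm", "324fd95b0dff69165bb775abff864238", None)
print(good.sum_type, bad.sum_type)
```

With a guard like this, a record with sum_type=None would publish without a checksum instead of failing the whole publish task.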

edit: well, I have one idea, I’m double checking if createrepo_c is built with legacy hashes support enabled