Can't sync "non browsable" yum/rpm remotes

Hi Guys,

Problem:
I had this issue with both the Elastic (https://artifacts.elastic.co/packages/8.x/yum) and Grafana (https://rpm.grafana.com) RPM repos. When I try to sync, it hangs for a while and eventually I see a signal 9 sent to kill the task, so I never get as far as creating a distribution.

From what I understand, the issue is that the remote URL is not “browsable”. I can’t remember where I saw it, but I heard that this causes issues with Pulp. Is that still the case?

If I append repodata/repomd.xml to the base URLs (in a web browser or with curl), I can get there just fine. Also, if I add the remotes directly as repos on a RHEL 9 VM, it works as expected, so I am confused why plain yum can see this and Pulp cannot.

Anyway, thanks in advance if you can shed some light on the problem or know a fix or workaround.

Expected outcome:
completed sync

(P.S. I know about issues/2402 in the logs below. It doesn’t seem to be the reason here: I get the same warning with official RHEL repos and they work fine.)

Pulpcore version:
"deb": "3.4.0",
"rpm": "3.27.2",
"core": "3.66.0",
"file": "3.66.0",
"maven": "0.8.1",
"ostree": "2.4.4",
"python": "3.12.5",
"ansible": "0.22.2",
"certguard": "3.66.0",
"container": "2.22.0"

pod image versions

database - Image: docker.io/library/postgres:13
content/api/worker - Image: quay.io/pulp/pulp-minimal:stable
pulp-operator-controller-manage - Image: gcr.io/kubebuilder/kube-rbac-proxy:v0.13.0
Image: quay.io/pulp/pulp-operator:v1.0.0-beta.5

Pulp plugins installed and their versions:
Pulp3 Command Line Interface, Version 0.30.0

Operating system - distribution and version:
MKE k8s cluster and/or k0s

Pulp operator installed via Helm (kustomize overlay to use v1.0.0-beta.5, which I needed because IPv6 is disabled)

Other relevant data:

Commands run:

pulp rpm remote create \
    --name='grafana' \
    --url "https://rpm.grafana.com" \
    --policy on_demand \
    --tls-validation False

pulp rpm repository create --name grafana --remote grafana

pulp rpm repository sync --name grafana

pulp-cli output

Started background task /pulp/api/v3/tasks/0195664e-93f2-797a-b6a7-ea3a6da9cced/

...Error: Task /pulp/api/v3/tasks/0195664e-93f2-797a-b6a7-ea3a6da9cced/ failed: 'Killed by signal 9.'

Logs from worker

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulpcore.tasking.tasks:INFO: Starting task 0195664e-93f2-797a-b6a7-ea3a6da9cced in domain: default

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulp_rpm.app.tasks.synchronizing:INFO: Synchronizing: repository=grafana remote=grafana

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulp_rpm.app.tasks.synchronizing:WARNING: The repository metadata being synced into Pulp is erroneous in a way that makes it ambiguous (duplicate PKGIDs). Yum, DNF and Pulp try to handle these problems, but unexpected things may happen.

Please read https://github.com/pulp/pulp_rpm/issues/2402 for more details.

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulp_rpm.app.tasks.synchronizing:WARNING: The repository metadata being synced into Pulp is erroneous in a way that makes it ambiguous (duplicate NEVRAs). Yum, DNF and Pulp try to handle these problems, but unexpected things may happen.

Please read https://github.com/pulp/pulp_rpm/issues/2402 for more details.

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulp_rpm.app.tasks.synchronizing:INFO: Excluding 15 packages (duplicates, outdated or skipping was requested e.g. 'skip_types')

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulpcore.plugin.stages.artifact_stages:WARNING: No declared artifact with relative path 'grafana-10.0.10-1.armhfp.rpm' for content '(UUID('019483e5-a524-7275-9a70-9c2d888439fb'), 'grafana', '0', '10.0.10', '1', 'aarch64', 'sha256', 'e853da0878bd8169bc7eef1d516a0fb9c9b7c661cbb6aa208700a5449997db01')' from remote 'grafana'. Using last from available-paths : 'grafana-10.0.10-1.aarch64.rpm'

pulp [18ccca97a7a4426c87b28cbf7c31f5e6]: pulpcore.plugin.stages.artifact_stages:WARNING: No declared artifact with relative path 'grafana-10.1.6-1.armhfp.rpm' for content '(UUID('019483e5-a524-7275-9a70-9c2d888439fb'), 'grafana', '0', '10.1.6', '1', 'aarch64', 'sha256', '1b708ca6dee7c13fd563a14ccbee46f7fe235f7fc3f12a4f43ce02ed82f8efb3')' from remote 'grafana'. Using last from available-paths : 'grafana-10.1.6-1.aarch64.rpm'

pulp [4c93adfca8934232af80406f7386eadc]: pulpcore.tasking.worker:WARNING: Task process for 0195664e-93f2-797a-b6a7-ea3a6da9cced exited with non zero exitcode -9.

pulp [4c93adfca8934232af80406f7386eadc]: pulpcore.tasking.worker:INFO: Cleaning up task 0195664e-93f2-797a-b6a7-ea3a6da9cced in domain: default and marking as failed. Reason: Killed by signal 9.

Can you also provide the output of the failed task? In other words, the output of:

pulp show --href /pulp/api/v3/tasks/0195664e-93f2-797a-b6a7-ea3a6da9cced/

Sure, here we go:

pulp show --href /pulp/api/v3/tasks/0195664e-93f2-797a-b6a7-ea3a6da9cced/

{
  "pulp_href": "/pulp/api/v3/tasks/0195664e-93f2-797a-b6a7-ea3a6da9cced/",
  "prn": "prn:core.task:0195664e-93f2-797a-b6a7-ea3a6da9cced",
  "pulp_created": "2025-03-05T12:36:20.595991Z",
  "pulp_last_updated": "2025-03-05T12:36:20.596026Z",
  "state": "failed",
  "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
  "logging_cid": "18ccca97a7a4426c87b28cbf7c31f5e6",
  "created_by": "/pulp/api/v3/users/1/",
  "unblocked_at": "2025-03-05T12:36:20.636863Z",
  "started_at": "2025-03-05T12:36:20.856637Z",
  "finished_at": "2025-03-05T12:43:10.818883Z",
  "error": {
    "reason": "Killed by signal 9."
  },
  "worker": "/pulp/api/v3/workers/019483f5-e9ec-7ad1-966e-1343e7bdd2f1/",
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [
    {
      "message": "Parsed Packages",
      "code": "sync.parsing.packages",
      "state": "running",
      "total": 5038,
      "done": 1971,
      "suffix": null
    },
    {
      "message": "Downloading Metadata Files",
      "code": "sync.downloading.metadata",
      "state": "completed",
      "total": null,
      "done": 4,
      "suffix": null
    },
    {
      "message": "Skipping Packages",
      "code": "sync.skipped.packages",
      "state": "completed",
      "total": 15,
      "done": 15,
      "suffix": null
    },
    {
      "message": "Associating Content",
      "code": "associating.content",
      "state": "running",
      "total": null,
      "done": 498,
      "suffix": null
    },
    {
      "message": "Downloading Artifacts",
      "code": "sync.downloading.artifacts",
      "state": "running",
      "total": null,
      "done": 0,
      "suffix": null
    }
  ],
  "created_resources": [],
  "reserved_resources_record": [
    "prn:rpm.rpmrepository:0195664e-2ca6-7bee-ad36-4f3e63c0c838",
    "shared:prn:rpm.rpmremote:01956647-d2e2-7b98-a204-4472bbad9c12",
    "shared:prn:core.domain:019483e5-a524-7275-9a70-9c2d888439fb"
  ]
}
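
For anyone reading a task dump like this one: the progress_reports entries show which stages were still running when the task was killed. Below is a minimal Python sketch that lists them, assuming the output has been saved as valid JSON. The helper name unfinished_stages is made up for illustration and the embedded JSON is a trimmed copy of the output above.

```python
import json


def unfinished_stages(task: dict) -> list[str]:
    """Return one readable line per progress report that is not 'completed'."""
    lines = []
    for report in task.get("progress_reports", []):
        if report["state"] == "completed":
            continue
        total = report["total"]
        # 'total' is null in the JSON when no final count was reported
        shown_total = total if total is not None else "?"
        lines.append(f"{report['message']}: {report['done']}/{shown_total} ({report['state']})")
    return lines


# Trimmed copy of the failed task shown above
task = json.loads("""
{
  "state": "failed",
  "error": {"reason": "Killed by signal 9."},
  "progress_reports": [
    {"message": "Parsed Packages", "state": "running", "total": 5038, "done": 1971},
    {"message": "Downloading Metadata Files", "state": "completed", "total": null, "done": 4},
    {"message": "Skipping Packages", "state": "completed", "total": 15, "done": 15},
    {"message": "Associating Content", "state": "running", "total": null, "done": 498},
    {"message": "Downloading Artifacts", "state": "running", "total": null, "done": 0}
  ]
}
""")

for line in unfinished_stages(task):
    print(line)
```

Here the task died while still parsing packages (1971 of 5038), so it never got past metadata processing.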

Pulp doesn’t know (and has never known) anything about “browsable” - it only knows how to look for repodata/repomd.xml starting from the URL in the remote. The following works, for example:

pulp rpm remote create --name grafana --url https://rpm.grafana.com/ --policy on_demand
pulp rpm repository create --name grafana --remote grafana
pulp rpm repository sync --name grafana
Started background task /pulp/api/v3/tasks/019566a3-dbf4-7813-b097-dcfaecce2a7c/
.....................................................................................Done.
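
Put differently, the only thing that has to exist under the remote URL is repodata/repomd.xml. A rough Python sketch of that lookup follows. It is not Pulp's actual code, and the helper name repomd_url is invented for illustration.

```python
def repomd_url(remote_url: str) -> str:
    """Build the repomd.xml location for a yum/rpm base URL."""
    # Normalise the trailing slash so both spellings of the base URL
    # produce the same metadata location.
    return remote_url.rstrip("/") + "/repodata/repomd.xml"


print(repomd_url("https://rpm.grafana.com"))
print(repomd_url("https://artifacts.elastic.co/packages/8.x/yum/"))
```

The sketch concatenates strings instead of calling urllib.parse.urljoin, because urljoin replaces the last path segment when the base URL has no trailing slash (".../8.x/yum" would lose its "yum").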

SIGKILL (signal 9) is more likely to be “the OOMKiller came to visit”. The Grafana repository in particular has an enormous filelists.xml to process (<open-size>2202178308</open-size>, about 2.2 GB uncompressed), and the worker process can grow to 5 GB (!) when trying to ingest it. See https://issues.redhat.com/browse/PULP-280 for details. We haven’t addressed this issue yet.
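
If you want to check a repository for this before syncing, the open-size values are listed in repodata/repomd.xml. Here is a minimal Python sketch that reads them, using only the standard library. The function name open_sizes and the 1 GiB threshold are assumptions of mine, and the sample XML is heavily trimmed: only the filelists number comes from the real Grafana metadata, while the primary number and the file names are placeholders.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def open_sizes(repomd_xml: str) -> dict[str, int]:
    """Map each <data type="..."> entry to its uncompressed size in bytes."""
    root = ET.fromstring(repomd_xml)
    sizes = {}
    for data in root.findall("repo:data", REPO_NS):
        open_size = data.find("repo:open-size", REPO_NS)
        if open_size is not None and open_size.text:
            sizes[data.get("type")] = int(open_size.text)
    return sizes


# Trimmed sample: only the filelists open-size is the real Grafana value
SAMPLE = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
    <open-size>1000000</open-size>
  </data>
  <data type="filelists">
    <location href="repodata/filelists.xml.gz"/>
    <open-size>2202178308</open-size>
  </data>
</repomd>
"""

THRESHOLD = 1024**3  # arbitrary 1 GiB cut-off for "worth worrying about"

for name, size in open_sizes(SAMPLE).items():
    flag = "  <-- large" if size > THRESHOLD else ""
    print(f"{name}: {size / 1024**3:.2f} GiB{flag}")
```

To run it against a real repository, download repodata/repomd.xml from the remote URL and pass its text to open_sizes.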

Artifactory repos can be missing filelists.xml entirely, which caused pulp_rpm to refuse to ingest them. That was fixed in the recently released pulp_rpm/3.28 - see https://github.com/pulp/pulp_rpm/issues/3777


Just feedback… I got it to work by setting higher memory limits… thanks for your help

  worker:
    replicas: 2
    resource_requirements:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 7Gi
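
As a quick sanity check of those numbers against the roughly 5 GB peak mentioned above, here is a small Python sketch. It only understands a subset of the Kubernetes quantity suffixes, the function name quantity_to_bytes is made up, and treating "5 GB" as 5 * 10^9 bytes is my assumption.

```python
# Subset of Kubernetes quantity suffixes (no milli-units, no exponents)
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '7Gi' or '500M' to bytes."""
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


PEAK_BYTES = 5 * 10**9  # assumed reading of the "5 GB" figure from the thread

request = quantity_to_bytes("2Gi")
limit = quantity_to_bytes("7Gi")

print(f"request covers peak: {request >= PEAK_BYTES}")
print(f"limit covers peak:   {limit >= PEAK_BYTES}")
print(f"headroom under limit: {(limit - PEAK_BYTES) / 2**30:.2f} GiB")
```

The request stays below the peak on purpose; it is the limit that has to leave room for the filelists.xml processing.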

Outstanding - thanks for letting me know!