Sync Fails on 404 from Upstream Ubuntu Mirror (pulp_deb 3.8.0 / pulpcore 3.95.3)

We’re seeing sync failures when using pulp_deb to mirror an Ubuntu repository. The task fails with a 404 from the upstream mirror:

"description": "404, message='Not Found', 
url='http://mirrors.mit.edu/ubuntu/pool/universe/libs/libsoup3/libsoup-3.0-dev_3.4.4-5ubuntu0.6_i386.deb'"

This appears to be an upstream mirror inconsistency (the file is referenced in metadata but not present on the mirror). However, the entire sync task fails as a result.
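For what it’s worth, one way to confirm the inconsistency independently of Pulp is to issue a HEAD request for the URL from the error message. This is a small hypothetical helper (not part of Pulp or pulp_deb), using only the standard library:

```python
# Hypothetical helper (not part of Pulp) to check whether a package URL
# referenced in the repository metadata actually exists on the mirror.
import urllib.error
import urllib.request


def head_status(url: str) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception.
        return exc.code
```

If `head_status(...)` for the libsoup-3.0-dev URL above returns 404 while the package is still listed in the mirror’s `Packages` index, the mirror is out of sync with its own metadata.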

Environment:

deb:       3.8.0
rpm:       3.34.0
core:      3.95.3
file:      3.95.3
ostree:    2.6.0
certguard: 3.95.3

Questions:

  1. Is it expected behavior for a single upstream 404 to cause the entire sync task to fail?
  2. Is there a recommended configuration (e.g. mirror policy, skip-missing behavior, retry tuning) to allow the sync to continue when a package is temporarily missing from a mirror?
  3. Is this typically a mirror issue (out-of-sync mirror) that resolves on retry, or something pulp_deb handles differently?

We’d like to avoid full sync failures caused by transient upstream mirror inconsistencies.

Thanks in advance for any guidance.

Full Stack Trace

File "/usr/local/lib/python3.11/site-packages/pulpcore/tasking/tasks.py", line 72, in _execute_task
    result = task_function()
             ^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.11/site-packages/pulp_deb/app/tasks/synchronizing.py", line 223, in synchronize
    DebDeclarativeVersion(first_stage, repository, mirror=mirror).create()

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/declarative_version.py", line 163, in create
    loop.run_until_complete(pipeline)

File "/usr/lib64/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/api.py", line 220, in create_pipeline
    await asyncio.gather(*futures)

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/api.py", line 41, in __call__
    await self.run()

File "/usr/local/lib/python3.11/site-packages/asgiref/sync.py", line 489, in thread_handler
    raise exc_info[1]

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 188, in run
    pb.done += task.result()

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 243, in _handle_content_unit
    await asyncio.gather(*downloaders_for_content)

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/models.py", line 119, in download
    raise e

File "/usr/local/lib/python3.11/site-packages/pulpcore/plugin/stages/models.py", line 111, in download
    download_result = await downloader.run(extra_data=self.extra_data)

File "/usr/local/lib/python3.11/site-packages/pulpcore/download/http.py", line 274, in run
    return await download_wrapper()

File "/usr/local/lib/python3.11/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)

File "/usr/local/lib/python3.11/site-packages/pulpcore/download/http.py", line 259, in download_wrapper
    return await self._run(extra_data=extra_data)

File "/usr/local/lib/python3.11/site-packages/pulpcore/download/http.py", line 295, in _run
    self.raise_for_status(response)

File "/usr/local/lib/python3.11/site-packages/pulpcore/download/http.py", line 187, in raise_for_status
    response.raise_for_status()

File "/usr/local/lib64/python3.11/site-packages/aiohttp/client_reqrep.py", line 636, in raise_for_status
    raise ClientResponseError(...)

Final error:

404, message='Not Found',
url='http://mirrors.mit.edu/ubuntu/pool/universe/libs/libsoup3/libsoup-3.0-dev_3.4.4-5ubuntu0.6_i386.deb'

I can see the appeal, especially when you are not even interested in the failed package.

However, it’s a delicate topic. Other users, especially those mirroring an upstream they control, probably want to know that their entire repository synced flawlessly.

Ultimately this is up to the individual plugin, but I believe we would need to add a remote option to “allow failsafe syncs”. Neither default would serve everybody.


Regarding your questions:

  1. Yes, it is currently by design that a single unresolvable inconsistency in the metadata causes the entire sync to fail.
  2. By default, downloads are retried so that a one-off download failure does not fail the entire sync, but that clearly will not help when the mirror you are syncing from has a broken link on it. There is currently no way to tell pulp_deb to ignore or skip a single (or all) failed downloads.
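To illustrate why the retry logic cannot help here, this is a simplified sketch of the distinction it draws (not pulpcore’s actual downloader code): transient failures such as timeouts are worth retrying, while a 404 is an authoritative answer from the mirror, so retrying the same URL cannot succeed.

```python
import time


class NotFound(Exception):
    """Raised when the server answers 404 for a metadata-referenced file."""


def download_with_retries(fetch, url, max_tries=3, delay=0.0):
    # Transient errors (here modeled as TimeoutError) are retried with a
    # delay; a 404 is a definitive answer from the server, so it is
    # surfaced immediately and fails the sync.
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(url)
        except TimeoutError:
            if attempt == max_tries:
                raise
            time.sleep(delay)  # back off before retrying a transient failure
        except NotFound:
            raise  # retrying the same broken link cannot help
```

In pulpcore the retry wrapper (via the `backoff` library, visible in the traceback) serves the same purpose: it papers over one-off network failures, not broken links in the upstream metadata.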

What your request boils down to is a feature request along the lines of: “allow skipping individual (or all broken) packages during sync”.

This feature request is not new, it periodically pops up both for pulp_deb as well as for pulp_rpm, but to my knowledge there has not been a serious stab at implementing it for either plugin.

See also:
