Sync Task Stuck in “waiting” State After Previous Failed Sync (pulpcore 3.91.1)

We’re running pulpcore 3.91.1 with pulp_deb. After a previous sync failed, I attempted to re-run the sync for the same repository. The new sync task remains in the “waiting” state indefinitely and never starts.

Below are the details and outputs.

Task Details

pulp task show --href /pulp/api/v3/tasks/019a0265-3074-7b7f-b506-e56ed64214b2/
{
  "pulp_href": "/pulp/api/v3/tasks/019a0265-3074-7b7f-b506-e56ed64214b2/",
  "prn": "prn:core.task:019a0265-3074-7b7f-b506-e56ed64214b2",
  "pulp_created": "2025-10-20T16:12:57.355435Z",
  "pulp_last_updated": "2025-10-20T16:12:57.332529Z",
  "state": "waiting",
  "name": "pulp_deb.app.tasks.synchronizing.synchronize",
  "logging_cid": "00fd78f43c194620a5cbc7d3e5a73f2c",
  "created_by": "/pulp/api/v3/users/1/",
  "unblocked_at": null,
  "started_at": null,
  "finished_at": null,
  "error": null,
  "worker": null,
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [],
  "created_resources": [],
  "reserved_resources_record": [
    "prn:deb.aptrepository:01993aa7-8c75-7d4b-a862-c11f07825bad",
    "shared:prn:deb.aptremote:01994e63-f295-7d00-9ab4-e0ecf9febddd",
    "shared:prn:core.domain:b25fd239-cd6b-40d2-b560-81d943bf383d"
  ],
  "result": null
}

The task has no assigned worker and has not made any progress.

Worker List

pulp worker list
[
  {
    "pulp_href": "/pulp/api/v3/workers/0199fe46-beb2-7a03-8cd1-3539bbc874ac/",
    "prn": "prn:core.appstatus:0199fe46-beb2-7a03-8cd1-3539bbc874ac",
    "pulp_created": "2025-10-19T21:01:13.269436Z",
    "pulp_last_updated": "2025-10-19T21:01:13.269456Z",
    "name": "1@pulp-worker.10",
    "last_heartbeat": "2025-10-20T16:12:11.370274Z",
    "versions": {
      "deb": "3.7.0",
      "rpm": "3.32.2",
      "core": "3.91.1",
      "file": "3.91.1",
      "ostree": "2.5.0",
      "certguard": "3.91.1"
    },
    "current_task": null
  },
  {
    "pulp_href": "/pulp/api/v3/workers/0199fe46-bdef-7bac-b9a9-2d66297777c1/",
    "prn": "prn:core.appstatus:0199fe46-bdef-7bac-b9a9-2d66297777c1",
    "pulp_created": "2025-10-19T21:01:13.073654Z",
    "pulp_last_updated": "2025-10-19T21:01:13.073672Z",
    "name": "1@5bdfa03b71a6",
    "last_heartbeat": "2025-10-20T16:12:11.623642Z",
    "versions": {
      "deb": "3.7.0",
      "rpm": "3.32.2",
      "core": "3.91.1",
      "file": "3.91.1",
      "ostree": "2.5.0",
      "certguard": "3.91.1"
    },
    "current_task": null
  },
  {
    "pulp_href": "/pulp/api/v3/workers/0199fe46-bccc-771b-b6d1-e0851627b35c/",
    "prn": "prn:core.appstatus:0199fe46-bccc-771b-b6d1-e0851627b35c",
    "pulp_created": "2025-10-19T21:01:12.782893Z",
    "pulp_last_updated": "2025-10-19T21:01:12.782912Z",
    "name": "1@pulp-worker.1",
    "last_heartbeat": "2025-10-20T16:12:11.540374Z",
    "versions": {
      "deb": "3.7.0",
      "rpm": "3.32.2",
      "core": "3.91.1",
      "file": "3.91.1",
      "ostree": "2.5.0",
      "certguard": "3.91.1"
    },
    "current_task": null
  }
]

Questions

  1. What typically causes a sync task to remain in the “waiting” state indefinitely after a previous failed task?
  2. Could this be due to resource locks from the failed task that were never released?
  3. What is the recommended way to safely clear or release those locks so the sync can be retried?

Thanks in advance for your help!

Can you launch a pulpcore-manager shell_plus and get the following info?

task = Task.objects.get(pk="019a0265-3074-7b7f-b506-e56ed64214b2")  # Your waiting task pk
print(task.app_lock)

If the task’s app_lock is not None and it is stuck in the “waiting” state, then we have an issue with the task acquiring the lock before being picked up by a worker. If it is None, I would run the same code on the failed task’s pk and see whether it still holds the lock. If so, that means failed tasks are sometimes failing to relinquish the lock.
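
For reference, the same check against the failed task would look like this (the pk below is a placeholder; substitute the failed sync task’s UUID):

failed = Task.objects.get(pk="<failed-task-uuid>")  # placeholder pk for the failed sync task
print(failed.app_lock)  # a non-None value here would mean the failed task never released the lock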

We can unblock safely by removing the app_lock from failed/canceled or waiting tasks:

from pulpcore.constants import TASK_STATES, TASK_WAKEUP_UNBLOCK
from pulpcore.tasking.tasks import wakeup_worker  # assumed import path; Task itself is auto-imported by shell_plus

# Only touch tasks that are not actively running.
safe_states = (TASK_STATES.WAITING, TASK_STATES.FAILED, TASK_STATES.CANCELED)
tasks_with_locks = Task.objects.filter(state__in=safe_states, app_lock__isnull=False)
tasks_with_locks.update(app_lock=None)
# Nudge a worker so it re-runs the task-unblocking logic.
wakeup_worker(TASK_WAKEUP_UNBLOCK)

It’s possible the issue is not with the app_lock and in that case we’ll need a different solution, but this is the first area I would check.


Thanks for the guidance. I ran the checks you suggested.

Results from pulpcore-manager shell:

task = Task.objects.get(pk="019a0265-3074-7b7f-b506-e56ed64214b2")  # waiting
print(task.app_lock)
# -> None

task = Task.objects.get(pk="0199fe46-f429-719f-b011-f79c933c43d2")  # failed (previous run)
print(task.app_lock)
# -> None

So it looks like both the waiting task and the failed task have app_lock = None.

The waiting task details still show no worker and these reserved resources:

"reserved_resources_record": [
  "prn:deb.aptrepository:01993aa7-8c75-7d4b-a862-c11f07825bad",
  "shared:prn:deb.aptremote:01994e63-f295-7d00-9ab4-e0ecf9febddd",
  "shared:prn:core.domain:b25fd239-cd6b-40d2-b560-81d943bf383d"
]

Question:
Given there’s no app_lock, is the next likely culprit stale reserved resource(s) from the failed task (or another task) blocking the new sync from being picked up? If so, what’s the recommended way to identify and safely clear those?

  • Is there a query you recommend to find tasks currently holding any of the PRNs above?
  • Is there an approved procedure/command to force-release reserved resources when app_lock isn’t involved (similar in spirit to the app_lock=None cleanup you suggested)?

Happy to provide additional outputs (full pulp worker list, task UUIDs, etc.) if helpful. Thanks!

Is there a query you recommend to find tasks currently holding any of the PRNs above?

You can use the pulp CLI to look for tasks that reserve a particular resource:

PRN="deb.aptrepository:01993aa7-8c75-7d4b-a862-c11f07825bad"
pulp task list --reserved-resource-in $PRN  # may be provided multiple times

Is there an approved procedure/command to force-release reserved resources when app_lock isn’t involved (similar in spirit to the app_lock=None cleanup you suggested)?

Short answer: no.
The unblocking is performed by a worker, using an algorithm that checks the state and resource conflicts of all tasks in the RUNNING, WAITING, and CANCELING states (the incomplete states) at a given point in time. But it would be helpful to know which other tasks are using this resource and what their states are.
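
For example, here is a rough shell_plus sketch to list the incomplete tasks that still reference the repository PRN from your output (it assumes reserved_resources_record stores the same strings the API shows):

from pulpcore.constants import TASK_STATES

PRN = "prn:deb.aptrepository:01993aa7-8c75-7d4b-a862-c11f07825bad"
# Incomplete states; attribute names assumed to exist on TASK_STATES.
incomplete = (TASK_STATES.WAITING, TASK_STATES.RUNNING, TASK_STATES.CANCELING)
for t in Task.objects.filter(state__in=incomplete):
    # reserved_resources_record holds both exclusive and "shared:"-prefixed entries
    if any(PRN in record for record in (t.reserved_resources_record or [])):
        print(t.pk, t.state, t.name)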


Thanks, Pedro.

I ran the suggested command multiple times to check which tasks might still be holding the resource:

PRN="deb.aptrepository:01993aa7-8c75-7d4b-a862-c11f07825bad"
pulp task list --reserved-resource-in $PRN
[]

It returns an empty list, so it doesn’t look like any active or incomplete task is holding that resource.

The “waiting” task appeared only after the previous sync task failed, and I haven’t started any new tasks since then. That makes it seem unlikely that another task is still reserving the resource.

Could this mean the reservation itself wasn’t properly released in the database even though the failed task finished? Is there a safe way to manually clear that resource record so the sync can proceed?

Sorry, I missed the prn: prefix. Try:

PRN="prn:deb.aptrepository:01993aa7-8c75-7d4b-a862-c11f07825bad"

Thanks, Pedro — that was helpful.

After rerunning the command with the full prn: prefix, I now get several results referencing that resource (I’ve stripped out most of the detailed fields and kept only the pulp_href and state for clarity):

PRN="prn:deb.aptrepository:01993aa7-8c75-7b7f-b506-e56ed64214b2"
pulp task list --reserved-resource-in $PRN
[
  {
    "pulp_href": "/pulp/api/v3/tasks/019a0265-3074-7b7f-b506-e56ed64214b2/",
    "state": "running"
  },
  {
    "pulp_href": "/pulp/api/v3/tasks/0199fe46-f429-719f-b011-f79c933c43d2/",
    "state": "failed"
  },
  {
    "pulp_href": "/pulp/api/v3/tasks/0199fe45-ab4b-7875-beb2-6628f836aa38/",
    "state": "canceled"
  },
  {
    "pulp_href": "/pulp/api/v3/tasks/01997253-0df9-7d7d-9d37-d277ac9af14e/",
    "state": "failed"
  },
  ...
]

The first task listed (019a0265-3074-7b7f-b506-e56ed64214b2) is the sync task that had been stuck in “waiting”. It started automatically after I bounced the workers, but then failed with the following database error:

"insert or update on table \"core_repositorycontent\" violates foreign key constraint \"core_repositoryconte_content_id_4e1dc819_fk_core_cont\"
DETAIL: Key (content_id)=(01994f3d-0974-77bc-a920-6f5128de22e2) is not present in table \"core_content\"."
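
For anyone hitting the same error, a quick shell_plus sketch to double-check whether that content row actually exists (the UUID is the one from the DETAIL line above):

from pulpcore.app.models import Content

# True means the content row exists; False matches what the constraint error reports.
print(Content.objects.filter(pk="01994f3d-0974-77bc-a920-6f5128de22e2").exists())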

After that failure, several cleanup tasks kicked in automatically and completed successfully. I’ve now re-run the sync task to see how it goes and will report back once it finishes.