Repair failed on 3.45.1

Problem:
/pulp/api/v3/repair/ failed, but all subtasks completed.

Expected outcome:
/repair/ should complete successfully.

Pulpcore version:
3.45.1

Pulp plugins installed and their versions:
"deb": "3.1.1",
"rpm": "3.25.0",
"core": "3.45.1",
"file": "3.45.1",
"maven": "0.8.0",
"ostree": "2.2.1",
"python": "3.11.0",
"ansible": "0.21.1",
"certguard": "3.45.1",
"container": "2.18.0"

Operating system - distribution and version:
RHEL 8.6

Other relevant data:

/api/v3/repair failed with exit code 1. I reran the repair several times and got the same error each time (log below). Does this indicate data corruption, or can it be safely ignored?

Process Process-4:
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: Traceback (most recent call last):
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     return self.cursor.execute(sql, params)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/psycopg/cursor.py", line 732, in execute
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     raise ex.with_traceback(None)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: psycopg.OperationalError: consuming input failed: EOF detected
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: The above exception was the direct cause of the following exception:
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: Traceback (most recent call last):
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/lib64/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     self.run()
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/lib64/python3.8/multiprocessing/process.py", line 108, in run
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     self._target(*self._args, **self._kwargs)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/_util.py", line 151, in perform_task
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     execute_task(task)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/tasks.py", line 44, in execute_task
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     contextvars.copy_context().run(_execute_task, task)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/tasks.py", line 72, in _execute_task
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     task.set_completed()
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/pulpcore/app/models/task.py", line 169, in set_completed
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     rows = Task.objects.filter(pk=self.pk, state=TASK_STATES.RUNNING).update(
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1206, in update
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     rows = query.get_compiler(self.db).execute_sql(CURSOR)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1984, in execute_sql
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     cursor = super().execute_sql(result_type)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     cursor.execute(sql, params)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 67, in execute
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     return self._execute_with_wrappers(
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     return executor(sql, params, many, context)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     return self.cursor.execute(sql, params)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 91, in __exit__
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     return self.cursor.execute(sql, params)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:   File "/usr/local/lib/python3.8/site-packages/psycopg/cursor.py", line 732, in execute
Feb 23 08:18:16 pulpd-master pulp-worker[97160]:     raise ex.with_traceback(None)
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: django.db.utils.OperationalError: consuming input failed: EOF detected
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: pulp [None]: pulpcore.tasking.worker:WARNING: Task process for 018dd475-3421-7884-9874-bb3bd39b6937 exited with non zero exitcode 1.
Feb 23 08:18:16 pulpd-master pulp-worker[97160]: pulp [None]: pulpcore.tasking.worker:INFO: Cleaning up task 018dd475-3421-7884-9874-bb3bd39b6937 and marking as failed. Reason: Task process died unexpectedly with exitcode 1.

The following is the repair task status:

{
  "pulp_href": "/pulp/api/v3/tasks/018dd475-3421-7884-9874-bb3bd39b6937/",
  "pulp_created": "2024-02-23T05:34:27.367452Z",
  "state": "failed",
  "name": "pulpcore.app.tasks.repository.repair_all_artifacts",
  "logging_cid": "24835e350cea4d98b3fca015ecddfcf1",
  "created_by": "/pulp/api/v3/users/1/",
  "started_at": "2024-02-23T05:34:27.602164Z",
  "finished_at": "2024-02-23T13:18:16.515942Z",
  "error": {
    "reason": "Task process died unexpectedly with exitcode 1."
  },
  "worker": "/pulp/api/v3/workers/018dd13d-c8f1-7d4b-9deb-55ae0f43e90e/",
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [
    {
      "message": "Identify missing units",
      "code": "repair.missing",
      "state": "completed",
      "total": null,
      "done": 1,
      "suffix": null
    },
    {
      "message": "Identify corrupted units",
      "code": "repair.corrupted",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    },
    {
      "message": "Repair corrupted units",
      "code": "repair.repaired",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    }
  ],
  "created_resources": [],
  "reserved_resources_record": [
    "/api/v3/repair/",
    "shared:/pulp/api/v3/domains/018d40fb-edef-75da-a8ae-2c1f61d3485a/"
  ]
}
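
To check mechanically whether a failed task nonetheless finished every progress report, a small helper along these lines could be run against the task status above. This is only a sketch: `summarize_task` and `SAMPLE` are hypothetical names, and the code assumes nothing beyond the fields visible in that output (`state`, `error.reason`, `progress_reports`).

```python
import json


def summarize_task(task: dict) -> dict:
    """Summarize a task status dict shaped like the output above."""
    reports = task.get("progress_reports", [])
    # Collect the codes of any progress reports that did not reach "completed".
    unfinished = [r["code"] for r in reports if r.get("state") != "completed"]
    return {
        "state": task.get("state"),
        "reason": (task.get("error") or {}).get("reason"),
        # True when no report is unfinished (also True if there are no reports).
        "all_reports_completed": not unfinished,
        "unfinished_codes": unfinished,
        "done_by_code": {r["code"]: r.get("done") for r in reports},
    }


# Trimmed copy of the task status posted above.
SAMPLE = json.loads("""
{
  "state": "failed",
  "error": {"reason": "Task process died unexpectedly with exitcode 1."},
  "progress_reports": [
    {"code": "repair.missing", "state": "completed", "total": null, "done": 1},
    {"code": "repair.corrupted", "state": "completed", "total": null, "done": 0},
    {"code": "repair.repaired", "state": "completed", "total": null, "done": 0}
  ]
}
""")

if __name__ == "__main__":
    print(json.dumps(summarize_task(SAMPLE), indent=2))
```

For the task above, the summary shows a failed state alongside three completed progress reports, which matches the mismatch described in the problem statement.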

I don’t think we’ve ever seen this. @ipanova, any input here?

@bli111 - this looks like PostgreSQL cut the connection. Are there any errors in the PostgreSQL logs from around this timestamp?

It fails while executing this line: Task.objects.filter(pk=self.pk, state=TASK_STATES.RUNNING).update(state=TASK_STATES.COMPLETED, finished_at=timezone.now())
@bli111 you can probably ignore this: it is the step where the task state is moved to completed, which means all of the task’s work has already been done. Still, we should understand what happened here.
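
The failing statement is a guarded state transition: update the row only if it is still in the running state, and report how many rows changed. The sketch below illustrates that pattern with the standard-library `sqlite3` module instead of Django and PostgreSQL. The `task` table, the `set_completed` function, and `demo` are hypothetical stand-ins, not Pulp's actual code.

```python
import sqlite3


def set_completed(conn: sqlite3.Connection, task_id: str) -> int:
    """Move a task from 'running' to 'completed'; return the number of rows changed."""
    cur = conn.execute(
        # The state check in the WHERE clause makes the transition conditional.
        "UPDATE task SET state = 'completed' WHERE id = ? AND state = 'running'",
        (task_id,),
    )
    conn.commit()
    return cur.rowcount


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO task VALUES ('t1', 'running')")
    first = set_completed(conn, "t1")   # row is running, so it is updated
    second = set_completed(conn, "t1")  # row is no longer running, so nothing changes
    state = conn.execute("SELECT state FROM task WHERE id = 't1'").fetchone()[0]
    conn.close()
    return first, second, state


if __name__ == "__main__":
    print(demo())
```

Because this update is the last thing the task process does, a connection that drops at this point leaves the row in the running state even though the work itself is finished. In the log above, the worker then cleans up the task and marks it as failed.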