Pulp worker

Problem:
When a task is created, it gets the status “waiting” while there are available free workers for its execution
Expected outcome:

Pulpcore version:
2.29.7
Pulp plugins installed and their versions:
“component”: “rpm”,
“version”: “3.22.3”,
“package”: “pulp-rpm”,
“domain_compatible”: true
},
{
“component”: “file”,
“version”: “1.14.4”,
“package”: “pulp-file”,
“domain_compatible”: true
},
{
“component”: “deb”,
“version”: “3.0.0”,
“package”: “pulp_deb”,
“domain_compatible”: false
Operating system - distribution and version:
debian 11
Other relevant data:
If I restart the service on the worker the task is executed

Sep  7 16:40:05 host1 pulpcore-worker[42356]: pulp [63594d74910d451f92c846ff231c0002]: pulpcore.tasking.tasks:INFO: Task completed 018a6fc2-d77e-7559-822b-327ea5dd5d54
Sep  7 16:40:05 host1 pulpcore-worker[42355]: pulp [None]: pulpcore.tasking.pulpcore_worker:DEBUG: Worker 42355@host1 entering sleep state.

the documentation says “If no ready tasks were found a worker enters a sleep state to be notified, once new tasks are available or resources are released” but who should inform to check the tasks are not specified
https://docs.pulpproject.org/pulpcore/components.html#distributed-tasking-system

Once you restart all the workers do they continue picking up new tasks?

After the restart, the worker executes all tasks with the status “waiting” . After that, it goes into sleep mode and does not take tasks until the next restart of the service

Do you by any chance use a postgres load balancer / connection pool?

There’s a warning here:
https://docs.pulpproject.org/pulpcore/installation/instructions.html#database-setup

Specifically session advisory locks and listen/notify are needed for Pulp to operate properly.

i use patroni cluster with 2 load balancer haproxy, in this scheme it worked successfully. Yesterday I rebooted the workers and caught this problem

The question here is specifically about your postgres setup. Also can you verify your pulpcore version? 2.* seems really impossible.
Did you start at least two worker processes?

Can you run this in pulpcore-manager shell and see if workers wake up properly?

from pulpcore.app.models import Task
from pulpcore.tasking.tasks import _wakeup_worker
Task.objects.last()
_wakeup_worker()
GCC 10.2.1 20210110] on linux

Type “help”, “copyright”, “credits” or “license” for more information.
(InteractiveConsole)

from pulpcore.app.models import Task
from pulpcore.tasking.tasks import _wakeup_worker
Task.objects.last()
Task: pulp_deb.app.tasks.synchronizing.synchronize [failed]

_wakeup_worker()

Снимок экрана 2023-09-08 в 11.34.09

Did that unblock waiting tasks?
What’s the reason your last task reports for failure?

No, the task has a status waiting

I have no experience with a patroni cluster.
Any chance something updated when you restarted the workers?

Me too) strange, but it worked. I asked a colleague to create a clean database. I’ll try to install a new cluster and check if it has the same problems

1 Like

On a new cluster with a new database on the same cluster patroni ,the workers work correctly

1 Like

Thanks for the update. But I must say, this is absolutely weird. If you happen to find the difference between the databases that would be very helpful.

One more thought: Is the tasks table on the old one extremely large? Can you maybe try to run pulp task purge once in a while?

The problem was solved by taking a dump from the database and restoring from the dump

1 Like

Thank you! Although this still sounds like there can be state of the database where the workers stop working correctly. I guess we will not learn from this incident.

If I have time I’ll try to reproduce it
Offtop:
Maybe you saw this error?

log_error
Sep 19 18:16:37   pulpcore-worker[486647]: pulp [48891b9a185945b3a18e9bfffe9c56f6]: pulpcore.plugin.stages.api:DEBUG: [140370931853296] ContentAssociation - put end-marker.
Sep 19 18:16:37   pulpcore-worker[486647]: Process Process-5:
Sep 19 18:16:37   pulpcore-worker[486647]: Traceback (most recent call last):
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
Sep 19 18:16:37   pulpcore-worker[486647]:     return self.cursor.execute(sql, params)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
Sep 19 18:16:37   pulpcore-worker[486647]:     raise ex.with_traceback(None)
Sep 19 18:16:37   pulpcore-worker[486647]: psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
Sep 19 18:16:37   pulpcore-worker[486647]: #011This probably means the server terminated abnormally
Sep 19 18:16:37   pulpcore-worker[486647]: #011before or while processing the request.
Sep 19 18:16:37   pulpcore-worker[486647]: The above exception was the direct cause of the following exception:
Sep 19 18:16:37   pulpcore-worker[486647]: Traceback (most recent call last):
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/app/models/repository.py", line 1103, in __exit__
Sep 19 18:16:37   pulpcore-worker[486647]:     repository.finalize_new_version(self)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulp_deb/app/models/repository.py", line 90, in finalize_new_version
Sep 19 18:16:37   pulpcore-worker[486647]:     remove_duplicates(new_version)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/plugin/repo_version_utils.py", line 43, in remove_duplicates
Sep 19 18:16:37   pulpcore-worker[486647]:     existing_content = repository_version.previous().content
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/app/models/repository.py", line 950, in previous
Sep 19 18:16:37   pulpcore-worker[486647]:     self.repository.versions.complete()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/query.py", line 449, in __getitem__
Sep 19 18:16:37   pulpcore-worker[486647]:     qs._fetch_all()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/query.py", line 1881, in _fetch_all
Sep 19 18:16:37   pulpcore-worker[486647]:     self._result_cache = list(self._iterable_class(self))
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/query.py", line 91, in __iter__
Sep 19 18:16:37   pulpcore-worker[486647]:     results = compiler.execute_sql(
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
Sep 19 18:16:37   pulpcore-worker[486647]:     cursor.execute(sql, params)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
Sep 19 18:16:37   pulpcore-worker[486647]:     return self._execute_with_wrappers(
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
Sep 19 18:16:37   pulpcore-worker[486647]:     return executor(sql, params, many, context)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
Sep 19 18:16:37   pulpcore-worker[486647]:     return self.cursor.execute(sql, params)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
Sep 19 18:16:37   pulpcore-worker[486647]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
Sep 19 18:16:37   pulpcore-worker[486647]:     return self.cursor.execute(sql, params)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
Sep 19 18:16:37   pulpcore-worker[486647]:     raise ex.with_traceback(None)
Sep 19 18:16:37   pulpcore-worker[486647]: django.db.utils.OperationalError: consuming input failed: server closed the connection unexpectedly
Sep 19 18:16:37   pulpcore-worker[486647]: #011This probably means the server terminated abnormally
Sep 19 18:16:37   pulpcore-worker[486647]: #011before or while processing the request.
Sep 19 18:16:37   pulpcore-worker[486647]: During handling of the above exception, another exception occurred:
Sep 19 18:16:37   pulpcore-worker[486647]: Traceback (most recent call last):
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 375, in _set_autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self.connection.autocommit = autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 189, in autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self._set_autocommit(value)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 976, in _set_autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self.wait(self._set_autocommit_gen(value))
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 957, in wait
Sep 19 18:16:37   pulpcore-worker[486647]:     return waiting.wait(gen, self.pgconn.socket, timeout=timeout)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "psycopg_binary/pq/pgconn.pyx", line 186, in psycopg_binary.pq.PGconn.socket.__get__
Sep 19 18:16:37   pulpcore-worker[486647]: psycopg.OperationalError: the connection is lost
Sep 19 18:16:37   pulpcore-worker[486647]: The above exception was the direct cause of the following exception:
Sep 19 18:16:37   pulpcore-worker[486647]: Traceback (most recent call last):
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 65, in _execute_task
Sep 19 18:16:37   pulpcore-worker[486647]:     result = func(*args, **kwargs)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulp_deb/app/tasks/synchronizing.py", line 189, in synchronize
Sep 19 18:16:37   pulpcore-worker[486647]:     DebDeclarativeVersion(first_stage, repository, mirror=mirror).create()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/plugin/stages/declarative_version.py", line 161, in create
Sep 19 18:16:37   pulpcore-worker[486647]:     loop.run_until_complete(pipeline)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/app/models/repository.py", line 1130, in __exit__
Sep 19 18:16:37   pulpcore-worker[486647]:     self.delete()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/app/models/repository.py", line 1044, in delete
Sep 19 18:16:37   pulpcore-worker[486647]:     with transaction.atomic():
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/transaction.py", line 216, in __enter__
Sep 19 18:16:37   pulpcore-worker[486647]:     connection.set_autocommit(
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/base/base.py", line 497, in set_autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self._set_autocommit(autocommit)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 375, in _set_autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self.connection.autocommit = autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
Sep 19 18:16:37   pulpcore-worker[486647]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 375, in _set_autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self.connection.autocommit = autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 189, in autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self._set_autocommit(value)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 976, in _set_autocommit
Sep 19 18:16:37   pulpcore-worker[486647]:     self.wait(self._set_autocommit_gen(value))
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 957, in wait
Sep 19 18:16:37   pulpcore-worker[486647]:     return waiting.wait(gen, self.pgconn.socket, timeout=timeout)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "psycopg_binary/pq/pgconn.pyx", line 186, in psycopg_binary.pq.PGconn.socket.__get__
Sep 19 18:16:37   pulpcore-worker[486647]: django.db.utils.OperationalError: the connection is lost
Sep 19 18:16:37   pulpcore-worker[486647]: During handling of the above exception, another exception occurred:
Sep 19 18:16:37   pulpcore-worker[486647]: Traceback (most recent call last):
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/base/base.py", line 308, in _cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     return self._prepare_cursor(self.create_cursor(name))
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
Sep 19 18:16:37   pulpcore-worker[486647]:     return func(*args, **kwargs)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 330, in create_cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     cursor = self.connection.cursor()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 840, in cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     self._check_connection_ok()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 479, in _check_connection_ok
Sep 19 18:16:37   pulpcore-worker[486647]:     raise e.OperationalError("the connection is closed")
Sep 19 18:16:37   pulpcore-worker[486647]: psycopg.OperationalError: the connection is closed
Sep 19 18:16:37   pulpcore-worker[486647]: The above exception was the direct cause of the following exception:
Sep 19 18:16:37   pulpcore-worker[486647]: Traceback (most recent call last):
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
Sep 19 18:16:37   pulpcore-worker[486647]:     self.run()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
Sep 19 18:16:37   pulpcore-worker[486647]:     self._target(*self._args, **self._kwargs)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py", line 465, in _perform_task
Sep 19 18:16:37   pulpcore-worker[486647]:     execute_task(task)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 49, in execute_task
Sep 19 18:16:37   pulpcore-worker[486647]:     contextvars.copy_context().run(_execute_task, task)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 73, in _execute_task
Sep 19 18:16:37   pulpcore-worker[486647]:     task.set_failed(exc, tb)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/pulpcore/app/models/task.py", line 286, in set_failed
Sep 19 18:16:37   pulpcore-worker[486647]:     rows = Task.objects.filter(pk=self.pk, state=TASK_STATES.RUNNING).update(
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/query.py", line 1206, in update
Sep 19 18:16:37   pulpcore-worker[486647]:     rows = query.get_compiler(self.db).execute_sql(CURSOR)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1984, in execute_sql
Sep 19 18:16:37   pulpcore-worker[486647]:     cursor = super().execute_sql(result_type)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
Sep 19 18:16:37   pulpcore-worker[486647]:     cursor = self.connection.cursor()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
Sep 19 18:16:37   pulpcore-worker[486647]:     return func(*args, **kwargs)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/base/base.py", line 330, in cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     return self._cursor()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/base/base.py", line 308, in _cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     return self._prepare_cursor(self.create_cursor(name))
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
Sep 19 18:16:37   pulpcore-worker[486647]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/base/base.py", line 308, in _cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     return self._prepare_cursor(self.create_cursor(name))
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
Sep 19 18:16:37   pulpcore-worker[486647]:     return func(*args, **kwargs)
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 330, in create_cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     cursor = self.connection.cursor()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 840, in cursor
Sep 19 18:16:37   pulpcore-worker[486647]:     self._check_connection_ok()
Sep 19 18:16:37   pulpcore-worker[486647]:   File "/usr/local/lib/pulp/lib/python3.9/site-packages/psycopg/connection.py", line 479, in _check_connection_ok
Sep 19 18:16:37   pulpcore-worker[486647]:     raise e.OperationalError("the connection is closed")
Sep 19 18:16:37   pulpcore-worker[486647]: django.db.utils.OperationalError: the connection is closed
Sep 19 18:16:37   pulpcore-worker[478714]: pulp [None]: pulpcore.tasking.pulpcore_worker:WARNING: Task process for 018aadd3-4615-7c7b-a8c3-da8ece9e952f exited with non zero exitcode 1.
Sep 19 18:16:37   pulpcore-worker[478714]: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: Cleaning up task 018aadd3-4615-7c7b-a8c3-da8ece9e952f and marking as failed. Reason: Task process died unexpectedly with exitcode 1.

Error when syncing “bullseye main” repo.
Always falls in the same place

"message": "Associating Content",
"code": "associating.content",
"state": "completed",
"total": null,
"done": **119799**,
"suffix": null
1 Like

Thank you again.
Seems like you have a setup for interesting database issues. All I see is that the connection is lost at some point. I’m not sure what Pulp can do to prevent that. But in some weird way, that may even be a common root cause for the other issue (on a broken connection notify cannot work either…).

It seems that the problem was solved by increasing the session time on the proxy from 35 minutes to 120. This is the only change that has happened recently

Orphan question: Due to a database issue, the repository could not sync successfully, but the artifacts and content were created. How can I clean it properly? By using
pulp orphan cleanup
results:

  "message": "Clean up orphan Artifacts",
  "code": "clean-up.artifacts",
  "state": "completed",
  "total": 0,
  "done": 0,
  "suffix": null
}
  ],
  "created_resources": [],
  "reserved_resources_record": [
"/api/v3/orphans/cleanup/"

Before performing the cleanup, I deleted all repositories

Understood. need to successfully sync, delete the repository, then run the cleanup