Deadlock

Problem:
Deadlocks occur during multiple concurrent syncs. Will a deadlock resolve itself, or do I still need to kill the tasks that caused it?

Expected outcome:

Pulpcore version:
3.22.4
Pulp plugins installed and their versions:
pulp-ansible 0.16.1
pulp-cli 0.19.0
pulp-deb 2.20.2
pulp-file 1.12.0
pulp-glue 0.19.0
pulp-rpm 3.19.3
pulp-rpm-client 3.19.3
pulpcore 3.22.4
pulpcore-client 3.22.4

Operating system - distribution and version:
RHEL 8.6

Other relevant data:
“traceback”: " File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py”, line 444, in _perform_task\n result = func(*args, **kwargs)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulp_rpm/app/tasks/synchronizing.py”, line 567, in synchronize\n repo_version = dv.create() or repo.latest_version()\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/plugin/stages/declarative_version.py”, line 161, in create\n loop.run_until_complete(pipeline)\n File “/usr/lib64/python3.9/asyncio/base_events.py”, line 642, in run_until_complete\n return future.result()\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/plugin/stages/api.py”, line 225, in create_pipeline\n await asyncio.gather(*futures)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/plugin/stages/api.py”, line 43, in call\n await self.run()\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulp_rpm/app/tasks/synchronizing.py”, line 1627, in run\n await sync_to_async(result.save)()\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/asgiref/sync.py”, line 448, in call\n ret = await asyncio.wait_for(future, timeout=None)\n File “/usr/lib64/python3.9/asyncio/tasks.py”, line 442, in wait_for\n return await fut\n File “/usr/lib64/python3.9/concurrent/futures/thread.py”, line 52, in run\n result = self.fn(*self.args, **self.kwargs)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/asgiref/sync.py”, line 490, in thread_handler\n return func(*args, **kwargs)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/app/models/base.py”, line 203, in save\n return super().save(*args, **kwargs)\n File “/usr/lib64/python3.9/contextlib.py”, line 79, in inner\n return func(*args, **kwds)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django_lifecycle/mixins.py”, line 169, in save\n save(*args, **kwargs)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/base.py”, line 739, in save\n self.save_base(using=using, force_insert=force_insert,\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/base.py”, line 776, in save_base\n updated = self._save_table(\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/base.py”, line 858, in _save_table\n updated = self._do_update(base_qs, using, pk_val, values, update_fields,\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/base.py”, line 912, in _do_update\n return filtered._update(values) > 0\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/query.py”, line 802, in _update\n return query.get_compiler(self.db).execute_sql(CURSOR)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/sql/compiler.py”, line 1559, in execute_sql\n cursor = super().execute_sql(result_type)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/sql/compiler.py”, line 1175, in execute_sql\n cursor.execute(sql, params)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py”, line 66, in execute\n return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py”, line 75, in _execute_with_wrappers\n return executor(sql, params, many, context)\n File 
“/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py”, line 84, in _execute\n return self.cursor.execute(sql, params)\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/utils.py”, line 90, in exit\n raise dj_exc_value.with_traceback(traceback) from exc_value\n File “/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py”, line 84, in _execute\n return self.cursor.execute(sql, params)\n",
"description": "deadlock detected\nDETAIL: Process 36002 waits for ShareLock on transaction 48531899; blocked by process 36857.\nProcess 36857 waits for ShareLock on transaction 48531903; blocked by process 36002.\nHINT: See server log for query details.\nCONTEXT: while locking tuple (21967,1) in relation \"rpm_package\"\n"
},

Hi bli111!

I’m working on closing a deadlock window that shows up when syncing repos with a lot of overlapping content, but I don’t have it quite shut yet. I haven’t had a chance to dig into your stack trace, so I don’t know whether it’s the same problem.

You can work around the issue by cutting the number of workers to one, or by separating your syncs so that repos with overlapping content aren’t synced at the same time. Painful, I know :frowning:
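If you go the serialize-the-syncs route, a small wrapper script can run them back to back. This is just an untested sketch: the repository names are placeholders, and it assumes the pulp-cli you already have installed is pointed at your API endpoint (pulp-cli normally waits for each sync task to finish before returning, which is what serializes the work):

    #!/bin/bash
    # Sync repos with overlapping content one after another instead of in parallel,
    # so only one sync task is touching the shared rpm_package rows at a time.
    # "repo-a" and "repo-b" are placeholder names; substitute your own.
    for repo in repo-a repo-b; do
        pulp rpm repository sync --name "$repo" || echo "sync of $repo failed" >&2
    done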

When Postgres detects a deadlock, it chooses one “side” and kills that transaction. The error percolates up to the running task, which logs it and fails. You won’t have to kill or cancel tasks by hand; Postgres does this for you/us.
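If you want to watch it happen, you can ask Postgres which backends are currently blocked and by whom. This is only an illustration: pg_blocking_pids() exists in PostgreSQL 9.6 and later, and the "pulp" database name is an assumption about your setup:

    # List Postgres backends that are currently waiting on a lock held by another backend.
    # pg_blocking_pids() is available in PostgreSQL 9.6+; the "pulp" database name is a guess.
    psql -d pulp -c "
      SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
      FROM pg_stat_activity
      WHERE cardinality(pg_blocking_pids(pid)) > 0;"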

Do you mean one pulpcore worker in total, or one per Pulp server? Currently we have 4 servers, and each server is configured with 3 pulpcore workers to accommodate a large number of repo syncs.
I noticed a few repos with overlapping content stuck in the running state forever, and I had to kill them manually. We are planning to run syncs every day, so I am thinking of monitoring the start time of running sync tasks and automatically killing any that run too long. Is this a good idea?

Help me understand your configuration a little better. Do you have

  • 4 completely-separate Pulp instances?
  • 4 separate Pulp instances, that share (only) a filesystem?
  • 4 nodes, that share a filesystem and all talk to the same Postgres?

We haven’t seen sync tasks stuck in ‘running’; it feels like something new is going on. Noticing them, canceling them, and re-running should be safe.
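If you do end up automating the cancel-and-retry, a rough sketch with pulp-cli could look like the following. It is purely illustrative: the 6-hour cutoff is an arbitrary example, it assumes jq is available, and you should double-check the task-list fields and flags against your pulp-cli version before relying on it:

    #!/bin/bash
    # Cancel sync tasks that have been in the 'running' state longer than a cutoff.
    # The 6-hour cutoff is an arbitrary example; adjust it to your typical sync times.
    cutoff=$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%S)
    pulp --format json task list --state running \
        | jq -r --arg cutoff "$cutoff" '.[] | select(.started_at < $cutoff) | .pulp_href' \
        | while read -r href; do
              echo "canceling long-running task $href"
              pulp task cancel --href "$href"
          done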

@dralley does any of this ring a bell with you?

We have 4 nodes which share an S3 bucket, the same external Postgres, and the same external Redis. All nodes have the API and content servers installed, and we are using a load balancer in front to distribute the load among them.