Concurrency, ordering, and Fun With Deadlocks

Hey folks,

I’ve been investigating Issue #8750 (Deadlock on rpm repository pulp2pulp sync) to understand what’s going on and how to address it. The short-term good news is that I cannot recreate the problem on current master for pulpcore/pulp_rpm; I have to go back to core/3.11-rpm/3.10 for my reproducer to reproduce.

However, my suspicion is that that’s just luck. The core of the problem is that Pulp does bulk operations (e.g., bulk_create()) on unsorted sequences of entities. What that can lead to (and in this case, does) is deadlock when two threads try to create/update overlapping sets of entities in different orders, each in its own transaction. Thread-1 wants to update, say, RPMs 1, 2, and 3, while Thread-2 wants 3, 1, 2, and Deadlock Happens.
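To make the failure mode concrete, here’s a minimal sketch (plain threading.Lock objects standing in for Postgres row locks; all names here are hypothetical, not Pulp code). If each worker grabbed locks in arrival order, Thread-1 (1, 2, 3) and Thread-2 (3, 1, 2) could each hold one lock and wait on the other forever. Sorting first removes the cycle:

```python
import threading

# Plain locks standing in for Postgres row locks (hypothetical sketch).
row_locks = {name: threading.Lock() for name in ("rpm-1", "rpm-2", "rpm-3")}
completed = []

def update_rows(rows):
    # The fix in miniature: sort before locking, so workers touching
    # overlapping rows always acquire locks in the same global order,
    # and a circular wait (deadlock) cannot form.
    ordered = sorted(rows)
    for name in ordered:
        row_locks[name].acquire()
    try:
        completed.append(rows)
    finally:
        for name in reversed(ordered):
            row_locks[name].release()

t1 = threading.Thread(target=update_rows, args=(["rpm-1", "rpm-2", "rpm-3"],))
t2 = threading.Thread(target=update_rows, args=(["rpm-3", "rpm-1", "rpm-2"],))
t1.start(); t2.start()
t1.join(timeout=5); t2.join(timeout=5)
assert len(completed) == 2  # both workers finished; no deadlock
```

Postgres detects and breaks such cycles by aborting one transaction; plain locks would simply hang, which is why both workers here sort first.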

The root cause here is that neither Postgres nor Django defines an implicit ordering. If your queries don’t have an explicit ORDER BY clause, you get results back in whatever order is most efficient/easiest for the underlying tech.

I have a POC based on core/3.11-rpm/3.10 that addresses the problem - at least, as far as I have been able to force it to happen. I’ll attach the diffs below.

The problem is that this POC addresses specific code paths by enforcing an ordering on individual lists of entities. This is at best ugly, and at worst just playing whack-a-mole with problems as they come up.

Having been in similar straits before, I know that trying to fix the specific errors as deadlocks crop up is…inefficient. And nobody ends up happy: not the users who find the next mole, not the devs who have to answer Yet Another Deadlock Issue, not the maintainers 6 months from now wondering why ‘sort’ is the single most-used function in the codebase. Ew.

I’ve seen this problem handled in a few different ways, in past lives.

First: some kind of “god lock” that has to be acquired by any of these ‘dangerous’ ops. This is easy to implement, but it’s easy to miss code that needs to call it, and it’s a disaster for concurrency. Two thumbs down, would not use again.

Second: explicit table/model locks, again acquired by the “dangerous” code paths - each bulk operation would lock the tables/models it was about to work with, in alpha order. This solves the deadlock problem and might be marginally better for concurrency than a single lock, but it has all the disadvantages of the above.

Third: every model requires an explicit default ordering. Django allows this via the Meta option “ordering”. It would solve the problem (at least for Django queries; more on that in a bit), with two downsides:

  • every Model would need to define or inherit it
  • it imposes a performance cost on every query, even ones that are “read only” and don’t care about order. I have no idea how large that cost might be.

This also doesn’t handle non-Django-query lists (like, say, “the list of all packages in an RPM repository’s primary.xml”).
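Whatever mechanism we pick, the invariant behind my POC is the same one in all three options: any two workers holding the same logical objects must arrive at the same order, which is why the patches sort on content-derived natural keys. In isolation (with a hypothetical Pkg dataclass standing in for a content unit; not Pulp’s actual model):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pkg:
    # Hypothetical stand-in for a content unit.
    name: str
    version: str

    def natural_key(self):
        # Derived from the content itself, so every process computes
        # the same key for the same logical object.
        return (self.name, self.version)

def canonical_order(batch):
    """Sort a batch by natural key before any bulk DB operation."""
    return sorted(batch, key=lambda p: p.natural_key())

# Two workers see the same packages in different arrival orders...
worker1 = [Pkg("zsh", "5.8"), Pkg("bash", "5.1"), Pkg("curl", "7.79")]
worker2 = [Pkg("curl", "7.79"), Pkg("zsh", "5.8"), Pkg("bash", "5.1")]

# ...but agree on the canonical order, so their row updates can’t
# interleave into a deadlock cycle.
assert canonical_order(worker1) == canonical_order(worker2)
```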

So - where do we go from here? I think we have some time to discuss, as the new tasking system seems to change the timing enough that we don’t hit this frequently, even when we’re really trying. I would love it if someone who’s done more/better research than I would be able to say “no dummy, there’s some great new research into this area and Smart People are solving it this way!” I would owe such a person All The Beer.

But the net is (for me at least) that Pulp has to get on top of this problem. Large concurrent systems talking to one DB about similar entities at scale, without concerning themselves with preventing deadlocks, are just accidents waiting to happen.

Thoughts/suggestions/discussion gratefully accepted.

G

Here are my patches for pulpcore and pulp_rpm that ‘solve’ the current reproducers under core/3.11-rpm/3.10, for those who are interested:

(1974482_sync_deadlocks) ~/github/Pulp3/pulpcore $ git diff
diff --git a/pulpcore/plugin/stages/artifact_stages.py b/pulpcore/plugin/stages/artifact_stages.py
index 570446c2e..0865eefd9 100644
--- a/pulpcore/plugin/stages/artifact_stages.py
+++ b/pulpcore/plugin/stages/artifact_stages.py
@@ -319,6 +319,7 @@ class RemoteArtifactSaver(Stage):
                     else:
                         remote_artifact = self._create_remote_artifact(d_artifact, content_artifact)
                         needed_ras.append(remote_artifact)
+        needed_ras.sort(key=lambda x: x.sha256)
         return needed_ras
 
     @staticmethod
diff --git a/pulpcore/plugin/stages/content_stages.py b/pulpcore/plugin/stages/content_stages.py
index 9ef25199a..3a1603377 100644
--- a/pulpcore/plugin/stages/content_stages.py
+++ b/pulpcore/plugin/stages/content_stages.py
@@ -85,7 +85,7 @@ class ContentSaver(Stage):
             content_artifact_bulk = []
             with transaction.atomic():
                 await self._pre_save(batch)
-
+                batch.sort(key=lambda x: x.content.natural_key())
                 for d_content in batch:
                     # Are we saving to the database for the first time?
                     content_already_saved = not d_content.content._state.adding
(1974482_sync_deadlocks) ~/github/Pulp3/pulpcore $ cd ../pulp_rpm
(1974482_sync_deadlocks) ~/github/Pulp3/pulp_rpm $ git diff
diff --git a/pulp_rpm/app/tasks/synchronizing.py b/pulp_rpm/app/tasks/synchronizing.py
index 3b3821c3..3498e495 100644
--- a/pulp_rpm/app/tasks/synchronizing.py
+++ b/pulp_rpm/app/tasks/synchronizing.py
@@ -570,15 +570,15 @@ class RpmFirstStage(Stage):
         await self._parse_advisories(updates)
 
     async def _parse_packages(self, packages):
+        pitems = packages.items()
         progress_data = {
             "message": "Parsed Packages",
             "code": "parsing.packages",
             "total": len(packages),
         }
-
         with ProgressReport(**progress_data) as packages_pb:
-            for pkg in packages.values():
-                package = Package(**Package.createrepo_to_dict(pkg))
+            for pkg in sorted(pitems, key=lambda x: x[0]):
+                package = Package(**Package.createrepo_to_dict(pkg[1]))
                 artifact = Artifact(size=package.size_package)
                 checksum_type = getattr(CHECKSUM_TYPES, package.checksum_type.upper())
                 setattr(artifact, checksum_type, package.pkgId)
@@ -615,6 +615,7 @@ class RpmFirstStage(Stage):
             "code": "parsing.advisories",
             "total": len(updates),
         }
+        updates.sort(key=lambda x: x.id)
         with ProgressReport(**progress_data) as advisories_pb:
             for update in updates:
                 update_record = UpdateRecord(**UpdateRecord.createrepo_to_dict(update))
(1974482_sync_deadlocks) ~/github/Pulp3/pulp_rpm $ 

Do these deadlocks always happen in this codepath?

bulk_get_or_create() is the primary offender, but it’s not the only path. Also, trying to sort inside bulk_get_or_create() is difficult: by the time you’re inside the call, you have abstract objects, and sorting correctly often requires knowing what you should be sorting on.
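One way around the abstract-objects problem might be to make the caller, who does know the model, supply the sort key. A hypothetical sketch (the helper, its parameters, and the dict-backed “DB” are illustrative, not pulpcore’s actual API):

```python
# Hypothetical sketch: the caller supplies the sort key; the generic
# bulk helper only enforces canonical ordering before touching "rows".
def bulk_get_or_create(existing, wanted, key):
    """'existing' simulates rows already in the DB, indexed by key."""
    created = []
    for obj in sorted(wanted, key=key):  # canonical order before row access
        k = key(obj)
        if k not in existing:
            existing[k] = obj
            created.append(obj)
    return created

db = {("bash", "5.1"): {"name": "bash", "version": "5.1"}}
batch = [
    {"name": "zsh", "version": "5.8"},
    {"name": "bash", "version": "5.1"},  # already present in the "DB"
]
new = bulk_get_or_create(db, batch, key=lambda o: (o["name"], o["version"]))
assert [o["name"] for o in new] == ["zsh"]  # only the missing row is created
```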

Would have been too easy.
I suspected that the main problem could be our recovery code for the case where not all items were actually new, and we fall back to individual SELECT or INSERT commands in one transaction.
Couldn’t you always sort on UUID?

The problem with sorting on UUID is that objects that already exist in the DB come in with a ‘new’ UUID (I think?) because we don’t know they’re there yet. So you can end up with two threads sorting the ‘same’ objects differently, and deadlocking on the updates. At least, that was my thought process on this…
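A tiny illustration of that thought process (pure Python, hypothetical names): two processes that haven’t saved yet each mint their own uuid4 for the same logical content, so a UUID sort isn’t stable across processes, while a content-derived key is:

```python
import uuid

content = ["pkg-a", "pkg-b", "pkg-c"]

# Each process assigns fresh pre-save UUIDs to objects it hasn't
# found in the DB yet:
proc1_ids = {name: uuid.uuid4() for name in content}
proc2_ids = {name: uuid.uuid4() for name in content}

# The two processes don't even agree on what the "same" object's UUID is...
assert proc1_ids["pkg-a"] != proc2_ids["pkg-a"]

# ...so sorting by UUID can order the same objects differently per process,
# while a key derived from the content itself always agrees:
assert sorted(content) == sorted(reversed(content))
```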

You are right. Different processes can never agree on newly generated UUIDs pre-save.

Please try to reproduce using pulpcore 3.15. The changes to transactional code in 3.15 should provide improvements.
