Sync galaxy collection died unexpectedly

Problem:

I got an error when sync galaxy collection with pulp-ansible

.....................................................................................................................................................
Error: Task /pulp/api/v3/tasks/0cc477a8-a57b-4cd6-a51e-a1841f2b076d/ failed: 'Task process died unexpectedly with exitcode 1.

Expected outcome:
Expect a successful status.

Pulpcore version:
3.21.3

Pulp plugins installed and their versions:
pulp-ansible 0.16.1
pulp-cli 0.16.0
pulp-deb 2.20.1
pulp-file 1.11.3
pulp-rpm 3.19.3
pulp-rpm-client 3.18.9
pulpcore 3.21.3
pulpcore-client 3.21.3

Operating system - distribution and version:
Rhel 8.6

Other relevant data:
The process run more than 2 hours and died at the end of sync. All sub tasks were completed but the task itself failed

{
  "pulp_href": "/pulp/api/v3/tasks/0cc477a8-a57b-4cd6-a51e-a1841f2b076d/",
  "pulp_created": "2023-04-05T14:32:16.524436Z",
  "state": "failed",
  "name": "pulp_ansible.app.tasks.collections.sync",
  "logging_cid": "c183b21472544310b54dde65bebc2023",
  "started_at": "2023-04-05T14:32:16.760135Z",
  "finished_at": "2023-04-05T16:45:53.467254Z",
  "error": {
    "reason": "Task process died unexpectedly with exitcode 1."
  },
  "worker": "/pulp/api/v3/workers/79f18cf0-5cc3-4779-a921-c2d478a0775b/",
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [
    {
      "message": "no_change: Checking if remote changed since last sync.",
      "code": "sync.no_change",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    },
    {
      "message": "Parsing CollectionVersion Metadata",
      "code": "sync.parsing.metadata",
      "state": "completed",
      "total": null,
      "done": 21833,
      "suffix": null
    },
    {
      "message": "Downloading Artifacts",
      "code": "sync.downloading.artifacts",
      "state": "completed",
      "total": null,
      "done": 99,
      "suffix": null
    },
    {
      "message": "Downloading Artifacts",
      "code": "sync.downloading.artifacts",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    },
    {
      "message": "Associating Content",
      "code": "associating.content",
      "state": "completed",
      "total": null,
      "done": 22267,
      "suffix": null
    }
  ],
  "created_resources": [],
  "reserved_resources_record": [
    "/pulp/api/v3/repositories/ansible/ansible/6196227c-3e64-4b33-be97-8d9dfa45bb82/",
    "shared:/pulp/api/v3/remotes/ansible/collection/a971dfa5-0d07-4043-93c6-84f5d383e18e/"
  ]
}

I created the repo and remote as below

pulp ansible remote -t "collection" create \
    --name "collections" \
    --proxy-url=http://myproxy \
    --url "https://galaxy.ansible.com/"
    # --requirements "collections:\n  - testing.ansible_testing_content"
pulp ansible repository create --name collections --remote "collection:collections"
pulp ansible distribution create --name "collections" --base-path "collections" --repository "collections"
pulp ansible repository sync --name collections

I run the sync multiple times and failed every time.

Can you find any trace of this event in the server logs?

I don’t see any stack trace and errors in the /var/log/messages and stdout. Should I look for the log anywhere else?

That depends on the way your system is deployed.
Maybe run journalctl -f while doing the sync?

journalctl did the trick. I got

Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Process Process-38:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Traceback (most recent call last):
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self.cursor.execute(sql, params)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: psycopg2.OperationalError: SSL SYSCALL error: EOF detected
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: The above exception was the direct cause of the following exception:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Traceback (most recent call last):
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/app/models/repository.py", line 1065, in __exit__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     repository.finalize_new_version(self)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulp_ansible/app/models.py", line 382, in finalize_new_version
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     for version in removed_collection_versions:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/query.py", line 280, in __iter__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self._fetch_all()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self._result_cache = list(self._iterable_class(self))
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     cursor.execute(sql, params)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return executor(sql, params, many, context)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self.cursor.execute(sql, params)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self.cursor.execute(sql, params)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: django.db.utils.OperationalError: SSL SYSCALL error: EOF detected
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: During handling of the above exception, another exception occurred:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Traceback (most recent call last):
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 277, in _set_autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self.connection.autocommit = autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: psycopg2.InterfaceError: connection already closed
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: The above exception was the direct cause of the following exception:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Traceback (most recent call last):
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py", line 452, in _perform_task
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     result = func(*args, **kwargs)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulp_ansible/app/tasks/collections.py", line 188, in sync
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     repo_version = d_version.create()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/plugin/stages/declarative_version.py", line 161, in create
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     loop.run_until_complete(pipeline)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/app/models/repository.py", line 1092, in __exit__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self.delete()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/app/models/repository.py", line 1006, in delete
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     with transaction.atomic():
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/transaction.py", line 207, in __enter__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     connection.set_autocommit(False, force_begin_transaction_with_broken_autocommit=True)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 415, in set_autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self._set_autocommit(autocommit)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 277, in _set_autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self.connection.autocommit = autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 277, in _set_autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self.connection.autocommit = autocommit
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: django.db.utils.InterfaceError: connection already closed
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: During handling of the above exception, another exception occurred:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Traceback (most recent call last):
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self._prepare_cursor(self.create_cursor(name))
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return func(*args, **kwargs)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     cursor = self.connection.cursor()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: psycopg2.InterfaceError: connection already closed
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: The above exception was the direct cause of the following exception:
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: Traceback (most recent call last):
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self.run()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     self._target(*self._args, **self._kwargs)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py", line 460, in _perform_task
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     task.set_failed(exc, tb)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/pulpcore/app/models/task.py", line 277, in set_failed
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     Task.objects.filter(pk=self.pk)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/query.py", line 783, in update
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     rows = query.get_compiler(self.db).execute_sql(CURSOR)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1559, in execute_sql
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     cursor = super().execute_sql(result_type)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     cursor = self.connection.cursor()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return func(*args, **kwargs)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 259, in cursor
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self._cursor()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self._prepare_cursor(self.create_cursor(name))
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 237, in _cursor
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return self._prepare_cursor(self.create_cursor(name))
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     return func(*args, **kwargs)
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]:     cursor = self.connection.cursor()
Apr 06 11:46:35 pulpd-ob-683 pulpcore-worker[77127]: django.db.utils.InterfaceError: connection already closed

This looks like the DB closed the connection from its side in the middle of a transaction. Is there anything Exciting in the postgres logs?

I’m not sure if the text

is reversing causality…
What seems to happen is that pulp_ansible while trying to loop over removed_collection_versions is hitting django.db.utils.OperationalError: SSL SYSCALL error: EOF detected. And all db access after that is finding a closed connection.

Checked the posgresql log. Most of them are connection received, disconnection messages. There is a single message entry which could cause the issue but not sure why this happened.
could not receive data from client: Connection reset by peer

It looks the postgresql suddenly got disconnected at the end of sync

Apr 10 18:07:12 pulpd-ob-111 pulpcore-worker[75210]: pulp [e9727a55718349ce83f4dd45666880c7]: pulp_ansible.app.tasks.utils:INFO: Reading MANIFEST.json from artifact/5c/82ea55ca>
Apr 10 18:07:12 pulpd-ob-111 pulpcore-worker[75210]: pulp [e9727a55718349ce83f4dd45666880c7]: pulp_ansible.app.tasks.utils:INFO: Reading FILES.json from artifact/5c/82ea55ca104>
Apr 10 18:07:12 pulpd-ob-111 pulpcore-worker[75210]: Process Process-1:
Apr 10 18:07:13 pulpd-ob-111 pulpcore-worker[75210]: Traceback (most recent call last):
Apr 10 18:07:13 pulpd-ob-111 pulpcore-worker[75210]:   File "/opt/utils/venv/pulp/3.9.7/lib64/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Apr 10 18:07:13 pulpd-ob-111 pulpcore-worker[75210]:     return self.cursor.execute(sql, params)
Apr 10 18:07:13 pulpd-ob-111 pulpcore-worker[75210]: psycopg2.OperationalError: SSL SYSCALL error: EOF detected

This msg could not receive data from client: Connection reset by peer seems to be that the client dropped the connection without properly closing the database. I wonder if this could cause by mismatch version of server and client. The postgresql is an external server with version 13.7. On the pulp server, the postgresql client is on 10.19.

I updated the postgresql client to 13.7 and am still seeing the issue.