Use advanced copy workflow to merge RPM repositories

Problem:
Hi, I am using the advanced copy workflow to craft a repository containing only the packages I need. Using the “Multi-repository-copy” described here, I can successfully copy RPM packages and their dependencies from multiple repositories (with cross dependencies) to multiple destination repositories.

My goal, though, is to copy all RPMs to a single repository. When I define the same destination repo for all source repos, dependency resolution succeeds according to the solver debug log, but Pulp then fails with the following error message:

pulp [e0978b46762b4f3688acb5a5b0b35a5a]: pulpcore.tasking.tasks:INFO: Starting task 018da344-2c37-71e3-9d13-46b9d9b380ce
pulp [e0978b46762b4f3688acb5a5b0b35a5a]: pulp_rpm.app.depsolving:INFO: Writing solver debug data to /var/tmp/pulp/018da344-2c37-71e3-9d13-46b9d9b380ce
pulp [e0978b46762b4f3688acb5a5b0b35a5a]: pulpcore.tasking.tasks:INFO: Task 018da344-2c37-71e3-9d13-46b9d9b380ce failed (duplicate key value violates unique constraint "core_repositoryversion_repository_id_number_3c54ce50_uniq"
DETAIL:  Key (repository_id, number)=(018da342-5e59-7cd4-ae2b-f6ae2c84c999, 1) already exists.)
pulp [e0978b46762b4f3688acb5a5b0b35a5a]: pulpcore.tasking.tasks:INFO:   File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 60, in _execute_task
    result = func(*args, **kwargs)

  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)

  File "/usr/local/lib/python3.9/site-packages/pulp_rpm/app/tasks/copy.py", line 235, in copy_content
    with dest_repo_version.repository.new_version(base_version=base_version) as new_version:

  File "/usr/local/lib/python3.9/site-packages/pulpcore/app/models/repository.py", line 186, in new_version
    version.save()

  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)

  File "/usr/local/lib/python3.9/site-packages/django_lifecycle/mixins.py", line 192, in save
    save(*args, **kwargs)

  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(

  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(

  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 1020, in _save_table
    results = self._do_insert(

  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 1061, in _do_insert
    return manager._insert(

  File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)

  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)

  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)

  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(

  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)

  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)

  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value

  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)

  File "/usr/local/lib/python3.9/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)

I reduced the number of workers to 1 to make sure this is not a concurrency problem.

Expected outcome:
All desired RPMs are placed into a single destination repository.

Pulpcore version:
3.45.1

Pulp plugins installed and their versions:
3.25.1

Operating system - distribution and version:
Pulp in one container

Other relevant data:
Steps to reproduce:

# add remotes
pulp rpm remote create --name="appstream_93" --url "https://dl.rockylinux.org/pub/rocky/9.3/AppStream/x86_64/os/" --policy on_demand
pulp rpm remote create --name="baseos_93" --url "https://dl.rockylinux.org/pub/rocky/9.3/BaseOS/x86_64/os/" --policy on_demand

# add source repos and sync them from remote
pulp rpm repository create --autopublish --name appstream_93 --remote appstream_93
pulp rpm distribution create --generate-repo-config --name appstream_93 --repository appstream_93 --base-path appstream_93
pulp rpm repository sync --name appstream_93

pulp rpm repository create --autopublish --name baseos_93 --remote baseos_93
pulp rpm distribution create --generate-repo-config --name baseos_93 --repository baseos_93 --base-path baseos_93
pulp rpm repository sync --name baseos_93

# create dest repo
pulp rpm repository create --name dest_repo

# get repo version hrefs for copy command
pulp rpm repository list --field name --field latest_version_href

# get package href of postfix in appstream_93 repo for copy command
pulp rpm content list --field pulp_href --field version --field release --field name --name postfix --ordering -version --ordering -release --limit 1

# copy
curl -v \
  --header "Content-Type: application/json" \
  --user "admin:foobar" \
  --data '{
    "config": [
        {"source_repo_version": "<HREF_OF_APPSTREAM_REPO_VERSION>", "dest_repo": "<HREF_OF_DEST_REPO>", "content": ["<HREF_OF_POSTFIX_RPM>"]},
        {"source_repo_version": "<HREF_OF_BASEOS_REPO_VERSION>", "dest_repo": "<HREF_OF_DEST_REPO>", "content": []}
    ],
    "dependency_solving": true
  }' \
  http://localhost:8080/pulp/api/v3/rpm/copy/

I haven’t dug terribly deeply into this, but I believe what’s happening here is that adv-copy “assumes” different dest-repos and creates a new repo-version for each source/dest pair as part of the same db-transaction, so both new repo-versions in the same repo get the same version number. They then collide at save().

I think adv-copy needs a) more explicit documentation, and b) more validation at POST time to let you know not to do this. Can you open an issue at https://github.com/pulp/pulp_rpm/issues/new?assignees=&labels=Issue%2C+Triage-Needed&projects=&template=bug_report.md&title= ?
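As a stopgap on the client side, something like the following could catch the problem before the request is sent. This is only a sketch: `check_unique_dest_repos` is a hypothetical helper, not part of Pulp or pulp-cli, and it simply compares the dest repo hrefs you pass it as plain strings.

```shell
# Hypothetical client-side guard (not a Pulp feature).
# Pass every dest_repo href you plan to put into one "config" list.
# Returns non-zero and prints the offending href(s) if one is repeated.
check_unique_dest_repos() {
  dups=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "dest_repo used more than once in one copy request: $dups" >&2
    return 1
  fi
  return 0
}
```

Call it with the dest repo hrefs of a planned request, e.g. `check_unique_dest_repos "<HREF_OF_DEST_REPO_1>" "<HREF_OF_DEST_REPO_2>" || exit 1`, and only POST if it succeeds.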

In the meantime, what if you do your copy-to-resolve-deps into separate dest-repos, and then use adv-copy again to copy everything from dest-repo-2 into dest-repo-1? It’s a little clunky, but it should get you to the desired end state.
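For the first half of that workaround, the request from the reproduction steps only needs a distinct destination per source. A minimal sketch, assuming two destination repos were created beforehand; `build_resolve_payload` and its argument order are made up for illustration, and the hrefs are placeholders in the same style as above:

```shell
# Hypothetical helper: builds the JSON body for the dependency-solving copy,
# with one destination repo per source repo version.
#   $1 = AppStream repo version href   $2 = dest repo 1 href
#   $3 = postfix package href          $4 = BaseOS repo version href
#   $5 = dest repo 2 href
build_resolve_payload() {
  cat <<EOF
{
  "config": [
    {"source_repo_version": "$1", "dest_repo": "$2", "content": ["$3"]},
    {"source_repo_version": "$4", "dest_repo": "$5", "content": []}
  ],
  "dependency_solving": true
}
EOF
}

# Usage (not executed here; endpoint and credentials as in the original post):
# build_resolve_payload "<HREF_OF_APPSTREAM_REPO_VERSION>" "<HREF_OF_DEST_REPO_1>" \
#     "<HREF_OF_POSTFIX_RPM>" "<HREF_OF_BASEOS_REPO_VERSION>" "<HREF_OF_DEST_REPO_2>" |
#   curl --header "Content-Type: application/json" --user "admin:foobar" \
#     --data @- http://localhost:8080/pulp/api/v3/rpm/copy/
```

The only difference from the failing request is that no dest repo href appears twice in `config`.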

Thank you @ggainey for your quick response! I’ve created the GitHub issue as you requested: “Advanced copy workflow fails when same destination repo is used during multi-repository-copy” (pulp/pulp_rpm issue #3444).

In the meantime I had also come up with the workaround that you suggested and tried it. It works as expected :+1: One thing to note is that dependency solving needs to be disabled for the subsequent advanced copy calls that do the merging, especially when more than two repositories get merged (I’m dealing with 8 repos with cross-repo dependencies).
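The merge step could look roughly like this: one copy request per intermediate repo, each with dependency solving off, so that no single request lists the same dest repo twice. This is a sketch under assumptions: the helper names are hypothetical, and leaving out `content` is assumed to copy everything from the source version, which is worth checking against the pulp_rpm docs for your version.

```shell
# Hypothetical helper: JSON body that copies one repo version into the target
# repo without dependency solving.
#   $1 = latest version href of an intermediate dest repo
#   $2 = href of the single target repo
build_merge_payload() {
  printf '{"config": [{"source_repo_version": "%s", "dest_repo": "%s"}], "dependency_solving": false}\n' "$1" "$2"
}

# Hypothetical helper: prints one payload per source version, one per line.
#   $1 = target repo href, remaining args = source repo version hrefs
print_merge_payloads() {
  target=$1
  shift
  for src in "$@"; do
    build_merge_payload "$src" "$target"
  done
}

# Usage (not executed here; endpoint and credentials as in the original post):
# print_merge_payloads "<HREF_OF_DEST_REPO_1>" "<HREF_OF_DEST_REPO_2_VERSION>" |
#   while IFS= read -r body; do
#     curl --header "Content-Type: application/json" --user "admin:foobar" \
#       --data "$body" http://localhost:8080/pulp/api/v3/rpm/copy/
#   done
```

Each POST starts its own task, so wait for the tasks to finish before looking at the merged repo.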

OK great, glad to hear we’re on the same page! And yeah, I def should have been explicit about turning off depsolve.

“I’m dealing with 8 repos with cross repo dependencies” - oof - all my sympathies :slight_smile:
