Pulp background sync

Problem:
This is not a problem per se, but I wanted to hear ideas and feedback on how we can improve sync time in Pulp.
We have a large number of RHEL/CentOS/OEL repos to sync, and while exploring the CLI options in Pulp 3 we noticed a --background option.
While this can be used to sync the repos in the background, I wanted to know how we can use this feature in a bash script that monitors the progress of the background tasks and, after they have completed, runs commands to create rpm publications and rpm distributions.
Have you come across such a requirement, and if so, how have you scripted it?

Expected outcome:
Looking for feedback or code snippets to help start syncs in the background and monitor them until they finish.

Pulpcore version:
3.14

Pulp plugins installed and their versions:
pulp_rpm

Operating system - distribution and version:
RHEL7

Other relevant data:

Now that you are asking for it, I realize that while this is possible, it is a bit cumbersome.
If you specify --background on a CLI operation, the CLI will not wait for its completion. So far so good. It also prints a message about the started task on stderr (independently of the --background option). So you’d need to capture that error stream and use sed to extract the task URL. Once that is accomplished, you can kick off more tasks in the background, collecting their pulp_href values. Then you can use pulp task show --wait --href "$TASK_HREF" on each of them.
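
A rough sketch of that flow, under a few assumptions: the message on stderr contains the task href in the form .../api/v3/tasks/<uuid>/, every repository already has a remote attached, and pulp task show --wait exits non-zero when the task failed. It uses grep -o rather than sed for the extraction (as the pulp-cli test script does), and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the stderr message format is an assumption, check it against
# what your pulp-cli version actually prints.

# Print the first task href found on stdin, e.g. /pulp/api/v3/tasks/<uuid>/
extract_task_href() {
    grep -E -o '/[^[:space:]]*api/v3/tasks/[-[:xdigit:]]+/' | head -n 1
}

# Start a sync without waiting for it and print the href of its task.
# stderr is routed into the pipe, stdout is discarded.
start_sync() {
    pulp --background rpm repository sync --name "$1" 2>&1 >/dev/null | extract_task_href
}

# Kick off all syncs first, then wait for each task and publish on success.
sync_and_publish() {
    local repo href i status=0
    local -a repos=() hrefs=()
    for repo in "$@"; do
        href=$(start_sync "$repo")
        if [ -z "$href" ]; then
            echo "no task href found for $repo" >&2
            status=1
            continue
        fi
        repos+=("$repo")
        hrefs+=("$href")
    done
    for i in "${!hrefs[@]}"; do
        # Assumed to return non-zero if the task ended in a failed state.
        if pulp task show --wait --href "${hrefs[$i]}" >/dev/null; then
            pulp rpm publication create --repository "${repos[$i]}"
        else
            echo "sync failed for ${repos[$i]}" >&2
            status=1
        fi
    done
    return "$status"
}
```

You would call it as sync_and_publish repo-a repo-b (hypothetical repository names); creating the distributions would follow the same pattern after the publications exist.
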
OTOH, you can also skip the --background option and use bash job control to start multiple CLI commands simultaneously.
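
With job control, a minimal sketch could look like the following. Each pulp call blocks until its own task is done, so waiting on the shell jobs is enough. The function names are hypothetical, and it again assumes every repository already has a remote attached.

```shell
#!/usr/bin/env bash

# Blocking sync of a single repository (no --background here).
sync_one() {
    pulp rpm repository sync --name "$1"
}

# Start one sync per argument as a background job, then wait for all of them.
# Returns non-zero if at least one sync failed.
sync_all() {
    local repo pid status=0
    local -a pids=()
    for repo in "$@"; do
        sync_one "$repo" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # wait returns the exit status of that particular job
        wait "$pid" || status=1
    done
    return "$status"
}
```

Something like sync_all repo-a repo-b && echo "all syncs finished" would then gate the publication and distribution steps. Note that Pulp still decides how many tasks actually run at once, based on the available workers.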

I know, this is not directly translatable to a usual shell script, but you should get the idea: https://github.com/pulp/pulp-cli/blob/main/tests/scripts/pulpcore/test_task.sh#L38

Awesome,
Thanks for this tip, will see how far I can reach with this…

Are you aware of auto-publish and auto-distribute?

https://docs.pulpproject.org/pulp_file/workflows/publish-host.html#automate-publication-and-distribution

This would prevent you from having to wait on the sync task to perform publish/distribute.
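
For rpm repositories this might look roughly like the sketch below. The linked page describes pulp_file, so the rpm equivalents are assumptions on my side: that your pulp-cli version has --autopublish on rpm repository update and --repository on rpm distribution create, and that the distribution does not exist yet. The function name is made up.

```shell
#!/usr/bin/env bash

# Turn on automatic publication for a repository and create a distribution
# that follows the repository instead of one fixed publication.
# Both flags are assumptions, check `pulp rpm repository update --help`
# and `pulp rpm distribution create --help` first.
enable_autopublish() {
    local repo=$1 base_path=$2
    pulp rpm repository update --name "$repo" --autopublish &&
    pulp rpm distribution create --name "$repo" --base-path "$base_path" --repository "$repo"
}
```

After a one-time enable_autopublish repo-a some/base/path (hypothetical values), a plain sync would be the only step left in the pipeline.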

Yup, I was aware of this…
However, what I wanted to achieve is to start off 5-6 repository syncs in parallel using our CI/CD tool, and then wait for the tasks to complete…
Doing one repo at a time is painfully slow, and I wanted some way to parallelise the process…

Thanks…

There’s no easy way to do that purely from the CLI. You’d need to do a little bit of extra scripting around it (either with bash around the CLI or Python + the raw HTTP bindings) to get the task IDs and wait until they’re all completed.

Just a thought (I know this is a really old thread).

In your CI/CD tool, can you not just have one job/pipeline per repository, and have them all kick off at the same time?
