I’d like to discuss whether a Publication in Pulp terms can be mapped onto the compose action in Fedora, CentOS, and RHEL terminology.
I am looking for possibilities to
- decouple the compose process from the rpm build system,
- have compose-related artifacts organized in a meaningful way, so that one can use higher-level methods for discovery, search, diff, rebase and merge them, rather than work with the hardcoded directory structure.
But let me set the scene first.
Context
How Pulp Publication works (as far as I understand)
There is a certain set of content units. We “put” (link) content units into a Repository.
Then we create a Repository Version = immutable snapshot of the state of the Repository at a certain point in time.
Then we apply a Publication to it.
A Publication is a procedure which takes a Repository Version as input and generates additional artifacts from it (rpm metadata, a directory hierarchy, html pages,…).
These artifacts are then stored in Pulp as separate Content Units(?)
And then we may expose the Publication via a Distribution object.
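To check that I have the model right, here is a minimal sketch of the flow above in plain Python. This is not Pulp's actual API: all class and function names (`ContentUnit`, `Repository.snapshot`, `publish`, and so on) are made up for illustration, and the "generated artifacts" are just strings.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ContentUnit:
    name: str


@dataclass(frozen=True)
class RepositoryVersion:
    number: int
    content: FrozenSet[ContentUnit]  # immutable snapshot of the repository


@dataclass
class Repository:
    name: str
    content: set = field(default_factory=set)
    versions: list = field(default_factory=list)

    def add(self, unit: ContentUnit) -> None:
        # "Put" (link) a content unit into the repository.
        self.content.add(unit)

    def snapshot(self) -> RepositoryVersion:
        # Freeze the current state; numbering here is arbitrary (hypothetical).
        version = RepositoryVersion(len(self.versions) + 1, frozenset(self.content))
        self.versions.append(version)
        return version


@dataclass(frozen=True)
class Publication:
    repository_version: RepositoryVersion
    generated: Tuple[str, ...]  # stand-in for metadata, directory layout, ...


def publish(version: RepositoryVersion) -> Publication:
    # Derive additional artifacts from the snapshot; here one fake metadata
    # path per content unit.
    generated = tuple(sorted(f"metadata/{u.name}.xml" for u in version.content))
    return Publication(version, generated)


@dataclass(frozen=True)
class Distribution:
    base_path: str
    publication: Publication


repo = Repository("demo")
repo.add(ContentUnit("bash"))
v1 = repo.snapshot()
repo.add(ContentUnit("zsh"))  # later changes do not affect v1
dist = Distribution("demo/latest", publish(v1))
```

The point of the sketch is only the direction of the references: a Distribution points at a Publication, which points at exactly one Repository Version.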
How Fedora/CentOS/RHEL compose works
We have a pool of rpm builds (an rpm build is a group of several rpms built from a single SRPM; think subpackages and arch-dependent packages) represented by a Koji tag.
We make a snapshot of this pool at a certain time (tracked by a Koji event object) and then we run a compose.
A compose has multiple phases:
- It takes the pool of rpm builds and splits packages into arch-specific repositories.
- It takes the repository of all packages for this architecture and then applies filtering and dependency resolution to produce the layered repository structure, so that the next layer depends only on the previous one.
- It then triggers image builds and container builds, which produce iso, qcow, … artifacts.
- It uploads those artifacts back to Koji Hub.
- And then there is a completely separate procedure to take data from Koji Hub and publish it to mirrors for distribution.
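As a rough illustration of the first phase only (splitting the pool into arch-specific repositories), something like the following. It is a toy sketch, not how the real tooling does it: the `split_by_arch` helper and the package data are invented, and special cases such as noarch packages are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input shape: build name -> list of (rpm name, arch).
Pool = Dict[str, List[Tuple[str, str]]]


def split_by_arch(pool: Pool) -> Dict[str, List[str]]:
    """Group the rpms of every build into one package list per architecture."""
    repos = defaultdict(list)
    for rpms in pool.values():
        for rpm_name, arch in rpms:
            repos[arch].append(rpm_name)
    return {arch: sorted(names) for arch, names in repos.items()}


# Made-up snapshot of a tag at some event.
pool = {
    "bash-5.2.26-1": [("bash", "x86_64"), ("bash", "aarch64"), ("bash-devel", "x86_64")],
    "zlib-1.3-2": [("zlib", "x86_64"), ("zlib", "aarch64")],
}
repos = split_by_arch(pool)
```

The filtering and dependency resolution of the second phase would then run on each per-arch list separately.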
Ideas
1. Compose as Publication
- RPM build is a Content Unit
- Koji tag is a Repository
- Koji tag per event is a Repository Version
- Compose is a Publication
The concern here is that Publication becomes a very heavy step: it generates metadata, it calls out to external services to run image builds, and it needs to own new types of artifacts.
Question: Can it really do that? Are there limitations to what Publication can do and which content units it can contain? Can it run several tasks in parallel?
2. Repository groups? Links?
It could also be that Publication only covers the first two phases of pungi, generating the repository structure, while the building of the secondary artifacts would be orchestrated by an external service.
This then leads to another question; see https://github.com/pulp/pulpcore/issues/3710
When I am producing the secondary artifacts from a certain publication, I want to preserve the link between those artifacts and the Repository Version and Publication which I used as inputs.
So while I can set up a separate dedicated Repository to store the compose artifacts, I lose the “native” connection between them, and I would have to add it in some custom way (metadata files?).
So if Compose doesn’t work in the Publication context, can it get the relevant abstraction on its own?
Composing as a way to produce a linked repository
I think it can be considered a common pattern in CI that we have two different kinds of artifacts:
- Primary artifacts (the pool of rpm builds in the Fedora case)
- Secondary or derivative artifacts: things which we produce from the primaries (rpm repos and images in our example).
The repository of primary artifacts is the place where change happens: we upload new content units to it directly.
The derivative artifacts are produced (composed) from snapshots of primary artifacts.
The key issue is that when I am validating an update of a primary artifact and deciding whether or not to promote it, I need to use the derivative artifacts to make that decision.
Therefore it is critical to maintain a bidirectional link between a derivative artifact and the primary one.
A Compose object could be a new abstraction representing a “Publication on steroids”: it has a proper Repository object associated with it, and it maintains the link between a primary Repository Version and the derivative Repository Version.
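To make the idea a bit more concrete, here is a minimal sketch of what such a link could look like. Nothing here exists in Pulp today as far as I know: `Compose`, `ComposeIndex`, and the repository names are hypothetical, and a Repository Version is reduced to a `(repository name, version number)` pair.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Stand-in for a Repository Version: (repository name, version number).
VersionRef = Tuple[str, int]


@dataclass(frozen=True)
class Compose:
    primary: VersionRef      # snapshot the compose was produced from
    derivative: VersionRef   # snapshot holding the composed artifacts


class ComposeIndex:
    """Keeps the primary <-> derivative link queryable in both directions."""

    def __init__(self) -> None:
        self._by_primary: Dict[VersionRef, List[Compose]] = {}
        self._by_derivative: Dict[VersionRef, Compose] = {}

    def record(self, primary: VersionRef, derivative: VersionRef) -> Compose:
        # A derivative version is the output of exactly one compose.
        if derivative in self._by_derivative:
            raise ValueError(f"{derivative} is already linked to a compose")
        compose = Compose(primary, derivative)
        self._by_primary.setdefault(primary, []).append(compose)
        self._by_derivative[derivative] = compose
        return compose

    def derivatives_of(self, primary: VersionRef) -> List[VersionRef]:
        return [c.derivative for c in self._by_primary.get(primary, [])]

    def primary_of(self, derivative: VersionRef) -> VersionRef:
        return self._by_derivative[derivative].primary


index = ComposeIndex()
index.record(("f40-build", 7), ("f40-compose", 3))
```

With this, validation can go from a candidate primary version to the derivative artifacts to test (`derivatives_of`), and from a failing derivative back to the primary version to reject (`primary_of`).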
What do you think?