A potential "large" Pulp setup/workload

At PulpCon 2021, we discussed the lack of example setups and workloads from “large” Pulp deployments that Pulp development could emulate and leverage.

At $DAYJOB, we’re looking at Pulp to replace our rather complex homegrown setup, which uses a mixture of tools. I would consider our setup fairly large, as it uses multiple terabytes of storage and implements a workflow for limited self-service access.

What is the system

Mirrors

  • /distro/<distroname>/<distrover>/<snapver>
    ** This is a merged repo of a distribution’s base + updates (e.g. Fedora release + updates)
    ** There’s a latest pointer to make it easy to hit the latest snapshot
    ** Earlier snapshots are accessible by a snapshot number version (which could be an integer or a datestamp)
    ** RHEL UBI 8, CentOS Linux 7, CentOS Stream 8, Ubuntu 16.04, Ubuntu 18.04, and Ubuntu 20.04 are mirrored this way.
  • /repo/<reponame>/<distroname>-<distrover>/<snapver>
    ** This is how third-party repositories are tracked (like EPEL, OBS repositories, COPRs, etc.)
    ** Rules are the same as distro mirrors

These are maintained centrally, and most folks can only consume them, not manipulate them. Older snapshots would be reaped once we’ve verified there are no consumers of that content (which happens once every ~2 years).
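To make the layout concrete, here is a minimal Python sketch of how the two mirror path schemes above could be built. The function names are hypothetical, and it assumes the latest pointer is literally a path component named `latest`, which the description above doesn't actually specify.

```python
from typing import Optional


def distro_path(distro: str, version: str, snapshot: Optional[str] = None) -> str:
    """Build /distro/<distroname>/<distrover>/<snapver>.

    Assumption: when no snapshot is given, the "latest" pointer is used,
    and that pointer is a path component literally called "latest".
    """
    return f"/distro/{distro}/{version}/{snapshot or 'latest'}"


def repo_path(repo: str, distro: str, version: str,
              snapshot: Optional[str] = None) -> str:
    """Build /repo/<reponame>/<distroname>-<distrover>/<snapver>."""
    return f"/repo/{repo}/{distro}-{version}/{snapshot or 'latest'}"
```

For example, `distro_path("ubuntu", "20.04", "20211001")` gives `/distro/ubuntu/20.04/20211001`, while `repo_path("epel", "centos", "7")` falls back to the latest pointer.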

Package and repository signatures should be preserved by default. Every sync must verify these signatures as part of content import and fail the sync if verification fails.
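The "verify or fail the sync" rule can be sketched as an all-or-nothing gate. This is only an illustration in plain Python, not Pulp's API: `sync_with_verification`, `SignatureVerificationError`, and the `verify` callback are all hypothetical names, and the callback stands in for whatever actually checks a signature (for RPMs, perhaps something wrapping `rpmkeys --checksig`).

```python
from typing import Callable, Iterable, List


class SignatureVerificationError(Exception):
    """Raised when any package in a sync fails signature verification."""


def sync_with_verification(packages: Iterable[str],
                           verify: Callable[[str], bool]) -> List[str]:
    """Return the packages to import only if every signature verifies.

    `verify` is a hypothetical callback returning True for a good signature.
    A single failure aborts the whole sync, so nothing is imported partially.
    """
    accepted: List[str] = []
    for pkg in packages:
        if not verify(pkg):
            raise SignatureVerificationError(f"signature check failed for {pkg}")
        accepted.append(pkg)
    return accepted
```

Raising instead of skipping the bad package is the point here: a snapshot with one unverifiable package never becomes visible to consumers.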

Team repositories

  • /team/<teamname>/<distroname>-<distroversion>/<stage>[-<snapver>]
    ** This is for internally developed packages by development teams for products. It’s effectively used as an overlay repository on the mirrored repositories.
    ** Each team repository has at least unstable and stable stages, though some may have unstable and stable-<snapver> stages, where the latter is generated programmatically from a manifest that cherry-picks package NEVRAs to generate the repository for usage in composes.

These are partially maintained by the teams themselves: they can promote packages from unstable to stable, while only automation can import packages and generate stage repositories. Package removals and the like are centrally managed.
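Generating a stable-<snapver> stage from a manifest could look roughly like the following. This is a sketch under assumptions: the manifest is taken to be a plain list of NEVRA strings, the unstable stage is modeled as a dict mapping NEVRA to package file, and `build_stage` and `MissingPackagesError` are hypothetical names rather than anything from our tooling or from Pulp.

```python
from typing import Dict, List


class MissingPackagesError(Exception):
    """Raised when the manifest names NEVRAs that are not in unstable."""


def build_stage(unstable: Dict[str, str], manifest: List[str]) -> Dict[str, str]:
    """Cherry-pick exactly the NEVRAs listed in the manifest from unstable.

    `unstable` maps NEVRA -> package file (an assumed representation).
    The stage is only generated if every manifest entry can be satisfied.
    """
    missing = [nevra for nevra in manifest if nevra not in unstable]
    if missing:
        raise MissingPackagesError(f"not in unstable: {', '.join(missing)}")
    return {nevra: unstable[nevra] for nevra in manifest}
```

Failing on a missing NEVRA, rather than silently producing a smaller repository, keeps a compose from running against a stage that doesn't match its manifest.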

Currently, I have roughly 10 of these team repositories. Packages and repository metadata in them must be signed automatically with our key.

Workflow principles

The idea behind such a setup is to clearly delineate the self-service and centrally managed parts of package repository management, giving developer teams the flexibility to properly maintain the content for their products/services while ultimately maintaining and enforcing policy centrally.

Content imports should generally happen through automation that logs and tracks the process, including signature verification, for audit purposes. Both CLI and API mechanisms need to exist for manipulating the repositories to properly handle administrative issues as well as supporting automation.
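As a rough idea of what "logs and tracks the process" could mean, automation might emit one structured record per imported package. The field names and the `audit_record` helper below are purely illustrative assumptions, not an existing Pulp feature.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(actor: str, repository: str, package: str,
                 signature_ok: bool, when: Optional[datetime] = None) -> str:
    """Return one JSON line describing an import, for an append-only audit log.

    All field names are hypothetical; `when` defaults to the current UTC time.
    """
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "actor": actor,
        "repository": repository,
        "package": package,
        "signature_ok": signature_ok,
        "timestamp": when.isoformat(),
    }, sort_keys=True)
```

Both a CLI and an API entry point could call the same helper, so administrative fixes and automated imports end up in one audit trail.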

A nice-to-have would be a web UI so people can easily browse and audit the content, but it’s not strictly needed. Maybe a Pulp Cockpit module someday if not a standalone web UI?

From the conversation at PulpCon:

  • Need RBAC implemented for pulp_rpm and pulp_deb
  • Need better docs/examples for setting up and using the signing service for deb/rpm
  • Need package signing
  • Need signature verification at sync

See existing pulp-rpm epic/roadmap plans around verification.

@Conan_Kudo - I think I missed something, feel free to fill in anything else you can remember.
