Mirror Docker Hub with Pulp 3

Problem:
I am looking to mirror Docker Hub with Pulp 3. My current Pulp 3 setup runs on an EKS cluster with an S3 backend (see: Pulp on EKS).

Expected outcome:
Any image hosted on Docker Hub can be pulled from Pulp.

Operating system - distribution and version:
EKS Cluster

Please help me with it. TIA.

It is not obvious to me what you are looking for. Maybe it is this: https://github.com/pulp/pulp_container/issues/507

Hi @x9c4, thank you for picking up the ticket. With your help I was able to install Pulp on an EKS cluster. Now I want to mirror Docker Hub to Pulp so that users can download images from Pulp instead of from Docker Hub. Later we’ll also scan images before making them available to our users. It will be an enterprise solution.

For mirroring, you can follow this documentation: Synchronize a Repository — Pulp Container Support 2.15.2 documentation. First, you begin with creating a repository (a local Pulp repository used for administration) and remote (a reference to a remote Docker Hub registry). Then, you sync the content from the remote to the repository.

Afterwards, you need to create a distribution and distribute the synced repository: Host and Consume a Repository — Pulp Container Support 2.15.2 documentation. The distribution is an entity that tells Pulp where to serve content from.

You can use the CLI when working with Pulp: Pulp CLI.

pulp container repository create --name library/busybox
pulp container remote create --name library/busybox --url https://registry-1.docker.io --upstream-name library/busybox
pulp container repository sync --name library/busybox --remote library/busybox
pulp container distribution create --name library/busybox --base-path library/busybox --repository library/busybox

Currently, we do not support mirroring the whole Docker Hub registry. You need to create a repository/remote/distribution triplet for each remote repository on Docker Hub. This can be easily scripted in a for loop along with listing all repositories on Docker Hub (https://stackoverflow.com/a/61422885).
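
A minimal sketch of such a loop, reusing the four commands above. The repository list is a hypothetical placeholder (in practice it could come from a Docker Hub listing such as the one in the linked answer), and the `run` helper, the `DRY_RUN` variable, and the choice of `--base-path` equal to the repository name are assumptions made for illustration. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: create a repository/remote/distribution triplet per upstream repo.
set -eu

REGISTRY_URL="https://registry-1.docker.io"

# Hypothetical list of upstream repositories to mirror (assumption).
REPOS="library/busybox library/alpine"

# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to actually call the pulp CLI.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

mirror_repo() {
  name="$1"
  run pulp container repository create --name "$name"
  run pulp container remote create --name "$name" \
    --url "$REGISTRY_URL" --upstream-name "$name"
  run pulp container repository sync --name "$name" --remote "$name"
  # Assumption: serve each repo under a base path equal to its name.
  run pulp container distribution create --name "$name" \
    --base-path "$name" --repository "$name"
}

for repo in $REPOS; do
  mirror_repo "$repo"
done
```

Running it as is prints four `pulp` commands per repository. Running it with `DRY_RUN=0` executes them, which assumes the Pulp CLI is already configured against your Pulp instance.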

We will soon enable pull-through caching, which is the issue @x9c4 referenced earlier.

Got it. Thank you @lubosmj, I will give it a try. As you said, mirroring the whole Docker Hub registry is not supported for now. Is it the same for PyPI? Do I need to create a remote for each PyPI package?
https://docs.pulpproject.org/pulp_python/workflows/pypi.html#setup-your-own-pypi

No, you do not need to perform the same for pulp_python. You can freely sync the whole PyPI instance: Synchronize a Repository — Pulp python Support 3.10.0 documentation.

Even though Pulp can mirror the whole PyPI index, it is recommended to set it up as a pull-through cache instead. The sync task can stall because the DB might die while creating around 8 million records for the packages on PyPI.

Re: Docker Hub, unless you know upfront which repositories you want to mirror, wait for https://github.com/pulp/pulp_container/issues/507. To clarify, it is not possible to mirror the whole of Docker Hub because Docker Hub itself does not provide a way to discover its content catalog.

Hi @ipanova, I see that Harbor and Zot have a facility to cache a whole registry: Harbor docs | Configure Proxy Cache

https://zotregistry.io/v1.4.3/articles/mirroring/

@Chandan_Mishra thanks for sharing the links. The mentioned feature is going to be available in Pulp once this PR is merged and released.
As for caching a whole registry, I don’t think this is possible because of the absence of a catalog listing. This is mentioned in the zot notes too. For this reason you need to populate the content field (from their docs) in poll mode, or specify the repository in on-demand (pull-through) mode.

I don’t think this PR is going to have any concept of a polling mode, just pull-through. If you think this would be beneficial to have, please open an RFE.
As a side note, polling mode somewhat resembles the scheduled sync concept in Pulp: https://github.com/pulp/pulp_container/issues/732

We’re happy to provide more clarification or guidance if needed!

Thank you @ipanova. I have another question, about vulnerability scanning: does Pulp support it? Is there a mechanism to scan images (whether pulled from a public registry or private) before making them available to users?

Hey @Chandan_Mishra, as of today we do not have any integration with a scanner. This is work that would need to be prioritized.
