Backend S3 Bucket sharing

Problem:

Expected outcome:

Pulpcore version:
"core": "3.48.0"

Pulp plugins installed and their versions:
"versions": {
    "rpm": "3.25.1",
    "core": "3.48.0",
    "file": "3.48.0",
    "certguard": "3.48.0"
}
Operating system - distribution and version:
RHEL 9
Other relevant data:
I am setting up new servers to mirror Red Hat and CentOS repositories. So far everything works properly in my test instance deployed in AWS, using S3 for artifact storage and the rpm plugin. REDIRECT_TO_OBJECT_STORAGE is true by default, so all clients receive an HTTP 302 and are redirected to presigned S3 URLs to download content. In our previous deployment we had two Pulp 2 servers in region us-east-1. My question is: can I have two Pulp EC2 instances pointing to the same S3 bucket in settings.py, or do I need one bucket per Pulp instance? I tried to find this detail in the docs but was not able to. Thanks a lot in advance.
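For context, the storage part of our settings.py looks roughly like this. This is a sketch with placeholder values (bucket name, region), and the option names should be checked against the Pulp storage docs for your pulpcore version, since newer releases configure storage through Django's STORAGES dict instead:

```python
# settings.py storage section — a sketch with placeholder values, not a
# definitive configuration; verify against the Pulp storage docs for your
# pulpcore version.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
MEDIA_ROOT = ""
AWS_STORAGE_BUCKET_NAME = "my-pulp-artifacts"  # placeholder bucket name
AWS_DEFAULT_ACL = "private"
AWS_S3_REGION_NAME = "us-east-1"               # placeholder region

# With object storage configured, this default makes clients receive an
# HTTP 302 redirect to a presigned S3 URL instead of streaming the bytes
# through the Pulp content app.
REDIRECT_TO_OBJECT_STORAGE = True
```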

The way you asked, no. One Pulp == one database == one storage == one configuration == zero or one Redis.

But maybe this is a matter of nomenclature. By "one Pulp" I really mean "one installation of Pulp", and I think what you are looking for is some sort of high availability. One installation of Pulp must have a single shared database and a single storage configuration (because the former needs to know what lives in the latter). But all the Pulp services (API, content server and worker) are rather stateless and can be scaled independently. That is, you can, for example, run additional content servers in your regions as long as they are configured identically (running the same Python code and pointing to the same database, storage and Redis cluster).

Conversely to your question, you can use multiple storages in one Pulp installation by using Domains, where each domain almost represents an individual Pulp service.
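As a sketch of what that looks like: a domain is created via the REST API with its own storage backend. The field names here (name, storage_class, storage_settings) follow the pattern of the Domains API, but the domain name, credentials and bucket are placeholders, and the exact schema should be verified against the API docs for your pulpcore version:

```python
import json

# Hypothetical payload for POST /pulp/api/v3/domains/ — values are
# placeholders; check your pulpcore version's API docs for the exact schema.
payload = {
    "name": "centos-mirrors",  # placeholder domain name
    "storage_class": "storages.backends.s3boto3.S3Boto3Storage",
    "storage_settings": {
        "access_key": "PLACEHOLDER",      # placeholder credentials
        "bucket_name": "pulp-centos",     # each domain can use its own bucket
    },
}

# The request body you would send with your HTTP client of choice:
body = json.dumps(payload)
print(body)
```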

We’d love to hear more about your setup if you are confident to share.

Hi @x9c4, thanks a lot for your quick reply; I really value this information since this is my first contact with Pulp. In our use case we understand that we can scale and distribute the stateless components as you mention, but for now that is not a requirement. We were operating Pulp 2 with local filesystem storage for artifacts; with Pulp 3 and S3 we see a more convenient architecture and a huge performance increase, since clients now go directly to the S3 bucket, effectively offloading transfer load to AWS. For now we are not planning a complex HA architecture; we were only concerned about S3 cost. So if I understood correctly (and as I suspected in the first place), the PostgreSQL database knows how artifacts are indexed in the S3 artifacts folder (hash-derived names for folders and files), which means each Pulp EC2 instance we deploy MUST have its own S3 bucket, since we are not currently planning to use the Domains feature given our simple requirements. If my understanding of your feedback is correct, our plan for production in regions USE1 and USW2 will be an ALB for SSL termination in each region, in front of two EC2 Pulp instances (running all services), each instance using its own S3 bucket, serving RPMs for CentOS and RHEL.
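To illustrate what I mean by hash-derived names: a sketch of a sha256-prefix key layout of the kind used for artifact storage. The exact layout may differ between versions, so confirm by inspecting your own bucket; only the database maps these keys back to repository metadata:

```python
import hashlib

def artifact_key(content: bytes) -> str:
    """Sketch of a sha256-derived storage key layout (assumed, not
    authoritative): artifact/<first two hex chars>/<remaining digest>."""
    digest = hashlib.sha256(content).hexdigest()
    return f"artifact/{digest[:2]}/{digest[2:]}"

# Two different payloads land under different hash-named folders/files,
# which is why the key alone tells you nothing without the database index.
key = artifact_key(b"example rpm bytes")
print(key)
```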

Well, that would be one way. But as mentioned, if you can make both "installations" share the database and Redis cluster, they would basically be one installation, while still having frontends in multiple regions.
I'd probably start the workers only in the region closer to the database and storage, but deploy some API and content apps closer to the consumers. (Developer speaking here; I'm not too familiar with actual cloud deployments.)

@x9c4, thanks a lot for your input; with this we can continue with deployment and testing. :+1: