Sizing WORKING_DIRECTORY (/var/lib/pulp/tmp) when artifacts are stored in S3 (post-#1936 fix)

We previously hit pulpcore issue #1936 during remote syncs (RPM plugin): when artifact storage was not on the same filesystem (we use S3 for artifacts), temporary downloaded files accumulated and were not cleared until the end of the sync job.

That issue is now closed and we’ve upgraded to a version where it should be fixed, but we’re still trying to size local disk/ephemeral storage appropriately for staging downloads in WORKING_DIRECTORY (/var/lib/pulp/tmp).

Even with artifacts stored in S3, syncs still stage downloads locally, and we sometimes sync multiple large repos in parallel. We want to avoid worker pods/nodes running out of local/ephemeral storage.

Questions:

  1. For a setup with artifact storage in S3 (STORAGES default backend = S3), what’s the recommended way to size local WORKING_DIRECTORY storage? Any rule of thumb (per worker, per concurrent sync, etc.)?

  2. In current pulpcore, is a temp file deleted immediately after it’s successfully moved/uploaded to the artifact storage backend (S3), or can it persist until the end of a batch or task?

  3. Are there tunables that materially reduce peak temp disk usage?

    • number of workers / concurrent sync tasks
    • Remote download_concurrency
    • MAX_CONCURRENT_CONTENT
    • anything else that affects how many artifacts can be staged locally at once
  4. The "Hardware requirements" page in the Pulp Project docs mentions that “30GB should be enough in the majority of cases” when using S3 for artifact storage. Does that assume a particular concurrency (number of workers / parallel syncs) or a certain repo profile?
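To make question 1 concrete, here is the back-of-envelope model we're currently using. Every parameter value and the formula itself are our assumptions about worst-case staging, not anything from the Pulp docs; we'd love to know if this is even the right shape:

```python
# Assumption: each worker can hold up to `download_concurrency`
# in-flight downloads staged on local disk at once, and artifacts
# are deleted shortly after upload to S3. `headroom` pads for
# batching and cleanup lag.
def peak_temp_bytes(workers: int,
                    download_concurrency: int,
                    avg_artifact_bytes: int,
                    headroom: float = 2.0) -> int:
    return int(workers * download_concurrency * avg_artifact_bytes * headroom)

# e.g. 8 workers, download_concurrency=10, ~200 MiB average artifact
estimate = peak_temp_bytes(8, 10, 200 * 2**20)
print(f"~{estimate / 2**30:.1f} GiB of WORKING_DIRECTORY")
```

If temp files can instead persist for a whole batch or task (question 2), the right multiplier would presumably be closer to "largest repo synced concurrently" than "in-flight downloads", which changes the sizing dramatically.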

Any guidance or real-world sizing examples would be appreciated.

Thanks!