Unable to pull from PyPI pull-through cache

Problem:
Unable to pull from PyPI pull-through cache

Expected outcome:
Package pulled from pypi.org, cached in Pulp and installed in client.

Pulpcore version:
3.78.0

Pulp plugins installed and their versions:
core 3.78.0
ansible 0.25.1
container 2.25.1
deb 3.5.2
gem 0.7.1
maven 0.10.1
npm 0.3.3
ostree 2.4.8
python 3.15.0
rpm 3.30.0
certguard 3.78.0
file 3.78.0

Other relevant data:
Hi,

I’ve been trying to set up and pull from a Python pull-through cache in Pulp, so far unsuccessfully. Initially I tried this in a multi-pod k8s environment, but to rule out any configuration-related issues I have begun testing in the docker / single-container environment. I believe I have followed the example documentation, but an actual pip install still fails.

The command I am trying to run is:

$ pip install -vvv --trusted-host localhost -i "http://admin:<admin_password>@localhost:8080/pypi/foo/simple/" glances

In the output I see many lines related to available versions, so the retrieval of the package versions from PyPI.org via my local Pulp seems to be working.

...
Looking in indexes: http://admin:****@localhost:8080/pypi/foo/simple/
1 location(s) to search for versions of glances:
* http://admin:****@localhost:8080/pypi/foo/simple/glances/
Fetching project page and analyzing links: http://admin:****@localhost:8080/pypi/foo/simple/glances/
Getting page http://admin:****@localhost:8080/pypi/foo/simple/glances/
Found credentials in url for localhost:8080
Looking up "http://localhost:8080/pypi/foo/simple/glances/" in the cache
Request header has "max_age" as 0, cache bypassed
No cache entry available
Starting new HTTP connection (1): localhost:8080
http://localhost:8080 "GET /pypi/foo/simple/glances/ HTTP/1.1" 200 71429
Updating cache with response from "http://localhost:8080/pypi/foo/simple/glances/"
Fetched page http://admin:****@localhost:8080/pypi/foo/simple/glances/ as text/html; charset=utf-8
  Found link https://aff214bf3e38/pulp/content/foo/Glances-1.7.tar.gz?redirect=https://files.pythonhosted.org/packages/38/18/3cedaf71e1ae5ce38b098b5a4f477082b1eb4e25da29139b504763461c5a/Glances-1.7.tar.gz#sha256=86b723e79b30d08111186e2c859d710218d98bff483f9606e6fe6354578f6011 (from http://localhost:8080/pypi/foo/simple/glances/), version: 1.7
  Found link https://aff214bf3e38/pulp/content/foo/Glances-1.7.1.tar.gz?redirect=https://files.pythonhosted.org/packages/d7/31/71fc63ba2358b018dc037e15bd7d19f952a3120ff284c35e70cfb45b8546/Glances-1.7.1.tar.gz#sha256=3eec5495e1eb57c1310e9f095fe584537baa4e66c2d3d320985d8563cd40c3eb (from http://localhost:8080/pypi/foo/simple/glances/), version: 1.7.1
  Found link https://aff214bf3e38/pulp/content/foo/Glances-1.7.2.tar.gz?redirect=https://files.pythonhosted.org/packages/52/a0/f837a1d2bd76553575ed96a79649db6fcb5173eefffd63e818626a11b697/Glances-1.7.2.tar.gz#sha256=8901f4b59422f4a0805f2e556550efbf1dcb0be3e9ceece8c6860377b8df44e5 (from http://localhost:8080/pypi/foo/simple/glances/), version: 1.7.2
...

However, later in the logs, after all the available versions have been listed, the following error occurs when pip tries to download the package:

Given no hashes to check 153 links for project 'glances': discarding no candidates
Collecting glances
  Created temporary directory: /private/var/folders/kr/nq5k8lws2h54c7dy64s1v4wr0000gp/T/pip-unpack-ne9yzj67
  Looking up "https://aff214bf3e38/pulp/content/foo/glances-4.3.1-py3-none-any.whl?redirect=https://files.pythonhosted.org/packages/15/c9/d74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265/glances-4.3.1-py3-none-any.whl" in the cache
  No cache entry available
  No cache entry available
  Starting new HTTPS connection (1): aff214bf3e38:443
  Incremented Retry for (url='/pulp/content/foo/glances-4.3.1-py3-none-any.whl?redirect=https://files.pythonhosted.org/packages/15/c9/d74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265/glances-4.3.1-py3-none-any.whl'): Retry(total=4, connect=None, read=None, redirect=None, status=None)
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x10890fb60>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known')': /pulp/content/foo/glances-4.3.1-py3-none-any.whl?redirect=https://files.pythonhosted.org/packages/15/c9/d74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265/glances-4.3.1-py3-none-any.whl

I am confused as to where the (invalid) hostname ‘aff214bf3e38’ has come from and why pip is attempting an HTTPS connection. Do you know why this might be?

If I change the host to my Pulp server, I am able to download the package with wget:

$ wget "http://localhost:8080/pulp/content/foo/glances-4.3.1-py3-none-any.whl?redirect=https://files.pythonhosted.org/packages/15/c9/d74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265/glances-4.3.1-py3-none-any.whl"
--2025-06-16 16:57:56--  http://localhost:8080/pulp/content/foo/glances-4.3.1-py3-none-any.whl?redirect=https://files.pythonhosted.org/packages/15/c9/d74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265/glances-4.3.1-py3-none-any.whl
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 908638 (887K) [application/octet-stream]
Saving to: ‘glances-4.3.1-py3-none-any.whl?redirect=https:%2F%2Ffiles.pythonhosted.org%2Fpackages%2F15%2Fc9%2Fd74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265%2Fglances-4.3.1-py3-none-any.whl.1’

glances-4.3.1-py3-none-any.whl?redirect=https:% 100%[====================================================================================================>] 887.34K  --.-KB/s    in 0.01s

2025-06-16 16:57:56 (77.6 MB/s) - ‘glances-4.3.1-py3-none-any.whl?redirect=https:%2F%2Ffiles.pythonhosted.org%2Fpackages%2F15%2Fc9%2Fd74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265%2Fglances-4.3.1-py3-none-any.whl.1’ saved [908638/908638]
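
The manual workaround above (swapping the unreachable host for the real Pulp address before fetching) can be sketched in Python. This only illustrates what was done by hand for the wget test; it is not something pip or Pulp provides, and `rewrite_origin` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_origin(link: str, origin: str) -> str:
    """Return `link` with its scheme and host:port replaced by those of `origin`.

    Hypothetical debugging helper; path, query and fragment are kept unchanged.
    """
    target = urlsplit(origin)
    parts = urlsplit(link)
    return urlunsplit(
        (target.scheme, target.netloc, parts.path, parts.query, parts.fragment)
    )


# Link as advertised by the simple index in the pip log above.
advertised = (
    "https://aff214bf3e38/pulp/content/foo/glances-4.3.1-py3-none-any.whl"
    "?redirect=https://files.pythonhosted.org/packages/15/c9/"
    "d74441b085837e98309eeb1e93bd99b9111d1ac959fc0be20e833c9e3265/"
    "glances-4.3.1-py3-none-any.whl"
)

print(rewrite_origin(advertised, "http://localhost:8080"))
```

The result is the same URL that was passed to wget, which suggests that only the advertised scheme and host are wrong, not the content path itself.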

Pulp configuration
Remote:

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "pulp_href": "/pulp/api/v3/remotes/python/python/01976a15-9758-7cc7-aa7d-b44e3121edf9/",
      "prn": "prn:python.pythonremote:01976a15-9758-7cc7-aa7d-b44e3121edf9",
      "pulp_created": "2025-06-13T16:18:09.369781Z",
      "pulp_last_updated": "2025-06-13T16:18:09.369805Z",
      "name": "PyPI-mirror",
      "url": "https://pypi.org/",
      "ca_cert": null,
      "client_cert": null,
      "tls_validation": true,
      "proxy_url": null,
      "pulp_labels": {},
      "download_concurrency": null,
      "max_retries": null,
      "policy": "on_demand",
      "total_timeout": null,
      "connect_timeout": null,
      "sock_connect_timeout": null,
      "sock_read_timeout": null,
      "headers": null,
      "rate_limit": null,
      "hidden_fields": [
        {
          "name": "client_key",
          "is_set": false
        },
        {
          "name": "proxy_username",
          "is_set": false
        },
        {
          "name": "proxy_password",
          "is_set": false
        },
        {
          "name": "username",
          "is_set": false
        },
        {
          "name": "password",
          "is_set": false
        }
      ],
      "includes": [],
      "excludes": [],
      "prereleases": true,
      "package_types": [],
      "keep_latest_packages": 0,
      "exclude_platforms": []
    }
  ]
}

Distribution:

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "pulp_href": "/pulp/api/v3/distributions/python/pypi/01976a19-2db9-7fa3-9e64-5373a9cb7df0/",
      "prn": "prn:python.pythondistribution:01976a19-2db9-7fa3-9e64-5373a9cb7df0",
      "pulp_created": "2025-06-13T16:22:04.474347Z",
      "pulp_last_updated": "2025-06-13T16:23:24.017925Z",
      "base_path": "foo",
      "base_url": "https://aff214bf3e38/pypi/foo/",
      "content_guard": null,
      "no_content_change_since": null,
      "hidden": false,
      "pulp_labels": {},
      "name": "foo",
      "repository": "/pulp/api/v3/repositories/python/python/01976a15-19d3-7fb2-9d22-b48e6d61f96f/",
      "publication": null,
      "allow_uploads": true,
      "remote": "/pulp/api/v3/remotes/python/python/01976a15-9758-7cc7-aa7d-b44e3121edf9/"
    }
  ]
}

Publication:

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "pulp_href": "/pulp/api/v3/publications/python/pypi/01976a17-d1ec-7f3a-95af-58025af79e84/",
      "prn": "prn:python.pythonpublication:01976a17-d1ec-7f3a-95af-58025af79e84",
      "pulp_created": "2025-06-13T16:20:35.439758Z",
      "pulp_last_updated": "2025-06-13T16:20:35.647412Z",
      "repository_version": "/pulp/api/v3/repositories/python/python/01976a15-19d3-7fb2-9d22-b48e6d61f96f/versions/0/",
      "repository": "/pulp/api/v3/repositories/python/python/01976a15-19d3-7fb2-9d22-b48e6d61f96f/",
      "distributions": []
    }
  ]
}

Repository:

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "pulp_href": "/pulp/api/v3/repositories/python/python/01976a15-19d3-7fb2-9d22-b48e6d61f96f/",
      "prn": "prn:python.pythonrepository:01976a15-19d3-7fb2-9d22-b48e6d61f96f",
      "pulp_created": "2025-06-13T16:17:37.238192Z",
      "pulp_last_updated": "2025-06-13T16:22:15.377597Z",
      "versions_href": "/pulp/api/v3/repositories/python/python/01976a15-19d3-7fb2-9d22-b48e6d61f96f/versions/",
      "pulp_labels": {},
      "latest_version_href": "/pulp/api/v3/repositories/python/python/01976a15-19d3-7fb2-9d22-b48e6d61f96f/versions/0/",
      "name": "foo",
      "description": null,
      "retain_repo_versions": null,
      "remote": null,
      "autopublish": true
    }
  ]
}
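
The Distribution output above already shows the symptom: its base_url carries the same unreachable host that pip later tries to contact. A minimal sketch of a check for this, assuming the listing has been saved as JSON and that the client-facing address is known (`unreachable_base_urls` is a hypothetical helper, not part of any Pulp tooling):

```python
import json
from urllib.parse import urlsplit


def unreachable_base_urls(listing: dict, expected_origin: str) -> list[str]:
    """Return each distribution base_url whose scheme/host differs from expected_origin."""
    want = urlsplit(expected_origin)
    bad = []
    for dist in listing.get("results", []):
        got = urlsplit(dist["base_url"])
        if (got.scheme, got.netloc) != (want.scheme, want.netloc):
            bad.append(dist["base_url"])
    return bad


# Trimmed copy of the Distribution listing shown above.
listing = json.loads(
    '{"count": 1, "results": [{"name": "foo", "base_path": "foo",'
    ' "base_url": "https://aff214bf3e38/pypi/foo/"}]}'
)

print(unreachable_base_urls(listing, "http://localhost:8080"))
```

An empty list would mean every distribution advertises the address that clients actually use.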

@robert-smith-maersk Sorry, this is a documentation problem. There is a setting called PYPI_API_HOSTNAME that needs to be set in order for Pulp to generate the correct URLs for pip. [0] This value is the hostname of the API that you will pass on to your tools and clients. Typically, when running the docker Pulp images, it will be the same value as CONTENT_ORIGIN, so http://localhost:8080. The reason you see aff214bf3e38 is that we fall back to a default value when the setting is not set; that default comes from socket.getfqdn(), and running this inside a container gives the container’s internal hostname, which is not useful for outside clients.

Hopefully this is the only major plugin setting we fail to document in the OCI-images quickstart guide. I’ll file an issue to fix it.

[0] Settings - Pulp Project
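
To make the fallback described above concrete, here is a minimal sketch of the behaviour. It is not copied from the plugin source: `effective_pypi_api_hostname` is a hypothetical name, and the "https://" prefix is an assumption inferred from the https://aff214bf3e38/... links in the pip log.

```python
import socket


def effective_pypi_api_hostname(configured=None):
    """Use the configured value if present, otherwise derive one from the machine.

    ASSUMPTION: the default is built as "https://" + socket.getfqdn(); this is
    inferred from the log output in this thread, not taken from the plugin code.
    """
    if configured:
        return configured
    return "https://" + socket.getfqdn()


# Inside a container, getfqdn() typically returns the container's own hostname,
# which outside clients cannot resolve.
print(effective_pypi_api_hostname())

# With the setting supplied, the client-facing address is used instead.
print(effective_pypi_api_hostname("http://localhost:8080"))
```

For the single-container setup in this thread, that corresponds to setting PYPI_API_HOSTNAME to http://localhost:8080 in settings.py.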


@gerrod thanks, that has indeed fixed it for me when set in settings.py in the single container setup. Could you let me know how that should be configured when using the operator deployment? I’ve updated the settings configmap as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: settings
  namespace: pulp
  uid: 17c00098-feb5-4eec-9b15-c87a129d18cb
  resourceVersion: '15649513'
  creationTimestamp: '2025-06-03T10:50:03Z'
  labels:
    k8slens-edit-resource-version: v1
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"v1","data":{"allowed_export_paths":"[ \"/tmp\"
      ]","allowed_import_paths":"[ \"/tmp\"
      ]","analytics":"False","api_root":"\"/pulp/\"","pypi_api_hostname":"\"http://<pulp_api_ip_address>\""},"kind":"ConfigMap","metadata":{"annotations":{},"name":"settings","namespace":"pulp"}}
  managedFields:
    - manager: kubectl-client-side-apply
      operation: Update
      apiVersion: v1
      time: '2025-06-17T16:06:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          .: {}
          f:allowed_export_paths: {}
          f:allowed_import_paths: {}
          f:analytics: {}
          f:api_root: {}
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
    - manager: node-fetch
      operation: Update
      apiVersion: v1
      time: '2025-06-17T16:23:14Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          f:pypi_api_hostname: {}
        f:metadata:
          f:labels:
            .: {}
            f:k8slens-edit-resource-version: {}
  selfLink: /api/v1/namespaces/pulp/configmaps/settings
data:
  allowed_export_paths: '[ "/tmp" ]'
  allowed_import_paths: '[ "/tmp" ]'
  analytics: 'False'
  api_root: '"/pulp/"'
  pypi_api_hostname: '"http://<pulp_api_ip_address>"'

However, that doesn’t appear to take effect. After updating the configmap and waiting for the various pods to restart, I retried the pip install, and it appears to receive an in-cluster address rather than the specified public IP:

...
Starting new HTTP connection (1): example-pulp-web-svc.pulp.svc.cluster.local:24880
...

Hi @robert-smith-maersk

Could you let me know how that should be configured when using the operator deployment?

The ConfigMap provided seems to be correct. Can you please also share your Pulp CR? I think this is happening because of an ingress configuration (or lack of an ingress definition and the operator using a “default” value as CONTENT_ORIGIN).

Hi @hyagi,

I’ve added CONTENT_ORIGIN with the same value as PYPI_API_HOSTNAME, as shown in the config below, and that seems to have fixed it, thank you.

I am using a simple load balancer. As suggested in the docs, that seemed to be the simplest way to get everything up and running in Azure.

---
apiVersion: v1
kind: Secret
metadata:
  name: 'example-pulp-admin-password'
stringData:
  password: '<admin_password>'

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: settings
data:
  analytics: "False"
  content_origin: '"http://<public_ip>"'
  api_root: '"/pulp/"'
  pypi_api_hostname: '"http://<public_ip>"'
  allowed_export_paths: '[ "/tmp" ]'
  allowed_import_paths: '[ "/tmp" ]'

---
apiVersion: repo-manager.pulpproject.org/v1
kind: Pulp
metadata:
  name: example-pulp
spec:
  telemetry:
    enabled: true
  custom_pulp_settings: settings
  admin_password_secret: "example-pulp-admin-password"

  api:
    replicas: 1
  content:
    replicas: 1
  worker:
    replicas: 1
  web:
    replicas: 1

  database:
    postgres_storage_class: default

  file_storage_access_mode: "ReadWriteMany"
  file_storage_size: "2Gi"
  file_storage_storage_class: azurefile

  # Redis configs
  cache:
    enabled: true
    redis_storage_class: default

  ingress_type: loadbalancer
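
The nested quoting in the ConfigMap (for example '"/pulp/"') suggests that each value is treated as a Python expression rather than a plain string. Assuming the operator renders every entry as an upper-cased `KEY = value` line in settings.py (an assumption, not verified against the operator source), the following sketch checks that each value parses as a Python literal and that the two URL settings agree. `render_settings` is a hypothetical name.

```python
import ast


def render_settings(data: dict) -> list[str]:
    """Render ConfigMap-style entries as `KEY = value` lines, validating each value.

    ASSUMPTION: mirrors how the operator is presumed to build settings.py.
    Raises ValueError if a value is not a valid Python literal.
    """
    lines = []
    for key, raw in data.items():
        try:
            ast.literal_eval(raw)
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"{key}: {raw!r} is not a Python literal") from exc
        lines.append(f"{key.upper()} = {raw}")
    return lines


# Values copied from the working ConfigMap above.
data = {
    "analytics": "False",
    "content_origin": '"http://<public_ip>"',
    "api_root": '"/pulp/"',
    "pypi_api_hostname": '"http://<public_ip>"',
    "allowed_export_paths": '[ "/tmp" ]',
    "allowed_import_paths": '[ "/tmp" ]',
}

for line in render_settings(data):
    print(line)

# In the working setup, both settings point at the same external address.
assert ast.literal_eval(data["content_origin"]) == ast.literal_eval(
    data["pypi_api_hostname"]
)
```

A value written without the inner quotes, such as http://<public_ip>, would fail this check because it is not a valid Python string literal.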