Error with pulp operator in openshift

I am trying to install Pulp in an OpenShift 4.13 cluster. I installed the pulp-operator after mirroring the gcr repos from Artifactory. The operators are installed; however, when I try to install a Pulp instance, the operator pod goes into CrashLoopBackOff with the error below. The Redis and Postgres pods are running.

Error:

2024-03-19T13:24:41Z INFO controller/controller.go:115 Observed a panic in reconciler: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$' {"controller": "pulp", "controllerGroup": "repo-manager.pulpproject.org", "controllerKind": "Pulp", "Pulp": {"name":"example","namespace":"pulp-test"}, "namespace": "pulp-test", "name": "example", "reconcileID": "1a681941-3ed0-4d64-a235-e27cc07e1e05"}
panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$' [recovered]
panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

Not sure if I am missing something here.

NAME READY STATUS RESTARTS AGE
example-database-0 1/1 Running 0 140m
example-redis-56974c5944-xz95m 1/1 Running 0 140m
pulp-operator-controller-manager-56d74856b6-5v267 1/2 CrashLoopBackOff 66 (35s ago) 26h

Hi @midhuhk

Can you provide us the output of:

oc -npulp-test get pulp example -oyaml

I think there is a resource_requirement (maybe a memory definition; see "Resource Management for Pods and Containers" in the Kubernetes docs) in the Pulp CR that is not following the format k8s expects.
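As a standalone illustration (not the operator's actual code path; the helper function name is made up here), the quantity format that validation enforces can be sketched in Python using the regex from the panic message:

```python
import re

# Regex from the panic message: the format k8s resource.Quantity values must follow
QUANTITY_RE = re.compile(r'^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$')

def is_valid_quantity(value: str) -> bool:
    """Return True if value looks like a valid k8s quantity (e.g. 256Mi, 800m, 1Gi)."""
    return bool(QUANTITY_RE.match(value))

print(is_valid_quantity("256Mi"))  # True
print(is_valid_quantity("800m"))   # True
print(is_valid_quantity(""))       # False -- an empty value is what makes the operator panic
```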

I was getting this error with the beta version. It works fine with the alpha version.


As suggested by the community I moved to beta, and I still have the above issue.

@hyagi please find the output below,

[be3075@yb1404 ~]$ oc get pulp pulp-server-az3 -o yaml
apiVersion: repo-manager.pulpproject.org/v1beta2
kind: Pulp
metadata:
  creationTimestamp: "2024-04-11T12:12:35Z"
  generation: 1
  name: pulp-server-az3
  namespace: pulp-test-2vlx-test
  resourceVersion: "2567329917"
  uid: 3d0d5d8e-244f-42c8-a26d-5200f10f61a2
spec:
  api:
    gunicorn_timeout: 90
    gunicorn_workers: 2
    replicas: 1
  cache:
    enabled: true
  container_auth_private_key_name: container_auth_private_key.pem
  container_auth_public_key_name: container_auth_public_key.pem
  content:
    gunicorn_timeout: 90
    gunicorn_workers: 2
    replicas: 2
    resource_requirements:
      limits:
        cpu: 800m
        memory: 1Gi
      requests:
        cpu: 150m
        memory: 256Mi
  deployment_type: pulp
  file_storage_storage_class: ocs-storagecluster-cephfs
  image: quay.io/pulp/pulp-minimal
  image_pull_policy: IfNotPresent
  image_version: stable
  image_web: quay.io/pulp/pulp-web
  image_web_version: stable
  ingress_type: Route
  mount_trusted_ca: false
  pulp_settings:
    allowed_export_paths:
    - /tmp
    allowed_import_paths:
    - /tmp
    api_root: /pulp/
  route_host: pulp-server.apps.az3-ost00.danskenet.net/
  telemetry:
    enabled: false
    exporter_otlp_protocol: http/protobuf
  web:
    replicas: 1
  worker:
    replicas: 2
    resource_requirements:
      limits:
        cpu: 800m
        memory: 1Gi
      requests:
        cpu: 150m
        memory: 256Mi
status:
  conditions:
  - lastTransitionTime: "2024-04-11T12:12:35Z"
    message: pulp-server-az3 operator tasks running
    reason: OperatorRunning
    status: "False"
    type: Pulp-Operator-Finished-Execution
  - lastTransitionTime: "2024-04-11T12:12:35Z"
    message: Creating pulp-server-az3 SA resource
    reason: CreatingSA
    status: "False"
    type: Pulp-API-Ready
  - lastTransitionTime: "2024-04-11T12:12:35Z"
    message: All Database tasks ran successfully
    reason: DatabaseTasksFinished
    status: "True"
    type: Pulp-Database-Ready
  managed_cache_enabled: true

Hi @midhuhk,

Thank you for providing the CR! I could reproduce your error in a lab environment and now I have a better understanding of this issue.

During my tests, checking the stack trace, I could see that the
panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$' [recovered]

happened because of a missing file_storage_size definition:

panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

goroutine 406 [running]:
...
        /home/hyagi/pulp/pulp-operator/controllers/repo_manager/utils.go:369 +0x429
github.com/pulp/pulp-operator/controllers/repo_manager.(*RepoManagerReconciler).pulpFileStorage(0x1ce97c8?, {0x1ce6890, 0xc00063dd40}, 0xc000864580)
        /home/hyagi/pulp/pulp-operator/controllers/repo_manager/pvc.go:41 +0x28f
....

The operator has a check for the case where file_storage_size and/or file_storage_access_mode have been provided, but we are missing a check for when file_storage_storage_class exists and the others do not. I'll open an issue in the pulp-operator repo to avoid it panicking in such a situation and to provide a better error message.

So, can you please try to add the following definitions in your CR and see if the operator runs fine:

spec:
  file_storage_size: 100Gi  # or any other size, this is just an example
  file_storage_access_mode: ReadWriteMany
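If helpful, once those fields are set you can confirm the operator created and bound the file-storage PVC (namespace taken from this thread; the label selector is an assumption and may vary by operator version):

```shell
# List PVCs in the Pulp namespace and check their STATUS is Bound
oc -n pulp-test-2vlx-test get pvc
# Optionally filter to the CR's resources (label selector is an assumption)
oc -n pulp-test-2vlx-test get pvc -l app.kubernetes.io/instance=pulp-server-az3 -o wide
```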

Hi ,

After adding the mentioned specs, the installation proceeded. However, I see memory issues with the API pods in the logs.

[be3075@yb1404 ~]$ oc logs pod/pulp-api-84c9b578c4-5rtqw
Waiting on postgresql to start…
Postgres started.
Checking for database migrations
error: Failed to initialize NSS library
Database migrated!
/usr/local/bin/pulpcore-api
[2024-04-15 11:35:20 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2024-04-15 11:35:20 +0000] [1] [INFO] Listening at: http://[::]:24817 (1)
[2024-04-15 11:35:20 +0000] [1] [INFO] Using worker: pulpcore.app.entrypoint.PulpApiWorker
[2024-04-15 11:35:20 +0000] [32] [INFO] Booting worker with pid: 32
[2024-04-15 11:35:21 +0000] [33] [INFO] Booting worker with pid: 33
error: Failed to initialize NSS library
error: Failed to initialize NSS library
[2024-04-15 11:35:50 +0000] [1] [ERROR] Worker (pid:32) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:35:50 +0000] [52] [INFO] Booting worker with pid: 52
('pulp [ddd271d19bc5435e90ee6b9434266d44]: ::ffff:10.141.12.1 - - [15/Apr/2024:11:36:07 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4650 "-" "kube-probe/1.26"',)
('pulp [dfc2e50451294fdead0546171c149305]: ::ffff:10.141.12.1 - - [15/Apr/2024:11:36:07 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4376 "-" "kube-probe/1.26"',)
('pulp [c06ec3201ec6419892849529a1a7202f]: ::ffff:10.141.12.1 - - [15/Apr/2024:11:36:08 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4376 "-" "kube-probe/1.26"',)
[2024-04-15 11:36:10 +0000] [1] [ERROR] Worker (pid:33) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:36:10 +0000] [67] [INFO] Booting worker with pid: 67
error: Failed to initialize NSS library
[2024-04-15 11:36:32 +0000] [1] [ERROR] Worker (pid:52) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:36:32 +0000] [82] [INFO] Booting worker with pid: 82
error: Failed to initialize NSS library
[2024-04-15 11:36:54 +0000] [1] [ERROR] Worker (pid:67) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:36:54 +0000] [103] [INFO] Booting worker with pid: 103
error: Failed to initialize NSS library
[2024-04-15 11:37:15 +0000] [1] [ERROR] Worker (pid:82) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:37:15 +0000] [118] [INFO] Booting worker with pid: 118
error: Failed to initialize NSS library
[2024-04-15 11:37:36 +0000] [1] [ERROR] Worker (pid:103) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:37:36 +0000] [133] [INFO] Booting worker with pid: 133
error: Failed to initialize NSS library
[2024-04-15 11:37:56 +0000] [1] [ERROR] Worker (pid:118) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:37:56 +0000] [148] [INFO] Booting worker with pid: 148
error: Failed to initialize NSS library
[2024-04-15 11:38:16 +0000] [1] [ERROR] Worker (pid:133) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:38:16 +0000] [163] [INFO] Booting worker with pid: 163
error: Failed to initialize NSS library
[2024-04-15 11:38:36 +0000] [1] [ERROR] Worker (pid:148) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:38:36 +0000] [178] [INFO] Booting worker with pid: 178
error: Failed to initialize NSS library
[2024-04-15 11:38:55 +0000] [1] [ERROR] Worker (pid:163) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:38:56 +0000] [193] [INFO] Booting worker with pid: 193
error: Failed to initialize NSS library
[2024-04-15 11:39:18 +0000] [1] [ERROR] Worker (pid:178) was sent SIGKILL! Perhaps out of memory?
[2024-04-15 11:39:18 +0000] [208] [INFO] Booting worker with pid: 208
error: Failed to initialize NSS library

I had the same issue with the alpha version and managed to fix it by unmanaging the operator and increasing the memory.

Any suggestions?

Glad to know the installation proceeded!

"I had the same issue with the alpha version and managed to fix it by unmanaging the operator and increasing the memory."

Hum… instead of putting the operator in unmanaged state, let’s try to define the resources of API pods through the CR, for example:

spec:
  api:
    resource_requirements:
      limits:
        memory: 2Gi
      requests:
        memory: 2Gi
...

note: it is strange that the API pods are running out of memory considering that, in the CR provided, there are no resource limits defined for them. Maybe your namespace has a k8s LimitRange with default limits?!
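To check the LimitRange hypothesis, something like this should show any default requests/limits the namespace would inject into containers (namespace taken from this thread; adjust to yours):

```shell
# List any LimitRange objects in the namespace where Pulp runs
oc -n pulp-test-2vlx-test get limitrange
# Show the default requests/limits they would apply to new containers
oc -n pulp-test-2vlx-test describe limitrange
```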

Hello ,

In operator-managed mode, it does not allow editing the deployment. However, by unmanaging it and adding the memory configuration in the deployment, I see data on the Pulp route.

Another issue I observed is an internal error while a sync has been in progress for some time.

[root@yd3248.danskenet.net TEST:~]# pulp rpm repository sync --name test-appstream-repo --remote test_appstream
Started background task /pulp/api/v3/tasks/018ee6b2-ff1c-768d-a0d0-2bd26462dbf0/
…Traceback (most recent call last):
  File "/usr/local/bin/pulp", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cli/common/generic.py", line 289, in invoke
    return super().invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cli/common/generic.py", line 289, in invoke
    return super().invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cli/common/generic.py", line 289, in invoke
    return super().invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cli/common/generic.py", line 289, in invoke
    return super().invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cli/rpm/repository.py", line 308, in sync
    repository_ctx.sync(body=body)
  File "/usr/local/lib/python3.9/site-packages/pulp_glue/rpm/context.py", line 360, in sync
    return super().sync(href, body)
  File "/usr/local/lib/python3.9/site-packages/pulp_glue/common/context.py", line 1266, in sync
    return self.call("sync", parameters={self.HREF: href or self.pulp_href}, body=body or {})
  File "/usr/local/lib/python3.9/site-packages/pulp_glue/common/context.py", line 722, in call
    return self.pulp_ctx.call(
  File "/usr/local/lib/python3.9/site-packages/pulp_glue/common/context.py", line 396, in call
    result = self.wait_for_task(result)
  File "/usr/local/lib/python3.9/site-packages/pulp_glue/common/context.py", line 465, in wait_for_task
    task = self.api.call("tasks_read", parameters={"task_href": task["pulp_href"]})
  File "/usr/local/lib/python3.9/site-packages/pulp_glue/common/openapi.py", line 724, in call
    response.raise_for_status()
  File "/usr/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://pulp.apps.az3-ost00.danskenet.net/pulp/api/v3/tasks/018ee6b2-ff1c-768d-a0d0-2bd26462dbf0/

worker pod logs:
[be3075@yb1404 ~]$ oc logs pod/pulp-worker-54f4d79668-9l5kq
Waiting on postgresql to start…
Postgres started.
Checking for database migrations
error: Failed to initialize NSS library
Database migrated!
error: Failed to initialize NSS library
pulp [None]: pulpcore.tasking.entrypoint:INFO: Starting distributed type worker
pulp [None]: pulpcore.tasking.worker:INFO: Worker '1@pulp-worker-54f4d79668-9l5kq' is back online.
pulp [None]: pulpcore.tasking.worker:INFO: Cleaning up task 018ee6b2-ff1c-768d-a0d0-2bd26462dbf0 and marking as failed. Reason: Worker has gone missing.

[be3075@yb1404 ~]$ oc logs pod/pulp-worker-54f4d79668-bhbj9
Waiting on postgresql to start…
Postgres started.
Checking for database migrations
error: Failed to initialize NSS library
Database migrated!
error: Failed to initialize NSS library
pulp [None]: pulpcore.tasking.entrypoint:INFO: Starting distributed type worker
pulp [None]: pulpcore.tasking.worker:INFO: Worker '1@pulp-worker-54f4d79668-bhbj9' is back online

In operator-managed mode, it does not allow editing the deployment.

Yes, this is the expected behavior. You should not try to edit the Deployment directly because the operator will reconcile your changes. More info: FAQ - Pulp Operator
You should instead update Pulp CR, for example:

$ oc edit pulp pulp-server-az3
apiVersion: repo-manager.pulpproject.org/v1beta2
kind: Pulp
metadata:
  name: pulp-server-az3
...
spec:
  api:
    resource_requirements:   <------ add these lines
      limits:                <------ add these lines
        memory: 2Gi          <------ add these lines
      requests:              <------ add these lines
        memory: 2Gi          <------ add these lines
...
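Equivalently, the same change can be applied non-interactively with a merge patch (a sketch with the same fields as above):

```shell
# Patch the Pulp CR so the operator reconciles the new API pod resources
oc patch pulp pulp-server-az3 --type=merge -p '
spec:
  api:
    resource_requirements:
      limits:
        memory: 2Gi
      requests:
        memory: 2Gi
'
```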

Another issue which I observed is Internal error while sync is in progress for some time.

Hum … for this error, if you could provide us a cluster-info output we could investigate it further (like checking the API logs during the sync, the k8s events that could give us more clue, etc.).
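For reference, on OpenShift a namespace-scoped inspect usually captures pod logs, events, and resource definitions in one archive (command sketch; adjust the namespace to yours):

```shell
# Collect logs, events, and resources for the namespace into ./pulp-inspect
oc adm inspect ns/pulp-test-2vlx-test --dest-dir=pulp-inspect
# Or a cluster-wide summary, if a full dump is acceptable
oc cluster-info dump --output-directory=cluster-dump
```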


Hi ,
The resource_requirements section is already present and updated to the value below. However, the memory error still pops up in the API pod.

Pod logs:

It works by editing the deployment (after unmanaging). I will update with the cluster-info in some time.

Hi @midhuhk,

"The resource_requirements section is already present and updated to the value below. However, the memory error still pops up in the API pod."

From the screenshot provided, I can see that you defined the limits and requests for pulpcore-content pods (we have 3 main components: pulpcore-api, pulpcore-content, and pulpcore-worker):

spec:
...
api:    <-------- here starts the definitions for pulpcore-api pods (note that there are no resource_requirements set)
  gunicorn_timeout: 90
  gunicorn_workers: 2
  replicas: 2
cache:
  enabled: true
...
content:    <----- here starts the definitions for pulpcore-content pods
  gunicorn_timeout: 90
  gunicorn_workers: 2
  replicas: 2
  resource_requirements:  <----- resource_requirements for pulpcore-content pods
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 1
      memory: 2Gi
...

Considering that the memory error happens on API pods, the idea was to set the resource_requirements for the pulpcore-api pods, for example:

spec:
...
api:      <-------- here starts the definitions for pulpcore-api pods
  gunicorn_timeout: 90
  gunicorn_workers: 2
  replicas: 2
  resource_requirements:   <------- resource_requirements for pulpcore-api pods
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 1
      memory: 2Gi
cache:
  enabled: true
...
content:
  gunicorn_workers: 2
  replicas: 2
  resource_requirements:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 1
      memory: 2Gi
...

Appreciate your support. After adding the spec under api, it looks good.

I have collected the inspect output for the internal-error issue, but I don't see an option to attach a file. Am I missing something here?

"After adding the spec under api, it looks good."

Nice! Glad to know it worked!

"I have collected the inspect output for the internal-error issue, but I don't see an option to attach a file. Am I missing something here?"

Hum… I'm not sure if it is possible to attach files here; if you have an easy way to share the files with us (like a Dropbox or Google Drive sharing link), we'd appreciate it.

I have tried to capture as many logs as possible here. Please let me know if more are required.

#################
Pod details :

[be3075@yb1404 ~]$ oc get pods
NAME READY STATUS RESTARTS AGE
pulp-api-6bbc4df7b5-lbqjf 1/1 Running 0 19h
pulp-content-774594bd8d-bhhrh 1/1 Running 0 20h
pulp-content-774594bd8d-fwmzg 1/1 Running 0 20h
pulp-database-0 1/1 Running 0 20h
pulp-operator-controller-manager-598fbc76b7-cvhx6 2/2 Running 15 (4h3m ago) 2d21h
pulp-redis-584d45fffb-ggwqp 1/1 Running 0 20h
pulp-worker-7c9cddbfb-5mbsw 1/1 Running 1 (2m37s ago) 15m
pulp-worker-7c9cddbfb-qnmnv 1/1 Running 1 (2m36s ago) 16m

#####################################################

API pod logs :

('pulp [f482273951244785bdb48ab8a3ea048f]: ::ffff:10.140.12.1 - admin [18/Apr/2024:07:04:54 +0000] "GET /pulp/api/v3/tasks/018eeffd-a278-703a-9515-237845ba2819/ HTTP/1.1" 500 145 "-" "Pulp-CLI/0.24.1"',)
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '14@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '14@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '14@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '15@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '14@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
pulp [8878b12e0b59403399ec985b937c5601]: django.request:ERROR: Internal Server Error: /pulp/api/v3/status/
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/usr/local/lib/python3.9/site-packages/psycopg/connection.py", line 748, in connect
    raise last_ex.with_traceback(None)
psycopg.OperationalError: connection failed: FATAL: the database system is in recovery mode

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = middleware_method(
  File "/usr/local/lib/python3.9/site-packages/pulpcore/middleware.py", line 35, in process_view
    domain = Domain.objects.get(name=domain_name)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 633, in get
    num = len(clone)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 380, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1881, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 306, in _cursor
    self.ensure_connection()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/usr/local/lib/python3.9/site-packages/psycopg/connection.py", line 748, in connect
    raise last_ex.with_traceback(None)
django.db.utils.OperationalError: connection failed: FATAL: the database system is in recovery mode
('pulp [8878b12e0b59403399ec985b937c5601]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:04:57 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 500 145 "-" "kube-probe/1.26"',)
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '15@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
pulp [None]: pulpcore.app.entrypoint:INFO: Api App '15@pulp-api-6bbc4df7b5-lbqjf' failed to write a heartbeat to the database, sleeping for '45.0' seconds.
('pulp [20c81711c0174f90b4a1f159b0b28e3e]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:05:17 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [a292b3a8cd574bb1809725c77d344a20]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:05:37 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [d008d208280d490f878e268a005213e3]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:05:57 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [1d648ed8c5154204b49e23eea08581fa]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:06:17 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [89994f437629458fb185dba3b0848188]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:06:37 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [bb05b7aad4fc4c4e8b1a63cbf5b12df6]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:06:57 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [fe2995194f05416287c4697421e6119b]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:07:17 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)
('pulp [ab14a9d1547041a1b9415990268d5b72]: ::ffff:10.141.12.1 - - [18/Apr/2024:07:07:37 +0000] "GET /pulp/api/v3/status/ HTTP/1.1" 200 4111 "-" "kube-probe/1.26"',)

#################################################################################################33

Worker -1

[be3075@yb1404 ~]$ oc logs pod/pulp-worker-7c9cddbfb-5mbsw --previous
Waiting on postgresql to start…
Postgres started.
Checking for database migrations
error: Failed to initialize NSS library
Database migrated!
error: Failed to initialize NSS library
pulp [None]: pulpcore.tasking.entrypoint:INFO: Starting distributed type worker
pulp [None]: pulpcore.tasking.worker:INFO: New worker '1@pulp-worker-7c9cddbfb-5mbsw' discovered
Traceback (most recent call last):
  File "/usr/local/bin/pulpcore-worker", line 8, in <module>
    sys.exit(worker())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/entrypoint.py", line 43, in worker
    PulpcoreWorker().run(burst=burst)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/worker.py", line 413, in run
    self.sleep()
  File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/worker.py", line 300, in sleep
    connection.connection.execute("SELECT 1")
  File "/usr/local/lib/python3.9/site-packages/psycopg/connection.py", line 891, in execute
    raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

##################################################################################################################

Worker -2 :

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/_util.py", line 156, in perform_task
    execute_task(task)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 54, in execute_task
    contextvars.copy_context().run(_execute_task, task)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 78, in _execute_task
    task.set_failed(exc, tb)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/app/models/task.py", line 199, in set_failed
    rows = Task.objects.filter(pk=self.pk, state=TASK_STATES.RUNNING).update(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1206, in update
    rows = query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1984, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 306, in _cursor
    self.ensure_connection()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/usr/local/lib/python3.9/site-packages/psycopg/connection.py", line 748, in connect
    raise last_ex.with_traceback(None)
django.db.utils.OperationalError: connection failed: FATAL: the database system is in recovery mode

###############################################################################################################

Database pods :

2024-04-18 07:04:54.306 UTC [1] LOG: server process (PID 162302) was terminated by signal 9: Killed
2024-04-18 07:04:54.306 UTC [1] DETAIL: Failed process was running: INSERT INTO "rpm_package" ("content_ptr_id", "name", "epoch", "version", "release", "arch", "pkgId", "checksum_type", "summary", "description", "url", "changelogs", "files", "requires", "provides", "conflicts", "obsoletes", "suggests", "enhances", "recommends", "supplements", "location_base", "location_href", "rpm_buildhost", "rpm_group", "rpm_license", "rpm_packager", "rpm_sourcerpm", "rpm_vendor", "rpm_header_start", "rpm_header_end", "size_archive", "size_installed", "size_package", "time_build", "time_file", "is_modular", "_pulp_domain_id") VALUES ('018ef004c972770c8664e2ea7f46a113'::uuid, 'java-latest-openjdk-src-slowdebug', '1', '22.0.0.0.36', '1.rolling.el8', 'x86_64', '10ca285cee269d505314ac69b5286f38b77d6d87bfadcd7c241cb9a9375a59c7', 'sha256', 'OpenJDK 22 Source Bundle for packages with debugging on and no optimisation', 'The java-22-openjdk-src-slowdebug sub-package contains the complete OpenJDK 22
class library source code for use by IDE indexers and debuggers, for packages with debugging on and
2024-04-18 07:04:54.306 UTC [1] LOG: terminating any other active server processes
2024-04-18 07:04:54.306 UTC [164118] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.306 UTC [164118] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.306 UTC [164118] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.306 UTC [162299] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.306 UTC [162299] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.306 UTC [162299] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.307 UTC [161847] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.307 UTC [161847] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.307 UTC [161847] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.307 UTC [160732] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.307 UTC [160732] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.307 UTC [160732] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.307 UTC [160731] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.307 UTC [160731] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.307 UTC [160731] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.307 UTC [160734] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.307 UTC [160734] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.307 UTC [160734] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.307 UTC [160735] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.307 UTC [160735] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.307 UTC [160735] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.308 UTC [161727] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.308 UTC [161727] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.308 UTC [161727] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.406 UTC [160707] WARNING: terminating connection because of crash of another server process
2024-04-18 07:04:54.406 UTC [160707] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-18 07:04:54.406 UTC [160707] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-04-18 07:04:54.408 UTC [164122] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.409 UTC [164123] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.410 UTC [164124] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.507 UTC [164125] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.508 UTC [164126] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.513 UTC [164127] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.607 UTC [164128] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.707 UTC [164129] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.709 UTC [164130] FATAL: the database system is in recovery mode
2024-04-18 07:04:54.712 UTC [1] LOG: all server processes terminated; reinitializing

The database pod also logs the following:
2024-04-18 09:26:46.307 UTC [1325] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:46.307 UTC [1325] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:47.327 UTC [1327] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:47.327 UTC [1327] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:50.137 UTC [1330] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:50.137 UTC [1330] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:50.176 UTC [1329] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:50.176 UTC [1329] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:50.761 UTC [1333] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:50.761 UTC [1333] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:53.809 UTC [1350] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:53.809 UTC [1350] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:55.911 UTC [1352] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:55.911 UTC [1352] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:56.117 UTC [1354] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:56.117 UTC [1354] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL
2024-04-18 09:26:56.712 UTC [1356] ERROR: relation "core_artifact" does not exist at character 28
2024-04-18 09:26:56.712 UTC [1356] STATEMENT: SELECT count(pulp_id) FROM core_artifact WHERE sha224 IS NULL