Hi @woolsgrs,
Here is some information to keep you updated on the Operator status:
We always quite in the dark with Pulp and have some of our own tools and now looking at what you have with the Pulp Operator, but it would be good to see
- Overall status of each deployment, understand its functioning correctly, capacity metrics around that.
The current version of the Operator provides the .status.conditions[] field, which can be helpful to get this information. For example, checking this picture we can see that all (api/content/worker) deployments are in a READY state (all of their replicas are running and ready to serve requests):
We are also investigating the possibility of creating Red Hat Insights rules to help with the troubleshooting
- by using the k8s events generated by the Operator and/or
- the .status.conditions fields
These rules could be used, for example, by the support team to get an overview of the Operator status and check the possible fixes suggested by Insights.
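Until such rules exist, the same kind of check on .status.conditions can be scripted by hand. The snippet below is only a minimal sketch: it assumes the conditions follow the usual Kubernetes shape (a list of entries with "type" and "status" keys), and the condition type names in the sample are hypothetical placeholders, not the names the Operator actually reports.

```python
from typing import Dict, List


def not_ready(conditions: List[Dict[str, str]]) -> List[str]:
    """Return the type of every condition whose status is not "True"."""
    return [
        c.get("type", "<unknown>")
        for c in conditions
        if c.get("status") != "True"
    ]


# Hypothetical sample shaped like a Kubernetes .status.conditions[] list.
# The type names are placeholders, not the Operator's real condition names.
sample = [
    {"type": "Example-API-Ready", "status": "True"},
    {"type": "Example-Content-Ready", "status": "True"},
    {"type": "Example-Worker-Ready", "status": "False"},
]

pending = not_ready(sample)
print("all ready" if not pending else "not ready: " + ", ".join(pending))
```

In practice the list could be taken from the JSON output of kubectl get for the Pulp custom resource instead of the hard-coded sample.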
- Performance for triggers knowing when to scale up/down e.g. no of tasks, task waiting etc.
- Content Counts and no. of requests to that content
As soon as we can retrieve these metrics from Pulp, we will start to work on creating k8s HPA through the Operator (https://github.com/pulp/pulp-operator/issues/761).
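To give an idea of how such a metric would drive scaling, here is a rough sketch of the replica calculation documented for the k8s HPA: desired = ceil(current replicas * current metric / target metric), clamped to the configured minimum and maximum. The metric itself ("waiting tasks per worker") is an assumption for illustration, since Pulp does not expose it to the Operator yet, and the function and parameter names are made up for this example.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Apply the basic HPA scaling rule and clamp the result to [min, max]."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical numbers: 2 workers, 30 waiting tasks per worker on average,
# and a target of 10 waiting tasks per worker.
print(desired_replicas(2, 30, 10))
```

The real HPA controller also applies a tolerance and stabilization windows before acting, which this sketch leaves out.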
- Able to trigger alerts from these metrics
We have not investigated this in depth yet, but we will look into creating custom OCP monitoring dashboards (which can include alerts) through the Operator.