After having played a bit with monitoring and analyzing the content app (see Questions about scaling the content app),
I decided to come back to the API, as it is clearly the biggest offender in terms of memory today.
I have a very basic Katello test setup (pulpcore 3.22.2, pulp-rpm 3.19.2) on a machine with 32GB RAM and 4vCPU.
(No, that’s not the same as the above setup, but here I can go crazy without affecting other people's deployments).
It has CentOS 7 (Base, Updates, Extras) and CentOS Stream 8 (BaseOS, AppStream) repos synced.
To generate some load, I am using the following, rather simple API script.
It is not using the official bindings so I can later more easily port that over to locust or something.
```python
import urllib.parse

import requests

LIMIT = 2000
BASE_URL = 'https://katello.example.com'
API_URL = f'{BASE_URL}/pulp/api/v3'


class PulpAPI:
    def __init__(self):
        self.session = requests.Session()
        self.session.cert = ('/etc/foreman/client_cert.pem', '/etc/foreman/client_key.pem')
        self.session.verify = True

    def request(self, url):
        r = self.session.get(url)
        r.raise_for_status()
        return r.json()


api = PulpAPI()

# fetch the first 1000 repositories known to Pulp
repositories = api.request(f'{API_URL}/repositories/?limit=1000')

while True:
    for repo in repositories['results']:
        print(repo['name'])
        version = repo['latest_version_href']
        # list all non-source RPM packages of the latest repository version,
        # LIMIT records at a time, following the pagination until exhausted
        query = {'arch__ne': 'src',
                 'fields': 'pulp_href,name,version,release,arch,epoch,summary,is_modular,rpm_sourcerpm,location_href,pkgId',
                 'limit': LIMIT, 'offset': 0, 'repository_version': version}
        query_s = urllib.parse.urlencode(query)
        url = f'{API_URL}/content/rpm/packages/?{query_s}'
        while url:
            x = api.request(url)
            url = x['next']
```
TL;DR for those who do not want to read Python: it fetches the first 1000 repositories known to Pulp and then fetches details about all non-source RPM packages in those repos until interrupted.
It’s probably broken for setups that have non-RPM repositories, but I don’t care for now.
The API is configured with `--max-requests 50 --max-requests-jitter 30`.
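For reference, the same recycling behaviour could also be expressed in a gunicorn configuration file; the following is just a sketch of the equivalent settings, the actual deployment passes them on the command line:

```python
# gunicorn.conf.py -- sketch of the equivalent of the command line flags above
max_requests = 50         # recycle a worker after it has served this many requests
max_requests_jitter = 30  # plus a random jitter of up to 30 requests, so that
                          # not all workers restart at the same time
```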
With 5 API workers (Katello default for that size of a VM), running this script for ~6 minutes makes the API memory consumption jump to 26GB in a total of 150 requests.
150 requests across 5 workers is 30 requests per worker, which means no restarts happened (the restart threshold is 50 requests plus a random jitter of up to 30).
Shortly after that, the kernel started killing gunicorn because of OOM. Well, let’s define 26G as the maximum for now,
but keep in mind that this box is totally idle otherwise and you wouldn’t have that much memory available when Katello is actually processing the returned data.
I’ve then started experimenting with the number of workers (alternating 5 and 3) and the number of requested records per call (`LIMIT`: 2000, 1000, 500).
Katello uses 2000 by default, as fetching things in multiple smaller requests takes more time (see https://github.com/Katello/katello/pull/7862).
Interestingly, even with 3 workers, it still managed to get to 26G quite quickly, without any restarts.
The best result was with 3 workers and `LIMIT=500` – the restarts managed to keep the usage below 9GB.
However, that’s still a lot (and would probably still have resulted in OOM on a box that has an active Katello),
and it’s also not really a fix, as the restarts are triggered by the number of requests served, not by the memory size of the worker.
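To put a rough number on why request-count based recycling cannot bound the memory, here is a back-of-the-envelope sketch; the per-request growth is only an average derived from the 26GB/150-requests run above, not a measured per-request value:

```python
# Rough upper bound on how much a single worker can grow before --max-requests
# recycles it. ~175 MB/request is an average derived from the run above
# (26 GB across 150 requests), not a measured per-request value.
growth_per_request_mb = 26 * 1024 / 150  # ~177 MB of RSS growth per request
worst_case_requests = 50 + 30            # --max-requests plus the full jitter
print(f"worst case growth per worker before restart: "
      f"{worst_case_requests * growth_per_request_mb / 1024:.1f} GB")  # ~13.9 GB
```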
Looking at https://github.com/benoitc/gunicorn/issues/1299 and https://github.com/benoitc/gunicorn/blob/master/examples/when_ready.conf.py,
I decided to try actually looking at the memory footprint and to kill workers “on demand”.
```python
# based on https://github.com/benoitc/gunicorn/blob/master/examples/when_ready.conf.py
import signal
import threading
import time

import psutil

max_mem = 2048  # per-worker RSS limit in MB


class MemoryWatch(threading.Thread):
    def __init__(self, server, max_mem):
        super().__init__()
        self.daemon = True
        self.server = server
        self.max_mem = max_mem
        # check every half gunicorn timeout
        self.timeout = server.timeout / 2

    def memory_usage(self, pid):
        # RSS of the given worker process in MB
        return psutil.Process(pid).memory_info().rss / (1024 * 1024)

    def run(self):
        while True:
            for (pid, worker) in list(self.server.WORKERS.items()):
                if self.memory_usage(pid) > self.max_mem:
                    self.server.log.info(f"Killing worker {pid} (memory usage > {self.max_mem}MB).")
                    self.server.kill_worker(pid, signal.SIGTERM)
            time.sleep(self.timeout)


def when_ready(server):
    server.log.info(f"Starting MemoryWatch with {max_mem}MB limit per worker")
    mw = MemoryWatch(server, max_mem)
    mw.start()
```
This is using `psutil` instead of shelling out to `ps` like the example does, as it felt more natural,
but otherwise the idea is the same: have a watcher thread that checks the memory usage of the workers
and kills (via SIGTERM) any worker whose usage is over a threshold (I picked 2GB rather randomly).
The check runs every 45 seconds (half the gunicorn timeout), as Katello by default deploys with `--timeout 90`.
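For completeness: `when_ready` is one of gunicorn’s server hooks, so the snippet above just needs to live in the configuration file gunicorn is started with. And for anyone who wants to poke at the `psutil` call interactively, it boils down to the following standalone sketch (measuring the current process instead of a gunicorn worker):

```python
# Standalone illustration of the psutil call used in the hook above;
# it reports the RSS of the current process instead of a gunicorn worker.
import os

import psutil

rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
print(f"RSS of PID {os.getpid()}: {rss_mb:.1f} MB")
```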
With 3 workers and `LIMIT=2000`, the memory watcher keeps the usage at ~6GB, which makes sense as workers are getting killed once they reach 2GB.
With 5 workers and `LIMIT=2000`, it keeps the usage at ~9GB, which is close enough to 2GB × 5 = 10GB.
Even better, that’s close to the numbers we’ve seen with 3 workers and `LIMIT=500` above, while processing bigger requests.
Is that the silver bullet and are the numbers final? Certainly not.
I will repeat the experiment, maybe even with some parallelization (as Katello does that too) and with more variation in the number of workers, batch sizes and memory limits.
But it’s certainly a direction that can give us a more stable API service while we figure out how to fix the underlying memory leak.