Calling All Task Management Enthusiasts! Thoughts, views and issues of our Pulp instance

Hey folks! We’ve been running our instance for some time, and we’d like to share some info and ask for collaboration on a few items.

Before anything, our Pulp instance runs on an OpenShift/Kubernetes cluster. It manages almost 20TB of artifact data, our DB uses at least 1.5 TB, and we currently run 24 pulp-workers, with the intent to scale to 48. I can’t say this is the biggest Pulp instance in the world, but we’re in the running for it. :sweat_smile:

After some heavy load tests and gathering some data, we had some impressions about the tasking system:

  1. Upper limit of how many workers we can start (assuming a completely idle system)
    1.1 Limited by the number of sessions opened to the database
    1.2 Limited by the number of heartbeats that can be written before workers start timing out.

  2. Upper limit on the concurrency of task dispatching / insertion
    2.1 Throughput is limited by the lock acquisition performed during dispatch.

  3. Upper limit of tasks we process (assuming infinite workers)
    3.1 Limited by the architecture of the advisory lock concurrency

  4. Cost due to memory footprint
    4.1 Workers need a huge amount of memory. We set the base memory as 2GB, and the upper limit to 6GB.
    4.2 Tasks that need that upper limit are not guaranteed to run; they could be killed by the scheduler.

  5. It’s not clear which immediate-task scheduling features we are using now
    5.1 Which of our tasks are immediate ones? Where and how do those immediate tasks run?

  6. Task logging is not standardized
    6.1 Task logs don’t necessarily include a task id, so it’s hard to correlate log entries.
    6.2 It’s not clear from the logs when the tasking system started or completed a task.

Creating a Tasking working group to discuss those things would be great.

So, I want to hear from you:

  • Do you share some of those impressions?
  • Would you want to work on any of those points?
  • Do you have any insight about it?

PS: Feel free to ask for more data if it could support the discussion here. We’ll work to get it ASAP.

1 Like

About immediate tasks:

Where they are used

The ones I know of in pulpcore are:

  • API: the update and remove async mixins, which are used in some viewsets (repository, distribution, acs and exporter)
  • content-app pull-through (which caused the recent regression).

Also, the Maven and Container plugins use it in specific endpoints.

How are they used

This is defined in the dispatch call, which can receive immediate and deferred arguments.
These names are actually a bit confusing. They existed before we decided to do the short-task optimization.

  1. The immediate task property really means short_task now
  2. The execution is immediate (on the api/content worker) if the task is “short”, the “deferred” argument is not True, and the resources are available at dispatch-time.
    • In this case, Task.worker=None (can use that fact to filter where task is executed)
  3. The execution is deferred otherwise:
    • In this case, Task.worker!=None

This distinction is relevant, because now we do have some optimizations for short tasks running in a task worker. Also, some plugins have special requirements, like this usage in Maven where the task should fail if it can’t be executed in the API worker.
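For illustration, here is a minimal sketch of the two dispatch modes (assumptions: `my_short_task` and `repo` are hypothetical placeholders, and `immediate`/`deferred` behave as described above; double-check the arguments against your pulpcore version):

```python
# Minimal sketch only: my_short_task and repo are hypothetical placeholders,
# and immediate/deferred are the dispatch arguments discussed in this thread.
from pulpcore.plugin.tasking import dispatch

def my_short_task(repository_pk):
    ...  # some quick operation; in real code this must be an importable task function

# Run inline on the API/content worker if the resources are free at dispatch
# time, otherwise fall back to the task-worker queue:
dispatch(my_short_task, kwargs={"repository_pk": str(repo.pk)},
         exclusive_resources=[repo], immediate=True)

# Maven-style: run inline or fail, never fall back to the worker queue:
dispatch(my_short_task, kwargs={"repository_pk": str(repo.pk)},
         exclusive_resources=[repo], immediate=True, deferred=False)
```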

TLDR:
A task can be of kind ‘long’ or ‘short’ (actually marked in the task as ‘Task.immediate=True’).
If it’s short, its execution can be ‘immediate/deferred’, depending on the dispatch args and resources.
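Since `Task.worker` tells where a short task actually ran (see point 2 above), here is a quick sketch of that filter, assuming the `immediate` and `worker` fields behave as described in this post:

```python
from pulpcore.app.models import Task

# Short tasks that ran inline on an API/content worker (no worker assigned):
ran_inline = Task.objects.filter(immediate=True, worker__isnull=True)

# Short tasks that fell back to a task worker (deferred execution):
fell_back = Task.objects.filter(immediate=True, worker__isnull=False)

print(f"inline: {ran_inline.count()}, deferred to workers: {fell_back.count()}")
```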

Other considerations

Some immediate deletes are disabled because:
“Too many cascaded deletes to block the gunicorn worker.”

Hey folks, thanks for sharing these problem statements.
Having such insights from a big production environment is quite valuable.

From the last meeting I think I understand problem (4) better and I had some thoughts about it.
What I heard is that admins running their instances need more knobs to handle resources more efficiently, and that ideally we shouldn’t try to make Pulp very smart about it (e.g., not make Pulp decide what efficient resource management is).

Trying to move the problem a bit outside of Pulp, my idea is to introduce an optional “Worker Groups” feature as an opt-in at the deployment/installation level. Given it is enabled:

  • All API, content and task workers must have a group assigned at deployment/installation level
  • Groups are arbitrary label strings
  • Dispatching a task will mark it with the group of the component that dispatched it
  • A worker only picks tasks that belong to its group

Given that capability, an admin could set up however many groups they need and specify routing rules to redirect any given request to the appropriate worker group. These rules could include URL patterns, special headers or whatnot.
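To make the proposal concrete, a purely hypothetical sketch of the claiming rule (neither a WORKER_GROUP setting nor a Task.group field exists in Pulp today; they only illustrate the idea):

```python
from django.conf import settings
from pulpcore.app.models import Task

def claimable_tasks():
    # Hypothetical: a worker would only consider waiting tasks that were
    # dispatched by components carrying its own group label.
    return Task.objects.filter(state="waiting", group=settings.WORKER_GROUP)
```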

That approach kind of puts all the complexity of handling resources in the admins’ hands (and out of Pulp’s core), but I feel like that is what services are asking for. Does that make sense?

1 Like

Yay @pedro-psb, thanks for the contributions man.
About your insights, I believe they would be a great contribution to the Tasking system.

After the latest changes to the task logs, we were able to gather some data, and that can contribute to your insight.

Check this graph:

Here we have all tasks processed by the pulp-worker, categorized as immediate or non-immediate. Immediate tasks in this context are tasks that were supposed to be executed in the pulp-api or pulp-content context but weren’t able to, possibly because they could not reserve some exclusive resource.

Those short/quick tasks end up in the queue, waiting for long-running tasks to finish before they can be executed.

Some way to execute those tasks ASAP would definitely help Pulp be more responsive.

1 Like

We can’t break the “lock promise” - if an “immediate” task needs access to something held with an exclusive lock, they do have to “just wait” until the lock-holder is finished with it.

2 Likes

For sure @ggainey, we are not discussing abandoning the lock promise. Yet those short tasks end up in the queue and wait for long-running tasks to finish before they can be executed. We’re wondering if it’s possible, somehow, for them to be executed by special runners, in the main thread, or via some other approach.

1 Like

So just to clarify what I think we’re aiming for here, a scenario something like this:

  • 3 workers
  • 6 tasks in-queue (t1-t6), t1 locks r1, t2-t6 do not need r1 (nor any competing locks, just to keep things simple)
  • “immediate” task t7 starts, needs r1, goes into queue
  • t1 finishes, r1 is available
  • currently: t7 has to wait for t2-t6 to be assigned and then have a free worker before executing
  • proposed: As soon as t1 completes/unlocks, re-evaluate the queue for “queued immediate tasks” and see which ones can now execute and run them immediately - so t7 can run without having to wait for t2-t6 to be assigned to workers/complete

Is that (very roughly and abstractly) what we’re looking to achieve?

1 Like

Now, I don’t know exactly where the installation producing those results stands version-wise, but we already execute immediate tasks (once their resources are unblocked) before all non-immediate tasks.
If I understand correctly, the tasks that run in the API (because their resources were already available) aren’t even part of that graph.

What you described (short task prioritization) is implemented already to some extent.
The second step we are discussing is running immediate tasks concurrently in the main worker process.
https://issues.redhat.com/browse/PULP-397

1 Like

What you described is our current state, @ggainey.

But in a case where all workers are busy with long-running tasks, those unblocked immediate tasks need some worker to be freed before they can be executed.

THIS! :point_up:

And how much did it improve the situation?

BTW we should reevaluate the “running parallel to a sync” idea carefully. As per this discussion, we established that the only legit way for an immediate task to end up in the worker queue is to be blocked on some resource. So one worker needs to finish its task to even create the next situation where a task could be unblocked. That same worker will handle all the immediate tasks right away before diving into the next heavy task.

Really the next step is to work around an async limitation to properly handle tasks within the content app.

1 Like

It would be nice if we could have a controlled and repeatable test for this and gather metrics about average waiting time for immediate/short dispatched tasks. That would enable us to do some fair evaluation of the impact of our proposed changes.

The graph was generated with log data from the latest pulpcore version, @x9c4 (3.76.1 at the moment I’m writing this).

Good that you mentioned it. It’s in our plans to add a task’s run time to the log, and the waiting time in the queue would also be great to have. We can work on some calculation using that.
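Until that lands in the logs, here is a rough sketch of how wait and run times could already be derived from the task records, assuming the usual `pulp_created`, `started_at` and `finished_at` timestamps on the Task model:

```python
from pulpcore.app.models import Task

for task in Task.objects.filter(state="completed").exclude(started_at=None):
    wait_time = task.started_at - task.pulp_created  # time spent queued/blocked
    run_time = task.finished_at - task.started_at    # actual execution time
    print(task.pk, task.name, f"waited {wait_time}, ran {run_time}")
```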

I’ve concluded exactly the same as what @x9c4 wrote, specifically: “we should reevaluate the “running parallel to a sync” idea carefully. As per this discussion, we established that the only legit way for an immediate task to end up in the worker queue is to be blocked on some resource.”

I think this is exactly the situation we’re in at this point, and it would do literally no good to finish https://issues.redhat.com/browse/PULP-397. All “unblocked tasks” are now handled in the API, and if they aren’t, they won’t execute any faster because they are waiting on whatever resource a worker is already working on. I believe that means we should close https://issues.redhat.com/browse/PULP-397. Please tell me if you’re seeing the same thing or not.

That makes sense to me. I had this intuition before but was not confident about its correctness.
I’ll just re-state the same here with my own words.

  1. A non-immediate short task dispatch (one that runs on a task worker) necessarily means (as it is today) that it’s blocked by a resource, let’s say R.
  2. The best we can do about it is to ensure that the short task is picked first after R is released.
  3. In the worst case, all workers are busy and one of them holds R. When it finishes and releases R, the next available tasks to be picked must be these short tasks (or all unblocked short tasks at that time).

While we already implemented prioritization, I believe we don’t comply with (3) yet, as the worker might not know about this short task when the blocking one finishes. E.g. (see the sketch after this list):

  • At time t0, handle_tasks fetches a list Q (of unblocked tasks) from the db.
  • It enters a loop to execute them one by one.
  • At time t1 a short task enters the system (db). This task is not in Q, as we don’t fetch in between.
  • The task blocking the short task finishes, but the worker will pick the next one in Q (there might be some long ones there).
  • If all other workers are busy for all that time, only after Q is empty will the short task have a chance of executing.
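A simplified sketch of that timeline (not the real pulpcore worker loop; `fetch_unblocked_tasks` and `run` are hypothetical stand-ins):

```python
def fetch_unblocked_tasks():
    """Hypothetical: return tasks whose resources are free, short tasks first."""
    ...

def run(task):
    """Hypothetical: execute one task to completion."""
    ...

def handle_tasks():
    q = fetch_unblocked_tasks()  # snapshot of unblocked tasks taken at t0
    for task in q:               # a short task unblocked at t1 > t0 is not in q
        run(task)                # so long tasks already in q run before it
    # the short task only gets a chance on the next handle_tasks() cycle
```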

My conclusions:

  • The “run in the foreground” idea could be a way to ensure that the short task is picked as soon as possible in all cases, but it might be overkill.
  • It’s not clear if this edge case happens often enough that addressing it would have a noticeable impact. IMHO the metric that could give some insight here is how long a short task waits after the resources it needs are released. If that’s usually high, then it makes sense to address it.
  • One obvious way to ensure (3) is to increase the task polling frequency, but that might impact the DB in unintended ways.
2 Likes

My point was that the very worker releasing the resource “R” by finishing its current task will now take a new look at unblocked tasks and then take a spin on all immediate-flagged tasks. So I’d expect it to find that very task among the others.
Of course that wouldn’t happen if the worker is marked for shutdown. But that’s a different story.

1 Like

Folks, I’ve also created an issue about adding task-related data to the task’s finished log entry, including its execution time.