Calling All Task Management Enthusiasts! Thoughts, views and issues of our Pulp instance

I think I can bring new insights to this. There is still a chance that a resource is held by another immediate task currently executing on an api-worker, so the first api-worker would dispatch its immediate task into the tasking queue, where it would pile up if all workers were currently busy (with syncs). The chance of this happening is comparatively low. But we are talking about scaling up after all. And a user dispatching multiple actions to the same repository in quick succession sounds like a common scenario.

OTOH, whenever a task is dispatched from the content app, we will never tell anyone, so I don’t see much value in adding immediate execution of tasks in the content app. I’d rather keep the additional load out of there. (This is also the part we have technical difficulties implementing.)

So given both of them, I see value in implementing PULP-397 as planned. We would basically get this benefit too:

The question remains: is the value worth the added complexity?

IMHO, I can’t see the value in adding it now. Maybe some data could justify the complexity, but I don’t see it for the moment.
Adding the immediate property to the unblocked tasks metric could help us figure it out.

Anyway, I’m just defending we take a decision based on data here.

2 Likes

Anyway, I’m just defending we take a decision based on data here.

That’s fair.

Just for the record, I’ll add my observations about the relevant metric.

The evidence that could justify this would be that the waiting time of unblocked immediate tasks is not always low. In other words, we need to check whether, once the resources for an immediate task are unblocked, it really never has to wait long for a worker to pick it up (as it’s prioritized now).
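To make that check concrete, here is a minimal sketch of how the wait between unblocking and pickup could be split by the immediate flag. `TaskRecord` and its field names are hypothetical stand-ins for whatever the task records actually expose; this is not Pulp's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class TaskRecord:
    # Hypothetical stand-in for a task row; the field names are assumptions.
    immediate: bool
    unblocked_at: Optional[datetime]
    started_at: Optional[datetime]


def wait_after_unblock(tasks):
    """Collect seconds between unblocked_at and started_at, keyed by the immediate flag."""
    waits = {True: [], False: []}
    for task in tasks:
        if task.unblocked_at is None or task.started_at is None:
            continue  # still blocked, or not picked up by a worker yet
        delta = task.started_at - task.unblocked_at
        waits[task.immediate].append(delta.total_seconds())
    return waits


def summarize(waits):
    """Return count, median and max wait for each group that has data."""
    return {
        flag: {"count": len(values), "median": median(values), "max": max(values)}
        for flag, values in waits.items()
        if values
    }


t0 = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    TaskRecord(True, t0, t0 + timedelta(seconds=1)),
    TaskRecord(True, t0, t0 + timedelta(seconds=30)),
    TaskRecord(False, t0, t0 + timedelta(seconds=5)),
    TaskRecord(True, t0, None),  # unblocked but never started: skipped
]
report = summarize(wait_after_unblock(sample))
```

If the maximum for the immediate group stays close to its median over a busy period, that would suggest the current prioritization is enough.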

There is a data quality observation here: the ‘unblocked_at’ field is not 100% accurate about when the task really had its resources unblocked.
The ‘unblocked_at’ field is set when a worker calls a function named ‘unblock_tasks’, but in the current implementation there is no guarantee it will run right after each task finishes (maybe it should?).
Therefore, it’s possible that in a busy system there is a significant untracked time between when a task holding a resource finishes and when a worker is able to update the ‘unblocked_at’ records.
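As a rough illustration of that gap, the sketch below splits the real wait into the part the metric can see and the part it cannot. The function and its parameters are hypothetical, and it assumes we know when the task holding the resource finished.

```python
from datetime import datetime


def decompose_wait(holder_finished_at, unblocked_at, started_at):
    """Split the wait of a blocked task into untracked and recorded parts (seconds)."""
    # Untracked: the resource is already free, but 'unblocked_at' is not set yet.
    untracked = (unblocked_at - holder_finished_at).total_seconds()
    # Recorded: what a metric based on 'unblocked_at' would report.
    recorded = (started_at - unblocked_at).total_seconds()
    return {"untracked": untracked, "recorded": recorded, "total": untracked + recorded}


parts = decompose_wait(
    holder_finished_at=datetime(2024, 1, 1, 12, 0, 0),
    unblocked_at=datetime(2024, 1, 1, 12, 0, 4),
    started_at=datetime(2024, 1, 1, 12, 0, 5),
)
```

In this made-up example the metric would report a 1 second wait while the task actually waited 5 seconds after its resource became free.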

Well, not exactly a guarantee, because consistency and resilience are more important than individual task performance. But whenever a worker (that is not scheduled for shutdown) finishes a task that held some resource, it triggers a new unblock recalculation. I think that is close enough in all cases.

1 Like