How to analyze "task performance"

I first brought this up in open floor, but I want to continue here.

I want to ask about general strategies and tooling that can be used to analyze where long-running Pulp tasks spend their time.

For the sake of argument, let’s say we are interested in understanding sync times (but it could just as well be a publish, or anything else that runs in a task and takes a long time). The initial sync for some large repository might take something like three hours. How do we break down what Pulp is doing that adds up to those three hours?

Note that for the purpose of this example, I am not assuming that three hours is an unreasonable time for the sync to take. Maybe it is already as fast as it can be. More likely there is something somewhere in this standard sync that could be improved further. How do we go about finding that thing?

Now that I have posed a maximally broad and open-ended question, I am hoping some of you will react with concrete tooling, logs, example issues, etc. :wink:

Take a look at the “profiling” section under “task diagnostics” in our docs:

https://docs.pulpproject.org/pulpcore/plugin_dev/plugin-writer/concepts/tasks/diagnostics.html#profiling

One note, I seem to recall that maybe it didn’t work as well for syncs as for publishes, due to async messing with the results. In that case you might want to leave it disabled, but copy the lines that enable the profiler into the async part of the sync task - I think the misbehavior has something to do with crossing a sync <-> async boundary.

But you can try it as-is first, maybe they’ve fixed that bug in the last few months.
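
A minimal sketch of what "enable the profiler in the async part" could look like, using only the standard library's cProfile. This is not pulpcore's code: `fake_stage`, `profiled_pipeline`, and `report` are hypothetical names made up for illustration, and the real lines to copy are the ones described in the docs linked above. The point is only that the profiler is switched on inside the coroutine rather than around the synchronous entry point.

```python
import asyncio
import cProfile
import io
import pstats


async def fake_stage(n):
    """Hypothetical stand-in for one stage of a sync pipeline."""
    total = 0
    for i in range(n):
        total += i * i
        if i % 1000 == 0:
            # Yield to the event loop, as a real stage would while waiting on I/O.
            await asyncio.sleep(0)
    return total


async def profiled_pipeline():
    """Enable the profiler from inside the coroutine, not around asyncio.run()."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = await fake_stage(20_000)
    finally:
        # Always stop profiling, even if the stage raises.
        profiler.disable()
    return result, profiler


def report(profiler, limit=10):
    """Render the collected stats as text, sorted by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    result, profiler = asyncio.run(profiled_pipeline())
    print(report(profiler))
```

Note that event loop internals will show up in the report next to the stage itself, since everything that runs on that thread while the profiler is enabled gets recorded.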

@dralley I tried it as-is, but some of the sync task profiles do not look right to me. Can you explain what you mean by this?

“In that case you might want to leave it disabled, but copy the lines that enable the profiler into the async part of the sync task”

Do you mean this part here? Do I have to copy it into each stage of the sync pipeline?

Mind that with profiling enabled there is some memory and execution time overhead; see “Have separate switches for memory and time profiling” (Issue #4548, pulp/pulpcore on GitHub).
I also like py-spy dump (benfred/py-spy on GitHub, a sampling profiler for Python programs).
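
For reference, `py-spy dump --pid <PID>` prints the current Python stack of a running process without restarting it, which is handy for seeing what a long-running worker is doing right now. Below is a small sketch that wraps that command from Python. The helper names are hypothetical, finding the PID of the Pulp worker is left to you, and depending on the system py-spy may need elevated privileges to attach.

```python
import shutil
import subprocess


def py_spy_dump_command(pid):
    """Build the argument list for `py-spy dump --pid <pid>`."""
    return ["py-spy", "dump", "--pid", str(pid)]


def dump_worker_stacks(pid):
    """Return py-spy's stack dump for the process, or None if py-spy is not installed.

    Hypothetical helper: `pid` is assumed to be the PID of the worker process
    that is executing the slow task.
    """
    if shutil.which("py-spy") is None:
        return None
    completed = subprocess.run(
        py_spy_dump_command(pid),
        capture_output=True,
        text=True,
        check=False,  # attaching can fail (e.g. permissions); let the caller inspect the output
    )
    return completed.stdout
```

Calling `dump_worker_stacks` a few times, some seconds apart, gives a rough picture of where the task keeps getting stuck.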