Performance working group

Hey folks - the community has raised a number of performance-related questions recently, so we’ve spun up a working group to investigate and start addressing them. We’ll be meeting every Monday at 1000 GMT-4. Contact Gerrod if you’d like an invite!

We’ll be posting minutes in this thread; they’re being taken on HackMD.

Current Priority List

2023-07-17 1000 GMT-4

  • attendees: ggainey, gubben, lmjachky, dalley
  • agenda
    • Decide on weekly meeting schedule
      • consensus vote says “this time slot works”
      • AI: gerrod to schedule weekly
    • Add any new performance issues under ‘Performance’ label
    • Go over Priority list, assign work
    • discussion about how to approach this effort
      • need to have a well-defined baseline
        • probably baseline 3.24/3.25/main? (for #3970)
        • measure a well-defined set of REST calls for each
      • for repository-query (#3969) - Repository._content_relationships()
        • likely the same problem for all versions - need a pre-/post-fix measurement
    • AI: all - take 20 min before next mtg to review/triage “Performance”-labelled issues in core
    • AI: ggainey to post a discourse thread
  • AIs
    • AI: gerrod to schedule weekly
    • AI: all - take 20 min before next mtg to review/triage “Performance”-labelled issues in core
    • AI: ggainey to dig out where initial perf-discussion happened
    • AI: gerrod takes lead on #3970
    • AI: lmjachky takes lead on #3969
    • AI: dalley continues lead on #2250
    • AI: ggainey to post a discourse thread
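As a rough sketch of the “measure a well-defined set of REST calls” baseline discussed above (the helper name and structure are my own, not the group’s actual tooling):

```python
import time

def time_call(fn, runs=5):
    """Average wall-clock seconds of fn() over several runs.

    Hypothetical baseline helper: wrap each REST call (e.g. a
    requests.get against a Pulp endpoint) in fn, then record the
    average once per Pulp version (3.24, 3.25, main) for comparison.
    """
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs
```

Running the same call list against each version under test would give the pre/post-fix numbers the group wants for #3970.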

2023-07-24 1000 GMT-4

  • attendees: ggainey, dalley, lmjachky, gubben
  • regrets:
  • Prev AIs:
    • DONE AI: gerrod to schedule weekly
    • AI: all - take 20 min before next mtg to review/triage “Performance”-labelled issues in core
    • DONE AI: ggainey to dig out where initial perf-discussion happened
    • AI: gerrod takes lead on #3970
    • AI: lmjachky takes lead on #3969 (contents-from-repo-version)
      • working w/ originators to get perf-testing scripts to use
      • discussion ensued
      • NEW AI: lmjachky needs to get more info about SQL queries run inside Django
    • DONE AI: dalley continues lead on #2250 (memory-growth)
      • “number of queries” question post-fix
      • pinged originator (gmbnomis) on their tests in pulp_cookbook that exposed the original problem
  • Agenda
    • discuss issues from prev-AI
  • AIs:
    • AI: all - take 20 min before next mtg to review/triage “Performance”-labelled issues in core
    • AI: lmjachky needs to get more info about SQL queries run inside Django
    • ggainey to post minutes to Performance working group

2023-07-31 1000 GMT-4

  • attendees: gerrod, lmjachky, dalley
  • regrets: ggainey
  • Prev AIs:
    • AI: all - take 20 min before next mtg to review/triage “Performance” labelled issue in core
    • AI: lmjachky needs to get more info about SQL queries run inside Django
  • Agenda
    • Performance labelled issues
    • lmjachky’s SQL query investigation (https://github.com/pulp/pulpcore/issues/3969#issuecomment-1652531793 +)
      • Used DEBUG=True and explain() to see the queries running under the hood:
        • no significant differences like complex joins between the v.get_content() (faster) and v.get_content(packages) (slower) queries:
          • just selection of more fields + a loop with one iteration in explain
        • worth rewriting the get_content query from scratch
    • 3.25 performance update:
      • Fairly confident the majority of the slowdown comes from the Basic Auth changes in Django 4.2
      • Slight bump when Domains were introduced in 3.23, but consistent response times from 3.24->3.28 when auth is removed
  • AIs:
    • lmjachky will start optimizing the repo_version.get_content(plugin.Model) (baseline with pulp_rpm) query to get better results in general (no longer focusing on pulp_ansible performance)
    • gerrod to measure times using session auth & investigate performance regarding DRF web renderer
    • gerrod to post to discourse Performance working group
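For context on the DEBUG=True investigation above: with DEBUG=True, Django records every executed statement in `django.db.connection.queries` as a list of dicts with 'sql' and 'time' keys. A small helper like this (name and output shape are my own, not part of the investigation) can summarize which statements dominate:

```python
from collections import Counter

def summarize_queries(queries):
    """Summarize a django.db.connection.queries-style list (dicts with
    'sql' and 'time' keys, recorded when DEBUG=True) into
    (sql, count, total_time) tuples, most frequent statement first."""
    counts = Counter()
    times = {}
    for q in queries:
        counts[q["sql"]] += 1
        times[q["sql"]] = times.get(q["sql"], 0.0) + float(q["time"])
    return [(sql, n, times[sql]) for sql, n in counts.most_common()]
```

Pairing this with `queryset.explain()` output is what surfaced the “more fields + one-iteration loop” difference noted above.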

2023-08-07 1000 GMT-4

  • attendees: gerrod, lmjachky, dalley, tsanders
  • regrets: ggainey
  • Prev AIs:
    • lmjachky will start optimizing the repo_version.get_content(plugin.Model) (baseline with pulp_rpm) query to get better results in general (no longer focusing on pulp_ansible performance)
    • gerrod to measure times using session auth & investigate performance regarding DRF web renderer
      • Session auth test resulted in no performance difference across versions
  • Agenda
    • Performance improvement around publications in pulp_rpm
    • Suggestions for lmjachky’s query testing
      • Contact ipanova to use COPR machine to test
      • Use potentially slow seq scans in explain statement as a guide for DB model changes
    • Sync stage query for existing content dominates sync pipeline for re-sync tasks
      • Potentially could create a large cache of previous version’s content to check against
      • Only use content’s natural uniqueness fields to make cache as small and fast as possible
      • Probably would require some refactoring of the stage’s pipeline as it doesn’t have knowledge of the previous repo-version
  • AIs:
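The natural-key cache idea from the sync-stage discussion above could be sketched roughly like this (the unit shape and function names are hypothetical, not pulpcore’s actual stages API):

```python
def build_content_cache(previous_units):
    """Index the previous repo version's content by natural-key tuple.

    Sketch of the caching idea: keeping only the content's natural
    uniqueness fields keeps the cache small and lookups fast.
    Unit shape (dicts with a 'natural_key' entry) is an assumption.
    """
    return {unit["natural_key"]: unit for unit in previous_units}

def filter_unseen(candidates, cache):
    """Return only candidates absent from the previous version, so the
    existing-content query in the sync pipeline can skip the rest."""
    return [c for c in candidates if c["natural_key"] not in cache]
```

As noted above, wiring this in would require refactoring, since the stages pipeline currently has no knowledge of the previous repo-version.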

2023-08-14 1000 GMT-4

  • attendees: gerrod, lmjachky, dalley, ggainey
  • regrets:
  • Prev AIs:
  • Agenda
    • Performance improvement in content app ready for review
    • ACS artifact stage improvements
      • Need someone familiar with ACS to sanity check the changes I’m making
      • https://github.com/pulp/pulpcore/pull/4274
      • needing to hydrate the RemoteArtifact is “unfortunate” from a performance POV
        • gerrod to take a look and think about stages-use
      • dalley still working on tests/perf-analysis
    • Resolution for repo_version.get_content():
      • https://github.com/pulp/pulpcore/pull/4275
        • Collaborated with ipanova, tested the performance on a machine with 112k repositories → got good results
        • Do we need to touch the DB schema if the improvements were significant?
          • no, please - keep this very backport-able
        • Should we do an output-comparison between original/modified query against a COPR(ish) repo to make sure we get the same thing?
          • yes please
  • What should lmjachky look at next?
    • maybe, nothing specific based on the comments below
  • More generally - what future work do we want this group to work on?
    • implications of immediate-syncing an upstream on-demand remote - it can completely overload the upstream content-app
    • https://github.com/pulp/pulpcore/issues/3549
    • Q: are we actually at a point where we declare this working group “done”?
  • What about “automated performance tests” as part of CI?
    • sounds like a fine, fine idea
    • There exist Ansible playbooks that run perf-tests against downstream and spit out charts
    • jhutar@redhat.com - invite him to this mtg to talk to us about perf-testing
  • AIs:
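The output-comparison requested above for the original vs. modified get_content query could be as simple as an order-insensitive row check (function name and row/key shapes here are illustrative only):

```python
def same_results(old_rows, new_rows, key=lambda row: row):
    """Order-insensitive check that a rewritten query returns exactly the
    rows the original did. `key` projects each row to a sortable
    identity (e.g. its primary key) so dict-like rows can be compared."""
    return sorted(map(key, old_rows)) == sorted(map(key, new_rows))
```

Running both query variants against a COPR(ish) repo and asserting this holds would satisfy the “make sure we get the same thing” check without touching the DB schema, keeping the change backport-able.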

NOTE: these are the final minutes from this working group. Pulp certainly has more performance work to do, but this particular iteration was convened to address specific problems, and fixes for them have all been merged. The group has voted to disband for now.

2023-08-21 1000 GMT-4

  • attendees: ggainey, gerrod, lmjachky, dalley, jhutar
  • regrets:
  • Prev AIs:
    • [lmjachky] compare filter-output pre/post #4275
  • Agenda
    • Can we get jhutar here to talk to us about his perf work?
      • biggest issue: someone needs to PAY ATTENTION to the results
      • how do you decide on red/green for a test?
        • need to define a range (for a number of metrics), note when something is “outside” allowed
      • perfteam has an easy process on-demand, but hard to keep same hardware
        • results in “noisy” results
        • as always - exact-same-hardware is important for reliable results-reporting
      • current setup is internal - would be Exciting to try and get results published outside
      • talk to jhutar’s mgt to set priorities
        • AI: [gerrod] to open communications
      • “90% of the work” is “defining the test and running tests reliably”
      • can we work w/ other downstream projects to get reliable access to pulp-hardware?
      • internal OpenStack instance exists
        • if jhutar had a pulp-setup-script and a pulp-performance-test-script, would be pretty straightforward to get something started
      • “ideas are cheap, implementation sucks”
    • Up/Down vote: is this group “done”?
  • AIs:
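The red/green question above boils down to defining a per-metric allowed range. A minimal sketch, assuming a simple percentage tolerance rather than whatever ranges the perf team would actually define:

```python
def verdict(measured, baseline, tolerance=0.10):
    """Red/green decision for one perf metric: red when the measured
    value exceeds the baseline by more than the allowed tolerance
    (10% here - the real per-metric ranges still need defining, and
    as noted above, same-hardware runs are needed to keep the
    baseline from being noisy)."""
    return "red" if measured > baseline * (1.0 + tolerance) else "green"
```

Even with this in CI, the group’s biggest caveat stands: someone still has to pay attention to the results.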