Refactor of the content app in Django

TL;DR: This may be a rather big effort, but it would also bring a bunch of opportunities.

With Django 4.2 comes the possibility to deploy apps as ASGI instead of WSGI (the “A” stands for “asynchronous” and means “event-driven, non-blocking, single-threaded code” in this context). In practice, a single daphne process (this is what replaces the “gunicorn” worker) can handle several hundred requests simultaneously, even if each of them takes rather long, for example when streaming an artifact to the user. So far, this need for streaming is what justified the original design choice to use aiohttp for the content app, while the pulp-api still ran as a native Django app. But this mix of technologies (the content app still needs to access the Django ORM and the storage layer in non-trivial ways) is a constant source of friction. The most prominent example is the db-disconnection bug that has been chasing us for years; we may or may not have solved it by now, but we certainly lack confidence either way.
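To make the “event-driven, non-blocking, single-threaded” part a bit more concrete, here is a minimal sketch of a bare ASGI application that streams a response in chunks. It uses only the standard library and is driven in-process rather than by daphne; the names `stream_artifact`, `fetch`, and `serve_many`, the fake chunks, and the trimmed-down `scope` dict are assumptions made for illustration, not Pulp or Django code.

```python
import asyncio


async def stream_artifact(scope, receive, send):
    """Bare ASGI app that streams a fake artifact in three chunks (illustrative)."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/octet-stream")],
    })
    for i in range(3):
        # Awaiting yields control to the event loop, so other requests make
        # progress while this one waits on (simulated) slow I/O.
        await asyncio.sleep(0.01)
        await send({
            "type": "http.response.body",
            "body": f"chunk-{i};".encode(),
            "more_body": True,
        })
    # An empty final message tells the server the response is complete.
    await send({"type": "http.response.body", "body": b"", "more_body": False})


async def fetch(app, path):
    """Call the app in-process, roughly the way an ASGI server would."""
    body = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        if message["type"] == "http.response.body":
            body.append(message.get("body", b""))

    # Simplified scope: a real server passes many more keys.
    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return b"".join(body)


async def serve_many(n):
    # All n "requests" are interleaved on one thread by the event loop.
    return await asyncio.gather(
        *(fetch(stream_artifact, f"/artifact/{i}") for i in range(n))
    )


results = asyncio.run(serve_many(100))
```

The hundred simulated downloads overlap on a single thread because each one spends most of its time waiting; a blocking worker would have to handle them one after another or use one thread or process per request.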
The proposal I want to outline here aims to bring the content app back onto the same technology stack as the rest of the Pulp components.
The benefits I see off the top of my head are:

  • The ORM fits naturally into the request handling.
  • Exceptions flow naturally between the abstraction layers without translation.
  • One less HTTP stack to keep in mind.

As much as the envisioned result looks like the promised land, there are significant challenges on the way:

  • You can run only one of the two technologies in any given process.
  • Plugins have the opportunity to hook into the content app for so-called live APIs.
  • You can most certainly not use the same (plugin-) codepaths for both versions of the content app.

To allow for a smooth transition (including zero-downtime upgrades), we could take a phased approach. We would design the new type of content app side by side with the old one (we did a similar thing to replace the tasking system), and ask plugins to start providing equivalent codepaths for both implementations, reporting their availability with the PluginConfig object. We would then properly deprecate the old implementation and eventually remove it for good.
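As a rough sketch of how such availability reporting could look: the flag names, the helper functions, and the plugin labels below are all hypothetical, and the `PluginConfig` class is only a stand-in for the real object, whose attributes would need to be agreed on.

```python
from dataclasses import dataclass


@dataclass
class PluginConfig:
    """Stand-in for a plugin's config object; both flags are hypothetical."""
    label: str
    supports_legacy_content_app: bool = True
    supports_asgi_content_app: bool = False


def plugins_blocking_switch(plugins):
    """Return the labels of plugins that do not yet provide the new codepath."""
    return [p.label for p in plugins if not p.supports_asgi_content_app]


def select_content_app(plugins):
    """Pick the implementation a deployment can run during the transition."""
    return "legacy" if plugins_blocking_switch(plugins) else "asgi"


# Hypothetical installation with one migrated and one unmigrated plugin.
plugins = [
    PluginConfig("plugin_a", supports_asgi_content_app=True),
    PluginConfig("plugin_b"),
]
```

With this shape, a deployment stays on the old content app until every installed plugin reports the new codepath, which also gives a natural place to emit deprecation warnings.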

As for the new implementation, I’d like to see a design where the main app routes the request up to the distribution, from where the plugin takes over responsibility. For most plugins, Pulpcore should provide convenience classes that simply handle the routing for pass-through publications or for publications with published artifacts.
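A minimal sketch of that split, assuming the core matches the request path against distribution base paths and then hands the remainder to a plugin-provided handler. Only the `/pulp/content/` prefix comes from this thread; the `Distribution` dataclass, `match_distribution`, `dispatch`, and `passthrough_handler` are hypothetical names, and checking longer base paths first is just a defensive choice in this sketch.

```python
from dataclasses import dataclass
from typing import Callable

CONTENT_PREFIX = "/pulp/content/"


@dataclass(frozen=True)
class Distribution:
    """Illustrative stand-in, not the real Pulp model."""
    base_path: str
    handler: Callable[["Distribution", str], str]  # provided by the plugin


def match_distribution(path, distributions):
    """Core's part: find the distribution whose base_path prefixes the path."""
    if not path.startswith(CONTENT_PREFIX):
        return None
    rel = path[len(CONTENT_PREFIX):]
    # Check longer base paths first so the most specific one wins.
    for dist in sorted(distributions, key=lambda d: len(d.base_path), reverse=True):
        base = dist.base_path.strip("/") + "/"
        if rel.startswith(base):
            return dist, rel[len(base):]
    return None


def dispatch(path, distributions):
    """Route up to the distribution, then let the plugin take over."""
    match = match_distribution(path, distributions)
    if match is None:
        return "404"
    dist, remainder = match
    return dist.handler(dist, remainder)


def passthrough_handler(dist, remainder):
    """What a core-provided convenience handler might do (hypothetical)."""
    return f"{dist.base_path} serves {remainder or 'index'}"


distributions = [
    Distribution("files", passthrough_handler),
    Distribution("files/nightly", passthrough_handler),
]
```

A plugin with special needs (a live API, for instance) would supply its own handler in place of `passthrough_handler`, while the matching logic stays in the core.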


Would this allow us to do more “interesting” things with the web display the content app currently supports? We have had a lot of user interest in improvements to what you get when you hit /pulp/content/.


We could possibly just hook “normal” Django views in there using Django’s templating facilities.


I’m in favor of exploring this. It’s a pretty big change, so I say “exploring” because we need to make sure the performance and scalability are at least as good. I believe we could do this on the 3.y line since it’s really just a technology change.

For our users, it would be easier to deploy Pulp if there were just one process that could handle both contexts. It would not only be easier to deploy: users could also run fewer processes and allocate more memory to each, which would be more efficient because every process carries some base overhead. We’d also get an OpenAPI schema for the content app.

For developers I think it would be great too. Redirecting traffic between the content app and the Django app can go away. Also, developers today have to make challenging choices and write challenging implementations to “split” their code between the content app and the Django API, and all of that could be consolidated too.


We’re moving to a model where content will be served by a CDN (content will still be synced in from the content app). So I think just having one server process (for both API and content serving) would work for us and perhaps even benefit us since we’ll be reducing our reliance on the content app.


That was actually one step further than my proposal. I wanted them to get onto the same technology stack first. Merging the API and content workers would mean transforming the API into an ASGI app too. This may or may not be trivial. But given that, for example, the registry API is torn between both apps, it sounds like a fine secondary goal.

I am linking the associated GH issue: https://github.com/pulp/pulpcore/issues/3928.
