Poetry as our project dependency manager

Hi folks! Some of you may already know this, but we ran into a dependency resolution issue a couple of months ago.
One of the libraries we use changed its dependencies, and pip wasn’t able to find a set of versions that kept everything working. Our solution at the time was to pin a specific version of that library and move on.

At the same time, I remembered my happy times using poetry. It was the answer to some issues (mostly resolver issues) I had with pipenv. It also introduced some nice things like parallel downloads, a single configuration file (pyproject.toml), and build capabilities.

So, with everything crumbling down, why not test whether it could solve the resolver issue?
Here’s some data:

Installing pulpcore into a virtualenv using pip:

time pip install pulpcore/
[...]
real    0m19.447s
user    0m12.406s
sys     0m2.151s

and installing the same project into a virtualenv using poetry:

time poetry install
[...]
Installing the current project: pulpcore (3.30.0)

real    0m5.283s
user    0m5.365s
sys     0m1.515s

Here is an example of the generated poetry.lock file.

After talking about this in the Pulpcore Meeting, I dug up some earlier discussions and attempts at this migration, which can be found here and here.

I wanted to bring up this discussion and hear from you all (build gang, I’m talking to you)!
Here are some pros and cons:

Pros

  • Lockfile mechanism.
  • Reproducible builds.
  • Parallel downloads.
  • Supports editable installations (mainly used for development purposes)

Cons

  • I really don’t know how we’ll deal with the multiple Pulp versions we need to maintain.

Some test scenarios:

  • Rebuild our images using poetry.
  • Install pulp using virtualenv environments.
  • Install pulp directly on the environment root.

I would love :heart: to hear from you. Feel free to suggest test scenarios and other things.

Could there be any problems with downstream packaging related to Foreman with this? See “Moving forward with Pulpcore RPM Packaging - RFCs - TheForeman”.

How is Poetry’s dependency management different?
The fact that there is a lockfile seems to contradict our need to make pulpcore a library supporting as broad dependency ranges as we can guarantee.

But with the new packaging system over at TheForeman, we can at least start thinking about removing setup.py while keeping setuptools as the build infrastructure.

I don’t see any bad impact. For RPM packaging it would be nice to have some sort of lockfile to use as the source of truth for a particular release. As @decko mentioned, sometimes pip doesn’t find the best or most accurate dependency solution for all packages. I also saw this problem with importlib-metadata when we started to branch 3.28.

It uses a different resolution engine for solving package dependencies. I’m still trying to dig up its history and whether there are any benchmarks.
So far I only have my own experience and some opinions that it’s much better than the current pip resolver.

About the lockfile: it’s going to be used just for development. setuptools remains our choice for generating the package with the needed metadata, and that metadata is what defines the broad dependency range we want to offer to Pulp users.

Just a few lines in setup.py to extract data from pyproject.toml and we’re good to go.
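As a rough idea of what those few lines could look like, here is a minimal sketch. It assumes the dependencies live under [tool.poetry.dependencies] and only handles constraints that are already valid for setuptools (plain comparison operators, "*", or a bare exact version). The function name poetry_requirements and the sample package names are made up for illustration; Poetry-only operators such as ^ and ~ are rejected instead of translated.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older interpreters need the tomli backport
    import tomli as tomllib


def poetry_requirements(pyproject_text):
    """Turn [tool.poetry.dependencies] into requirement strings.

    Hypothetical helper: the result could be passed to
    setuptools.setup(install_requires=...).
    """
    data = tomllib.loads(pyproject_text)
    deps = data["tool"]["poetry"]["dependencies"]
    requirements = []
    for name, constraint in deps.items():
        if name == "python":
            # The interpreter constraint is not an installable dependency.
            continue
        if not isinstance(constraint, str) or not constraint:
            raise ValueError(f"{name}: only simple string constraints are handled")
        if constraint == "*":
            # Any version is acceptable.
            requirements.append(name)
        elif constraint.startswith("^") or (
            constraint.startswith("~") and not constraint.startswith("~=")
        ):
            # Caret and tilde are Poetry-specific; translating them is out of scope.
            raise ValueError(f"{name}: Poetry-only operator in {constraint!r}")
        elif constraint[0].isdigit():
            # A bare version in Poetry means an exact match.
            requirements.append(f"{name}=={constraint}")
        else:
            requirements.append(f"{name}{constraint}")
    return requirements


# Made-up sample content, not the real pulpcore pyproject.toml.
SAMPLE = """
[tool.poetry.dependencies]
python = ">=3.8"
Django = ">=4.2,<4.3"
click = "*"
"""

if __name__ == "__main__":
    print(poetry_requirements(SAMPLE))
```

In a real setup.py the text would be read from the pyproject.toml file next to it rather than from an inline string.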

I’ll share an example of how the lockfile as an artifact of testing and releasing can be useful for the RPM packaging that Foreman does.

Over in the Foreman ecosystem, when we merge a new commit, a series of tests are run on the source that is to be used as input into packaging. Those tests generate a Gemfile.lock that has all of the dependencies that tests were run against.

In our packaging repository, we use a GitHub Action to consume the latest successful lockfile and then compare what is in it to what we have packaged as RPMs. When we find updates, the job automatically creates a pull request bumping the dependency.
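To make the comparison step concrete, here is a minimal sketch of the same idea applied to a poetry.lock instead of a Gemfile.lock. This is not the Foreman job itself: the function names, the sample lock content, and the dictionary of packaged versions are all hypothetical. The only thing it relies on is that poetry.lock is a TOML file with one [[package]] entry per dependency, each carrying a name and a version.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older interpreters need the tomli backport
    import tomli as tomllib


def locked_versions(lock_text):
    """Map package name to the version recorded in a poetry.lock."""
    data = tomllib.loads(lock_text)
    return {pkg["name"]: pkg["version"] for pkg in data.get("package", [])}


def pending_bumps(locked, packaged):
    """Return {name: (packaged_version, locked_version)} where the two differ.

    Packages that are locked but not packaged at all are ignored here;
    a real job would probably want to report those separately.
    """
    bumps = {}
    for name, version in locked.items():
        if name in packaged and packaged[name] != version:
            bumps[name] = (packaged[name], version)
    return bumps


# Made-up lockfile excerpt and made-up packaged RPM versions.
SAMPLE_LOCK = """
[[package]]
name = "django"
version = "4.2.1"

[[package]]
name = "click"
version = "8.1.3"
"""
PACKAGED = {"django": "4.2.0", "click": "8.1.3"}

if __name__ == "__main__":
    for name, (old, new) in pending_bumps(locked_versions(SAMPLE_LOCK), PACKAGED).items():
        print(f"{name}: packaged {old}, tested with {new}")
```

An automated job could turn each reported difference into a pull request that bumps the corresponding package.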

By doing this automatically we have been able to keep our dependencies significantly more up to date with a lot less human overhead. This has kept us on top of potential dependency conflicts between core and plugins and most importantly consuming security patches that are landing in dependencies.

For me, that is why an artifact that expresses the set of tested dependencies is incredibly useful.

We’re talking about decisions around lockfiles and using poetry together. Do these decisions have to go together? I don’t know enough about poetry to know.

In terms of lockfiles, I imagined end users creating them in their specific environments. Shipping lockfiles is, I think, problematic in practice. One problem is that a given system has other things installed, and those need to be locked too, so a shipped lockfile would be inadequate for the whole environment. Also, because of those other installed things, Pulp dependency X may need to be at version Y, so a shipped lockfile that everyone uses creates uninstallable situations: the conflicts the lockfile introduces cannot be resolved.

Also regarding lockfiles, there are a few things to consider if using them for dev environments or CI. I believe we want to keep testing forward compatibility in CI and dev environments. Lockfiles more or less hold environments back at the versions in the lockfile. That is, unless you’re updating them constantly, at which point a lot of work is being done to get what you would already have without lockfiles.

One thing I don’t know is: Can poetry just consume the dependencies directly from setuptools? If we can somehow use poetry without long-lived lockfiles to maintain forward compatibility that would be ideal.