Squashing migrations

We have discussed the idea of squashing migrations and came to the preliminary conclusion that we may be able to squash migrations once per project, but will never be able to delete squashed migrations, as that would break some upgrade paths.
Sadly, Django does not (yet?) support squashing squashed migrations. [0]
So the current working idea is to take advantage of one big squashed migration now and maybe never do it again. This builds on the perception that we took a lot of wrong turns in the early days of Pulpcore 3.Y and that new migrations are becoming increasingly rare.

The discussion has been captured here:

[0] https://github.com/django/django/pull/14380

I hope we can use telemetry to gather information about how old the systems in the wild are, and whether we are actually able to squash and delete migrations after all.

What is the expected speedup for applying all migrations on a new installation?
Is this really worth the Pandora's box you are opening?

Having (on rare occasions) backported migrations to older versions in our internal build, we are generally not looking forward to dealing with this.

Is the proposal here to squash migrations before each release, so that if pulpcore 3.Y has 3 new migrations, they get squashed before releasing? Or is the proposal to squash migrations from older releases (say <3.05, for example)?

The proposal is to squash a considerable number of existing migrations, starting at 0000_initial, so that new installations can skip all the back and forth around tables and fields that were created only to be deleted again, together with the complicated data migrations that came with them. But we would not delete any migration, so we do not break any existing upgrade path.
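To illustrate why keeping both sets side by side preserves upgrade paths, here is a simplified pure-Python model of how Django chooses between a squashed migration and the migrations it replaces. This is only a sketch of the documented behaviour, not Django's loader code; the function name `plan_for` and the migration names are made up for the example.

```python
def plan_for(applied, replaced, squashed_name):
    """Return the migrations a database still has to run (simplified model).

    applied       -- names already recorded as applied in the database
    replaced      -- ordered names of the original migrations that were squashed
    squashed_name -- name of the squashed migration that replaces them
    """
    applied = set(applied)
    statuses = [name in applied for name in replaced]
    if all(statuses):
        # Fully migrated system: the squash counts as applied, nothing to run.
        return []
    if not any(statuses):
        # Fresh installation: run only the squashed migration.
        return [squashed_name]
    # Partially migrated system: keep walking the original migrations.
    return [name for name in replaced if name not in applied]


# Hypothetical migration names, for illustration only.
ORIGINALS = ["0001_a", "0002_b", "0003_c"]
SQUASHED = "0001_squashed_0003_c"

print(plan_for([], ORIGINALS, SQUASHED))          # fresh install
print(plan_for(["0001_a"], ORIGINALS, SQUASHED))  # old system mid-upgrade
print(plan_for(ORIGINALS, ORIGINALS, SQUASHED))   # already up to date
```

The middle case is the one that depends on the original migrations still being present: a system that has applied only some of them cannot use the squashed migration and has to finish the old chain.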

Does this mean both the squashed migration and the original migration set would be present in the repository (at least for the foreseeable future)?

And I am still genuinely interested in how much of a speedup is expected from this for a first-time Pulp installation.

Yes, it does. The sad part is that we cannot remove dead code that only supports the old migrations, and that we cannot remove the squashed migration again and can therefore never squash again.

I’m also interested. Trying to find time to do some measurements.

Running "time pulpcore-manager migrate" on an empty database with the file, container and deb plugins installed:

All “main” branches:

real 0m17.820s
user 0m12.998s
sys 0m0.283s

pulpcore migrations squashed up to 0076:

real 0m14.146s
user 0m10.005s
sys 0m0.237s

pulpcore migrations squashed up to 0076, file to 0014, container to 0032 and deb to 0019:

real 0m10.297s
user 0m6.011s
sys 0m0.249s
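For a quick sense of scale, the three "real" figures above can be turned into absolute and relative savings against the unsquashed baseline. This is just arithmetic on the posted numbers; the labels and the helper name `savings` are chosen for the example.

```python
# Wall-clock ("real") seconds copied from the measurements above.
REAL_SECONDS = {
    "main": 17.820,
    "pulpcore squashed": 14.146,
    "pulpcore + plugins squashed": 10.297,
}


def savings(baseline, measured):
    """Return (seconds saved, percent of baseline saved), rounded for display."""
    saved = baseline - measured
    return round(saved, 3), round(100 * saved / baseline, 1)


baseline = REAL_SECONDS["main"]
for label, seconds in REAL_SECONDS.items():
    saved, percent = savings(baseline, seconds)
    print(f"{label}: {saved:.3f}s saved ({percent:.1f}%)")
```

That works out to roughly 3.7 s (about 21%) for squashing pulpcore alone and roughly 7.5 s (about 42%) when the plugins are squashed as well.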

Disclaimer: This is not a solid experiment, but just a single datapoint for each case.
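If someone wants more than one datapoint per case, a small harness along these lines could repeat the run and report the spread. It is a minimal sketch: `time_command` and the `reset` hook are hypothetical names, and how to empty the database between runs is left to the caller.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5, reset=None):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        if reset is not None:
            reset()  # caller-provided, e.g. drop and recreate the database
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On a real system this
    # would be ["pulpcore-manager", "migrate"] plus a reset() that empties the DB.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(
        f"median {statistics.median(samples):.3f}s, "
        f"min {min(samples):.3f}s, max {max(samples):.3f}s"
    )
```

Reporting the median together with min and max makes it easier to tell a real difference between the three setups from run-to-run noise.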


The cost/benefit here isn’t adding up for me. I perceive the effort and risks as significant and the benefits as slim. These numbers are small amounts of time paid once by users, and even as developer overhead it’s not a lot of wall clock time. If you still want to go forward with it, are confident nothing bad will happen, and are willing to handle any outcomes that do happen, feel free.