A typical type of content in Pulp is some sort of software package. And software is usually versioned. But then again each software ecosystem interprets software versions differently.
Pulp would greatly benefit from being able to search for content by versions / version ranges. So moving the version comparison into its Postgres database, including adding indices on versions, would be highly desirable.
Now it occurs to me that most if not all of the intricacies of sorting and comparing versions are actually solved problems in the arena of localization. Sometimes there is more than one representation of the same letter (æ = ae, ß = ss, þ = th, while 01 == 1 for versions). Sometimes combinations of letters are sorted differently than the individual letters (for versions this mainly applies to runs of digits that are supposed to be sorted as numbers).
In fact it turns out that the “icu” locale “en-US-u-kn-true” happens to order semver correctly if used as the collation of the version field.
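To make the effect of numeric ordering concrete, here is a minimal Python sketch that imitates what "kn-true" does to runs of digits. It is only an illustration of the idea, not ICU itself and not anything Pulp ships; the helper name `numeric_key` is made up for this example.

```python
import re

def numeric_key(version: str) -> list:
    """Sort key that compares runs of digits as numbers (hypothetical helper)."""
    # re.split with a capture group alternates non-digit and digit pieces,
    # so odd positions always hold digit runs and even positions hold text.
    parts = re.split(r"(\d+)", version)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

versions = ["1.10.0", "1.2.0", "1.9.3"]

# Plain string comparison looks at one character at a time, so "10" sorts before "2".
plain = sorted(versions)

# Numeric-aware comparison treats "10" as ten, which is what versions need.
numeric = sorted(versions, key=numeric_key)
```

With this key, "01.2" and "1.2" also compare as equal, which mirrors the "01 == 1" case mentioned above. Pre-release suffixes and other ecosystem-specific rules are deliberately left out of the sketch.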
So I wonder if we could find similar locale settings for other version comparisons too.
I conducted some experiments, and I think we can pull something off with Postgres 16, which allows tailoring rules in ICU collations. It seems to work quite nicely for Debian versions, and I’m willing to try the same for RPM versions.
The solution I am thinking of would need to split the versions by epoch, version, and revision in the database (maybe in a compound field, something that can be created without admin permission) and sort the version and revision according to a specifically tailored collation (again, no admin permission required in Postgres).
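As a rough sketch of the split itself, assuming Debian-style version strings of the form `[epoch:]version[-revision]`: the epoch is whatever precedes the first colon (0 if absent) and the revision is whatever follows the last hyphen (empty if absent). The names `SplitVersion` and `split_debian_version` are hypothetical, and the actual ordering of the version and revision parts would still be left to the tailored collation in the database.

```python
from typing import NamedTuple

class SplitVersion(NamedTuple):
    epoch: int      # compared as a plain integer
    version: str    # upstream version, to be ordered by the tailored collation
    revision: str   # packaging revision, ordered by the same collation

def split_debian_version(raw: str) -> SplitVersion:
    """Split '[epoch:]version[-revision]' into its three parts (sketch only)."""
    # The epoch ends at the first colon; without a colon it defaults to 0.
    epoch_part, colon, rest = raw.partition(":")
    if colon:
        epoch = int(epoch_part)
    else:
        epoch, rest = 0, raw
    # The revision starts after the last hyphen; without a hyphen it is empty.
    version, hyphen, revision = rest.rpartition("-")
    if not hyphen:
        version, revision = rest, ""
    return SplitVersion(epoch, version, revision)
```

For example, `split_debian_version("1:2.30-10ubuntu1")` yields epoch 1, version "2.30" and revision "10ubuntu1". The sketch does no validation of the input.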
As far as I can see, B-tree indexes as well as range lookups come for free with that solution.
Yes, a split like this, but with sorting rules known to the database.