Fixing a pulp_rpm 20 char DB field limit with backports

I would like to discuss this pulp_rpm issue (first brought it up in open floor, then the pulp-rpm matrix room, and am now moving it here for long term discussion and the silent readers).

In a nutshell, we have a SLES repo, that references a custom metadata file from its repomd.xml, with <data type="license-sle-module-NVIDIA-compute">. The string "license-sle-module-NVIDIA-compute" is too long for the 20 chars limit on the DB field here. The corresponding Pulp 2 field had no such 20 char limit.

The result is that the repo in question can’t be synced to Pulp 3 or be 2to3 migrated from Pulp 2.

The obvious solution is to relax the 20 char limit, but I also need a fix as far back as pulp_rpm 3.9 (for Katello 2to3 migrations). I will collect some of the ideas that were discussed as possible solutions:

  1. The really ugly hack: I simply learn to live within the 20 char limit by truncating the offending strings. Even more ugly, this could be made robust against collisions by hashing and then truncating the strings instead.

  2. @x9c4 asked the following: “Can the length of that field be adjusted in a post_migrate signal until the version that has the migration?” However the consensus was that there are not yet any examples of trying something like that, and that it might not be possible using that mechanism.

  3. ?!

We don’t have any existing examples of changing a column in the post_migrate signal however i would not take this a no go option. Signals | Django documentation | Django It seems like it is not advised to have any schema alteration in the post-migration signal or any column type change. Apparently this change does not fall under either of those restrictions.

Actually this would be a column type change :confused:

pulp=> \d rpm_repometadatafile
                   Table "public.rpm_repometadatafile"
     Column     |          Type          | Collation | Nullable | Default 
----------------+------------------------+-----------+----------+---------
 content_ptr_id | uuid                   |           | not null | 
 data_type      | character varying(20)  |           | not null | 
 checksum_type  | character varying(6)   |           | not null | 
 checksum       | character varying(128) |           | not null | 
 relative_path  | text                   |           | not null | 
Indexes:
    "rpm_repometadatafile_pkey" PRIMARY KEY, btree (content_ptr_id)
    "rpm_repometadatafile_data_type_checksum_relat_c9d7364a_uniq" UNIQUE CONSTRAINT, btree (data_type, checksum, relative_path)
Foreign-key constraints:
    "rpm_repometadatafile_content_ptr_id_68dd2541_fk_core_cont" FOREIGN KEY (content_ptr_id) REFERENCES core_content(pulp_id) DEFERRABLE INITIALLY DEFERRED

pulp=> 

Here is some hacky way to do that https://web.archive.org/web/20111007112138/http://sniptools.com/databases/resize-a-column-in-a-postgresql-table-without-changing-data

2 Likes

I have a working PoC that changes the data type to text using a post_migrate signal:

DISCLAIMER: The Django docs explicitly say the following:

Handlers of this signal must not perform database schema alterations as doing so may cause the flush command to fail if it runs during the migrate command.

Which I have thoroughly ignored.

I have a new and improved version of the “backportable migration fix” here: Comparing pulp:3.17...ATIX-AG:migrationless_migration_poc · pulp/pulp_rpm · GitHub

We will be using this version for internal builds, because it satisfies our needs.

I am assuming this is not something we want to merge to any upstream pulp_rpm branches. However, I will leave the above branch on the ATIX fork, in case anyone is interested in discussing what I did here (as an academic exercise/for future reference/as a PoC).

Note: If anyone is tempted to look at it, it helps with comprehension to look at the two commits separately.