Modules.yaml file in the repodata is broken

We are running pulp 3.22.25 and pulp-rpm 3.19.3, syncing from the upstream repo “Index of /compute/cuda/repos/rhel8/x86_64”. We got the errors below when running yamllint on the modules.yaml file downloaded from Pulp:
8460:81 error line too long (105 > 80 characters) (line-length)
8550:1 error syntax error: expected '<document start>', but found '<block mapping start>' (syntax)
This causes “dnf module list” to fail. Does anyone know what causes this and if there is a fix?

The broken spot in the existing mirrored YAML is the missing --- after ... and before document: modulemd, here on line 8550:

document: modulemd
version: 2
data:
  name: nvidia-driver
  stream: latest
  version: 20241127080226
  context: a098daef1e
  arch: x86_64
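
For anyone who wants to locate this kind of break without running a full linter, here is a minimal Python sketch. It assumes the problem is exactly what is described above: a "..." document-end marker followed by a new document with no "---" in between. The function name is made up for illustration, and this is a line-based heuristic, not a YAML parser.

```python
def find_missing_document_starts(text):
    """Return 1-based line numbers where content follows '...' without '---'."""
    problems = []
    after_document_end = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not change the state
        if line.rstrip() == "...":
            after_document_end = True
            continue
        # After '...', the next real line should open a document ('---')
        # or be a directive ('%...'); anything else is flagged.
        if after_document_end and not line.startswith(("---", "%")):
            problems.append(number)
        after_document_end = False
    return problems


sample = (
    "---\n"
    "document: modulemd\n"
    "...\n"
    "document: modulemd\n"  # line 4: no '---' before this document
    "version: 2\n"
    "...\n"
)
print(find_missing_document_starts(sample))  # prints [4]
```

Running it over the decompressed modules.yaml should point at the same line yamllint reports for the syntax error, if the cause is the same.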

modules.yaml is owned/created by the upstream you’re syncing from. Especially with mirror-sync, you’re getting whatever the upstream is giving you - if it’s broken, you need to open an issue with the upstream-remote-provider.

As an experiment, I just sync’d that repository. It worked, and yamllint of the resulting modules.yaml complains about lots of too-long lines, but not that final error. I will note that the current modules.yaml.gz in that repo is https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/repodata/ff8feb8f1607f0c7bd31388290c1a921dee9b8bd1c513bc8f6fea4462f99e5ee-modules.yaml.gz, dated 2024-12-03 16:10. Looks like they fixed it.

I am still getting the same exact error from the latest sync. The upstream yaml file actually has a warning:

yamllint ff8feb8f1607f0c7bd31388290c1a921dee9b8bd1c513bc8f6fea4462f99e5ee-modules.yaml

ff8feb8f1607f0c7bd31388290c1a921dee9b8bd1c513bc8f6fea4462f99e5ee-modules.yaml
1:1 warning missing document start "---" (document-start)
11:81 error line too long (144 > 80 characters) (line-length)

I am wondering if this causes the problem when Pulp generates the modules.yaml.
I also tried the older repository versions that we synced before. All of them have the same syntax errors but the errors are on different lines for different versions.

On a mirror-sync, pulp doesn’t generate that file - it takes it straight from the upstream. That’s what “mirror” means. Getting that file directly from the upstream shows the same checksum, with the same error.

This isn’t a Pulp problem - it’s the upstream repository’s modules.yaml that is the source of these errors.

Thanks, Grant, for your response regarding the comparison of the upstream and Pulp files. I understand that both files should theoretically be identical; however, I’ve noticed that although the content appears the same, its order is shuffled in the Pulp version. A new xxx-modules.yaml was generated when I created a publication after the sync. I downloaded xxx-modules.yaml from both upstream and Pulp to compare them:

-rw-r--r--. 1 root root 423986 Dec 3 11:10 ff8feb8f1607f0c7bd31388290c1a921dee9b8bd1c513bc8f6fea4462f99e5ee-modules.yaml
-rw-r--r--. 1 root root 423985 Dec 3 14:52 09c2b25ab6e73bc5f53dfe95686599b7b66ca3969fb4bc2e39361e62c2104c39-modules.yaml

The upstream modules.yaml is missing the document-start marker at the beginning of the file, but the file is still valid. The Pulp modules.yaml appears to contain the same content as the upstream file in a shuffled order. As a result, the missing document start ends up in the middle of the file, which causes a syntax error. So the issue is still ultimately caused by the missing document start upstream.
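
One way to check the “same content, different order” impression is to split both files into their YAML documents and compare them as unordered collections. The following is a minimal Python sketch under the assumption that documents are separated by "---" and "..." marker lines at column 0. The function names are illustrative only, and the comparison is textual rather than a parse of the modulemd data.

```python
from collections import Counter


def split_documents(text):
    """Split a multi-document YAML stream into document bodies (as text)."""
    documents, current = [], []
    for line in text.splitlines():
        if line.rstrip() in ("---", "..."):
            body = "\n".join(current).strip("\n")
            if body:
                documents.append(body)
            current = []
        else:
            current.append(line.rstrip())
    body = "\n".join(current).strip("\n")
    if body:
        documents.append(body)
    return documents


def same_documents_any_order(text_a, text_b):
    """True if both streams hold the same documents, ignoring their order."""
    return Counter(split_documents(text_a)) == Counter(split_documents(text_b))


upstream = "document: modulemd\ndata:\n  name: a\n...\n---\ndocument: modulemd\ndata:\n  name: b\n...\n"
shuffled = "---\ndocument: modulemd\ndata:\n  name: b\n...\ndocument: modulemd\ndata:\n  name: a\n...\n"
print(same_documents_any_order(upstream, shuffled))  # prints True
```

Because the markers themselves are discarded, a result of True only says the document bodies match. It says nothing about whether each file has its markers in the right places.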

Could this shuffling affect how we should view the files as ‘the same’? Is the reordering of content expected, or should we look into why this discrepancy is happening? I want to ensure we’re on the same page and understand whether this variation is acceptable or if it indicates a deeper issue that needs addressing.

Thanks for clarifying this for me.

You shouldn’t be publishing a mirror-complete repository - “mirror” means “use the data from the upstream remote”. It auto-creates a publication using that metadata. In my test, I sync’d --mirror that remote, then distributed the repo, and then used curl to get the modules.yaml.gz, and it’s the exact same checksum as upstream. That’s what you want with --mirror.
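
The checksum comparison described above can be reproduced with a few lines of Python once both copies of modules.yaml.gz have been downloaded. This is a minimal sketch. It assumes the hex prefix in the repodata filename is the SHA-256 of the file as served (the 64-character prefix suggests this, but treat it as an assumption), and the helper names are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """True if both files have identical content (by SHA-256)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


def checksum_from_repodata_name(filename):
    """Extract the leading '<checksum>-' prefix from a repodata filename."""
    return filename.split("-", 1)[0]
```

If files_match() returns True for the upstream download and the copy served by the Distribution, the mirror is serving the upstream metadata unchanged.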

That being said - you may be getting bitten by “Sync / Publish is broken for mercurial repo due to modules.yaml” (Issue #3121 in pulp/pulp_rpm on GitHub). That’s fixed in rpm/3.19.5+ and 3.20+.

You’re on Very Very Old Versions of pulpcore and plugins. Even the products which rely on pulpcore/3.21 went EOL at the end of last month.

Grant, thank you for your guidance on this issue. It worked perfectly! I really appreciate your help.
Will the distribution automatically point to the new repo version the next time I mirror the upstream, or do I need to update the distribution for every new update?

Pulp lets you point a Distribution at a specific Publication, if you want to have control over when the user sees new content. Or, if you point the Distribution at the Repository itself, then that Distribution will always serve content from the most-recent Publication in that Repository.

So if you point your Distribution at the Repository and always do mirror-sync (which is effectively an auto-publish with the upstream metadata) - the user will automatically get whatever the latest sync has. Enjoy!
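
To make the two options concrete, here is a minimal Python sketch that builds the request body for a Distribution. It assumes the Distribution accepts name, base_path, and exactly one of repository or publication, which matches the description above. The helper name and the href values in the usage lines are hypothetical placeholders, not real objects, and nothing here talks to a Pulp server.

```python
def distribution_payload(name, base_path, repository=None, publication=None):
    """Build a Distribution body that targets a Repository OR a Publication."""
    if (repository is None) == (publication is None):
        raise ValueError("set exactly one of repository or publication")
    payload = {"name": name, "base_path": base_path}
    if repository is not None:
        # Follows the repository: always serves its most recent Publication.
        payload["repository"] = repository
    else:
        # Pinned: serves this Publication until the Distribution is updated.
        payload["publication"] = publication
    return payload


# Hypothetical hrefs, for illustration only.
rolling = distribution_payload("cuda-latest", "cuda-rhel8", repository="<repository href>")
pinned = distribution_payload("cuda-pinned", "cuda-rhel8-pinned", publication="<publication href>")
```

The “rolling” form is the one that needs no further updates after each mirror-sync.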

Thanks for the clear explanation! This really helped. I have another question about handling previous versions of a repository. When I need to access a previous version of a repository, should I create a new publication for it, or can I use the metadata directly from the upstream as it existed at that particular time?

Publications are content just like a Package is. Every time you do a mirror-repo-sync from an upstream Remote, you get a new RepositoryVersion, with a new associated Publication from the upstream metadata.

To make a prev-repo-version visible to your users, you could create a Distribution with a base_path something like /YYYY-MM-DD, say, and then point that Distribution at the Publication that is in the RepositoryVersion you want to make available.
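
As a small illustration of the dated base_path idea, here is a Python sketch that derives a YYYY-MM-DD path from the sync date and pairs it with the Publication to pin. The prefix, the naming scheme, and the href are all hypothetical. The sketch emits a relative path without a leading slash, which is a choice made here rather than something stated above.

```python
from datetime import date


def dated_base_path(prefix, sync_day):
    """Return '<prefix>/YYYY-MM-DD' for the given sync date."""
    return f"{prefix.strip('/')}/{sync_day.isoformat()}"


def snapshot_distribution(prefix, sync_day, publication_href):
    """Describe a Distribution pinned to one Publication under a dated path."""
    base_path = dated_base_path(prefix, sync_day)
    return {
        "name": base_path.replace("/", "-"),  # hypothetical naming scheme
        "base_path": base_path,
        "publication": publication_href,
    }


snapshot = snapshot_distribution("cuda-rhel8", date(2024, 12, 3), "<publication href>")
print(snapshot["base_path"])  # prints cuda-rhel8/2024-12-03
```

Each older RepositoryVersion you want to expose would get its own Distribution built this way, each pointing at that version’s Publication.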