Use the correct MIME type for APT repository metadata

It keeps bugging me that plain-text APT/Deb repository metadata is served as application/octet-stream instead of text/plain (especially if you just want to check the content in your browser).

I tried registering these filenames with Python's mimetypes module by adding the following to /pulp_deb/settings.py:

```python
import mimetypes
mimetypes.add_type("text/plain", "Release", True)
mimetypes.add_type("text/plain", "InRelease", True)
mimetypes.add_type("text/plain", "Packages", True)
```

This registers the types, but Python ignores them, probably because mimetypes only looks at file extensions, not at full filenames.
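
A quick way to check that suspicion in a plain Python shell (a minimal sketch; the path is just an example of where a Release file lives in an APT repo): guess_type() splits off the file extension with splitext() and looks that up, and a bare "Release" has an empty extension, so the registration is never consulted.

```python
import mimetypes
import posixpath

# What the settings.py snippet does: register "Release" as if it were an extension.
mimetypes.add_type("text/plain", "Release", True)

path = "dists/stable/Release"

# splitext() finds no dot in the last path component, so the extension is empty...
print(posixpath.splitext(path))    # ('dists/stable/Release', '')

# ...and guess_type() has nothing to look up, regardless of the registration above.
print(mimetypes.guess_type(path))  # (None, None)
```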

So the second thing I tried was patching the pulpcore content handler in /pulpcore/content/handler.py:

```python
@staticmethod
def response_headers(path):
    content_type, encoding = mimetypes.guess_type(path)
    # new-code start: extension-less APT metadata is plain text by definition
    if not content_type and path.endswith(("/Release", "/InRelease", "/Packages")):
        content_type = "text/plain"
    # new-code end
    headers = {}
    if content_type:
        headers["Content-Type"] = content_type
    return headers
```

This works and the files are served correctly, but it feels wrong to do this in pulpcore.
Is there a way to do it in pulp_deb instead?

I don’t think there is an easy way to add this to pulp_deb currently. We allow plugin writers to extend the content app through two methods:

  • Add new routes with custom handlers
  • Override distribution’s content handler methods

The second method could work, but it's mainly used to serve content that isn't already stored as artifacts. I think we need to add a new API to pulpcore that lets plugins update the headers based on the relative path.
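
Roughly, such an API could look like this sketch. It assumes a hypothetical extra_response_headers(relative_path) method on the distribution (not an existing pulpcore API) whose result the content handler merges over the headers it guessed; Distribution, DebDistribution and response_headers() are simplified stand-ins, not the real pulpcore classes.

```python
import mimetypes


class Distribution:
    """Simplified stand-in for a pulpcore distribution (hypothetical)."""

    def extra_response_headers(self, relative_path):
        # Hypothetical plugin hook: by default a plugin adds no headers.
        return {}


class DebDistribution(Distribution):
    """Simplified stand-in for a pulp_deb distribution overriding the hook."""

    # Extension-less APT metadata files that are plain text by definition.
    PLAIN_TEXT_NAMES = frozenset({"Release", "InRelease", "Packages"})

    def extra_response_headers(self, relative_path):
        if relative_path.rsplit("/", 1)[-1] in self.PLAIN_TEXT_NAMES:
            return {"Content-Type": "text/plain"}
        return {}


def response_headers(distribution, relative_path):
    """Guess headers from the path, then let the plugin add or override them."""
    content_type, _encoding = mimetypes.guess_type(relative_path)
    headers = {}
    if content_type:
        headers["Content-Type"] = content_type
    # Plugin-supplied headers take precedence over the guessed ones.
    headers.update(distribution.extra_response_headers(relative_path))
    return headers
```

With this, response_headers(DebDistribution(), "dists/stable/Release") gives {'Content-Type': 'text/plain'}, while a plain Distribution() still gives {} for the same path. Because the hook returns a whole header dict, it would also cover headers other than Content-Type.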

I guess the PublishedArtifact or the ContentArtifact would be the place to store that information in the database.

Honestly, I'd rather not store that information in the database, because for every file type with an extension, Python's mimetypes library does a very good job of guessing the correct type.
It would also make the fix harder to backport 😛

It's only filenames without an extension that end up served as "binary" (application/octet-stream), which is understandable from a *NIX point of view. But it is also very annoying for files that are plain text by definition, like Release and Packages.

Ideally, the type would be derived statically from the Content class the artifact was accessed through.
However, that is impossible for published metadata artifacts, and at the very least hard for multi-artifact content.

What might work for you, though, is a content-type hook in the distribution class that is called before the MIME type guessing, right?
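
A minimal sketch of that ordering, assuming a hypothetical content_type_for(relative_path) hook (not part of pulpcore) that returns a MIME type or None, plus simplified stand-in classes rather than the real pulpcore/pulp_deb models. The handler only falls back to mimetypes.guess_type() when the hook has no answer, so files with an extension keep Python's guess and nothing has to be stored in the database.

```python
import mimetypes
import posixpath


class BaseDistribution:
    """Simplified stand-in for a distribution class (hypothetical)."""

    def content_type_for(self, relative_path):
        # Hypothetical hook: None means "no opinion, let mimetypes guess".
        return None


class DebDistribution(BaseDistribution):
    """Simplified stand-in for a pulp_deb distribution (hypothetical)."""

    # Extension-less APT metadata files that are plain text by definition.
    PLAIN_TEXT_NAMES = frozenset({"Release", "InRelease", "Packages"})

    def content_type_for(self, relative_path):
        if posixpath.basename(relative_path) in self.PLAIN_TEXT_NAMES:
            return "text/plain"
        return None


def response_headers(distribution, relative_path):
    # Ask the plugin first; only guess from the file extension if it has no answer.
    content_type = distribution.content_type_for(relative_path)
    if content_type is None:
        content_type, _encoding = mimetypes.guess_type(relative_path)
    return {"Content-Type": content_type} if content_type else {}
```

Calling the hook first means a plugin could also override a wrong guess for a file that does have an extension, while the default (None) leaves the current behaviour untouched.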