Multi-tenancy and the question of owning content

The current plan for our RBAC implementation in Pulp involves assigning roles on objects to users, so that the creator of e.g. a repository will eventually be its owner.
But this concept is not extended to content, and that leads to strange behavior. If only content in a repository you can access is visible to you, then you cannot see (or use) the very content you just uploaded unless you add it to a repository in the same REST call. If we impose no restriction on viewing existing content at all, that is definitely not secure. If we add permissions and access policies to content in much the same way as for other objects, you may try to upload content that already exists yet is invisible to you (I believe you should be given the corresponding roles in that case). When I sync content into a repository, should it inherit all the roles from that repository (this could blow up the UserRoles table)? Should we add a (hidden) repository per user to track their “owned” content (this blows up another table, but the information must go somewhere, no?)?

Good thing: since content is immutable, it is no problem to have multiple mutually untrusted entities registered as “owner”.

What I’d expect from the final design:

  • If I cannot see a specific piece of content, I can upload it and handle it in exactly the same way (including a “201 created” response from the server) as if it were new to Pulp.
  • Optional: As long as I can see content, it is protected from orphan cleanup.
  • Optional: I can trash content (give up my viewing permission).
  • I can grant viewing permission for content I can see to other users.
  • I can filter the content I can see by direct/indirect permission.
    • This obviously does not apply if permissions for synced content are explicitly granted.
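
To make the first and fourth expectations concrete, here is a minimal sketch using an in-memory store instead of Pulp's models. `ContentStore` and its methods are hypothetical names invented for illustration, not existing Pulp API, and the 200 response for content the user can already see is my own assumption; the list above only specifies the 201 case.

```python
import hashlib


class ContentStore:
    """Toy model: content is de-duplicated by sha256, viewers are tracked per digest."""

    def __init__(self):
        self._blobs = {}    # sha256 hex digest -> bytes
        self._viewers = {}  # sha256 hex digest -> set of user names

    def upload(self, user, data):
        """Return (http_status, digest) as the uploading user would observe them."""
        digest = hashlib.sha256(data).hexdigest()
        viewers = self._viewers.setdefault(digest, set())
        if user in viewers:
            # Assumption: content the user can already see is not "created" again.
            return 200, digest
        # New to the store, or merely invisible to this user: both look identical.
        self._blobs.setdefault(digest, data)
        viewers.add(user)
        return 201, digest

    def grant_view(self, granter, grantee, digest):
        """Only a user who can see the content may share it."""
        if granter not in self._viewers.get(digest, set()):
            raise PermissionError("cannot share content you cannot see")
        self._viewers[digest].add(grantee)

    def visible_to(self, user):
        return {d for d, users in self._viewers.items() if user in users}


store = ContentStore()
status_alice, digest = store.upload("alice", b"wheel bytes")
status_bob, _ = store.upload("bob", b"wheel bytes")  # bob could not see alice's copy
store.grant_view("alice", "carol", digest)
```

Both uploads report 201 even though only one blob is stored, so bob learns nothing about what alice uploaded earlier.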

pulp_python has a very specific need for an answer to this problem. The Python index has two roles, Owner and Maintainer, which can be applied to each Python package. Since package names are unique, there can only be one copy of a specific release of a package in each index (repository). So if Alice and Bob both create their own special builds of numpy, only the person who uploads their version first would get the Owner role and be able to continue uploading new releases. The solution to this problem is to add sha256 to the global uniqueness constraint, keeping filename uniqueness only locally at the repository level (which has already been done in pulp_python), and to have uploaded content get its permissions only at the repository level. Even in the case where users upload the same package to their individual indices, it would still be beneficial for each repository to get its own roles, even if the content is technically shared (de-duped) behind the scenes.

So, my ultimate point is that content permissions will need to be done at a repository level.
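
As a rough illustration of the two uniqueness levels described above, here is a sketch using SQLite from the standard library. The table and column names are made up for this example and are not pulp_python's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        sha256 TEXT NOT NULL,
        UNIQUE (filename, sha256)      -- global: same name allowed if the bytes differ
    );
    CREATE TABLE repository_content (
        repository TEXT NOT NULL,
        filename TEXT NOT NULL,
        content_id INTEGER NOT NULL REFERENCES content(id),
        UNIQUE (repository, filename)  -- local: one file per name in each index
    );
    """
)


def add(repository, filename, sha256):
    """Add a file to a repository, reusing an identical content row if one exists."""
    with conn:  # commit on success, roll back if a constraint is violated
        conn.execute(
            "INSERT OR IGNORE INTO content (filename, sha256) VALUES (?, ?)",
            (filename, sha256),
        )
        (content_id,) = conn.execute(
            "SELECT id FROM content WHERE filename = ? AND sha256 = ?",
            (filename, sha256),
        ).fetchone()
        conn.execute(
            "INSERT INTO repository_content (repository, filename, content_id) "
            "VALUES (?, ?, ?)",
            (repository, filename, content_id),
        )
    return content_id


wheel = "numpy-1.0-py3-none-any.whl"
alice_id = add("alice-index", wheel, "aaaa")  # Alice's special build
bob_id = add("bob-index", wheel, "bbbb")      # Bob's build, same filename
carol_id = add("carol-index", wheel, "aaaa")  # same bytes as Alice: de-duplicated
```

Alice and Bob each get their own content row, Carol shares Alice's, and adding a second file with the same name to `alice-index` raises `sqlite3.IntegrityError`.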

Some other thoughts I had on the issue:

  • Change the generic upload APIs to only work for admins unless the user specifies a repository they own.
  • Generic content view APIs should use scoping based on the repositories the user has permissions for (this could get ugly on the query side, but security-wise I think it makes sense); admin behavior would remain unchanged.
  • Assigning roles while syncing content should be up to the plugin writer (maybe as part of the first stage), but it should still follow my ultimate point of content permissions being assigned at the repository level.
  • As for deletion (whenever we add this, as users really want something), only admins should truly be able to remove something from Pulp. If you have content permissions, you should only be able to remove that content from the repository your permissions are derived from.
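
The repository-based scoping from the second bullet could look roughly like this in plain Python. This is only a sketch with plain dictionaries standing in for the database; `visible_content` and its parameters are hypothetical, and a real implementation would be a queryset filter rather than a loop.

```python
def visible_content(user, repo_readers, repo_content, admins=frozenset()):
    """Return the set of content units `user` may see.

    repo_readers: repository name -> set of users with read permission
    repo_content: repository name -> set of content identifiers
    """
    if user in admins:
        # Admins are not scoped; here that means every repository counts.
        readable = list(repo_content)
    else:
        readable = [repo for repo, users in repo_readers.items() if user in users]
    visible = set()
    for repo in readable:
        visible |= repo_content.get(repo, set())
    return visible


repo_readers = {"alice-index": {"alice"}, "shared": {"alice", "bob"}}
repo_content = {
    "alice-index": {"numpy-1.0"},
    "shared": {"requests-2.0"},
    "private": {"secret-0.1"},
}

bob_sees = visible_content("bob", repo_readers, repo_content)
alice_sees = visible_content("alice", repo_readers, repo_content)
admin_sees = visible_content("root", repo_readers, repo_content, admins={"root"})
```

Note that content that is in no repository at all is invisible to everyone in this sketch, which is exactly the "just uploaded" problem raised at the top of the thread.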

Maybe I’m missing the point of your writing, but +1 to the idea that permissions govern who can control content in repositories. I don’t think that’s the same thing as determining who gets to decide what binary data backs a specific piece of content. If I’m missing the point here, please share more.

I’ve been considering two not fully thought through ideas:

  1. What would happen if content had read/update/delete permissions for users/groups? I think it would cause outcomes like:
  • orphan cleanup only removes orphans this user has delete permissions on
  • When viewing content, queryset scoping is in effect
  2. While the above is helpful, at least one serious problem remains. When a user can’t see Content foo and brings in new content foo with the same binary data, what happens? What about when the binary data is different? I think we need to handle both cases. What we might do is namespace Content per tenant, permission, or something like that </hands wave>. The idea is that Content foo from user A and Content foo from user B need to have their own definitions. This is important because user A shouldn’t be able to delete content that is also “owned” by user B, for example. Similarly, in cases where the binary data is different, you will 100% need two definitions of content foo in the DB. So the uniqueness key maybe gets extended by “tenant ID”?
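
A small sketch of what a uniqueness key extended by tenant ID might mean in practice, again with dictionaries instead of database tables. All names here are hypothetical. Each tenant gets its own definition of `foo`, while identical binary data is still stored only once.

```python
import hashlib

artifacts = {}  # sha256 hex digest -> bytes (shared, de-duplicated storage)
content = {}    # (tenant_id, name) -> sha256 hex digest (per-tenant definitions)


def create_content(tenant_id, name, data):
    """Create a tenant-scoped content definition backed by a shared artifact."""
    key = (tenant_id, name)
    if key in content:
        raise ValueError(f"{name!r} already exists for tenant {tenant_id!r}")
    digest = hashlib.sha256(data).hexdigest()
    artifacts.setdefault(digest, data)
    content[key] = digest
    return key


def delete_content(tenant_id, name):
    """Remove one tenant's definition; drop the artifact only when nobody uses it."""
    digest = content.pop((tenant_id, name))
    if digest not in content.values():
        del artifacts[digest]


create_content("tenant-a", "foo", b"same bytes")
create_content("tenant-b", "foo", b"same bytes")   # same data: one shared artifact
create_content("tenant-c", "foo", b"other bytes")  # different data: second artifact
artifact_count_before = len(artifacts)

delete_content("tenant-a", "foo")  # tenant B's foo must survive this
```

Deleting tenant A's `foo` leaves tenant B's definition and the shared artifact in place, which covers the "user A shouldn't be able to delete content owned by user B" case.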

What if you have a large file, and you need to use the upload API to create an artifact first? Do you own that artifact (you cannot link it to a repository you own before it is used by content)? What if the artifact existed before? Do you start to own it too? I think the upload commit call should return the existing artifact as if it was just created. I see a lot of room for unexpected behaviour here, and with this post I wanted to start collecting expectations.

My first reaction is that we should not let content inherit all the roles from the repo on sync, because a sync of the repo can be triggered by different users.

Whenever an artifact or content is uploaded/created by a user, that user should own it. Otherwise, if the content is brought into Pulp by a sync or a copy operation between repos, maybe we should follow the governance of having content permissions at the repo level.

When viewing content, the user will see their uploaded content plus content in repos they have read perms on.

When triggering orphan cleanup, the user will be able to clean up only their own uploaded content (uniqueness key possibly extended with “tenant ID”?). Content that was brought into Pulp by a sync, became orphaned, and whose protection_time has expired can be removed from Pulp only by an admin. (Debatable, but actually I think it might be OK to let users remove that content as well, since once the protection_time has expired we give no guarantee it will stay in Pulp?)
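
A sketch of the cleanup rule proposed here, under the assumption that the protection time applies to uploaded and synced orphans alike (the proposal above only mentions it for synced content). `Orphan` and `removable_orphans` are hypothetical names, not Pulp API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Orphan:
    name: str
    uploaded_by: Optional[str]  # None means it arrived via sync or copy
    orphaned_at: datetime


def removable_orphans(orphans, user, is_admin, now, protection_time):
    """Return the orphans this user's cleanup run would be allowed to remove."""
    result = []
    for orphan in orphans:
        if now - orphan.orphaned_at < protection_time:
            continue  # still protected, nobody may remove it yet
        if is_admin or orphan.uploaded_by == user:
            result.append(orphan)
    return result


now = datetime(2024, 1, 10)
orphans = [
    Orphan("a.whl", "alice", datetime(2024, 1, 1)),      # alice's upload, expired
    Orphan("b.whl", None, datetime(2024, 1, 1)),         # synced, expired
    Orphan("c.whl", "alice", datetime(2024, 1, 9, 12)),  # alice's, still protected
]
alice_can_remove = removable_orphans(orphans, "alice", False, now, timedelta(days=1))
admin_can_remove = removable_orphans(orphans, "root", True, now, timedelta(days=1))
```

Allowing regular users to remove expired synced orphans as well, as floated above, would amount to dropping the ownership check for orphans whose `uploaded_by` is `None`.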

I have not thought through all the implications of this idea, but I’m just throwing it out for discussion.

OK, maybe this is the thing that bothers me: Having two different ways to receive permissions on content seems dangerously complex.

yeah, that’s def a valid concern I share too.