Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add user rbac and iap #1

Merged
merged 6 commits into from
Aug 11, 2023
Merged

add user rbac and iap #1

merged 6 commits into from
Aug 11, 2023

Conversation

jimmykodes
Copy link
Collaborator

@jimmykodes jimmykodes commented Jul 28, 2023

Summary

This adds the IAP and RBAC functionality to label studio's backend.

IAP Middleware

The IAP middleware is responsible for converting IAP's JWT into a valid user, then creating that user if it doesn't exist, and finally, logging the user in.
This login step ensures a session cookie is set for the user. Because the IAP middleware only runs when a request.user is not set, and the middleware triggers after the existing auth middleware, the session cookie will take precedence over the JWT, so we won't have to decode and validate the jwt on every request.

RBAC

RBAC is achieved through two means, the first being a new django model, UserRole. This UserRole is a One-to-One field to a user, and contains an integer value that corresponds to their given role(which is an enum).
The second is a Rest Framework Permissions Class which is used to evaluate whether the authenticated user has permissions to the given endpoint. This leverages the existing permissions values on the api routes (ie projects.view, tasks.create, etc.), and checks that the request's user's role has the required permission for the api route/method pair.

Adding a separate model was chosen so that the label studio's User model did not need to be updated. This will make merging in upstream changes easier, as well as keeping DB migrations fully separate. I've opted to keep everything in its own Django app (aptly named khan) so that our code is almost fully siloed. With few exceptions (and only for good reasons) have I touched any code outside this new app.

Additionally, new settings files were included, with a base file inherited by all, then one for local dev, one for test and another for prod. These can be set in the various places by defining the DJANGO_SETTINGS_MODULE and pointing it to khan.settings.(local|test|prod)

Issues: DI-605 DI-606 DI-607

Testing Plan

This was deployed into ml-infra-test to validate both the IAP and RBAC changes, see https://github.com/Khan/data-labeling/pull/14 for example python code used to test these changes for the headless interactions. Additionally, the ui permissions were verified by changing my user's permission level and clicking around the UI, ensuring permissions errors were raised where they should be, based on my user's role at any given moment.

@jimmykodes jimmykodes self-assigned this Jul 28, 2023
Comment on lines +33 to +38
if Organization.objects.exists():
org = Organization.objects.first()
org.add_user(user)
else:
org = Organization.create_organization(
created_by=user, title='Label Studio')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was pulled from the existing user signup logic

@jimmykodes jimmykodes marked this pull request as ready for review August 10, 2023 15:03
Copy link

@dat-boris dat-boris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks amazing! Good job on separating out the auth into a separate app - bravo! 🎉 🎉 🎉

(This PR has no reviewers assigned, but this looks too interesting not to jump in and check this out :) )

I do have some non-blocking questions about documentation.

  1. Maybe useful to create a simple README to explain what this app does.
  2. Maybe it is worthwhile to document the expected upstream merging process. Separating the code out is very nice - but it also comes at a cost of some copy and pasta from the base. So we should be aware of any particular copy and paste area (e.g. settings) and ensure that we should check those when we merge upstream.

certs_url=IAP_CERT_URL,
)
except ValueError as e:
logger.warn("invalid token: %s", e)
Copy link

@dat-boris dat-boris Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: we might not want to log the exception if it contains token content? Though I guess it is only JWT signed, so no secret will be leaked, but this might still contain PII.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

digging through the code, these ValueErrors do not contain the contents of the token's themselves, so we should be good to keep logging the exceptions.


_roles: Dict[Role, Set[str]] = {}

_roles[Role.LABELER] = {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

priase: love this! Thanks for working out so many of the required permissions.

Also love how the set union is work for composing the permissions!

from core.settings.base import *

# Make sure our custom django app is installed
INSTALLED_APPS.extend([

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

django n00b wondering: No action required. I wonder if this is how we get the characteristic of that LS_TOKEN > iap-jwt authentication, since both the rest_framework and thelabel_manager (?) app would run first?

[1] https://github.com/Khan/label-studio/blob/develop/label_studio/core/settings/base.py#L178

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the order of installed apps doesn't really matter, what does matter is the order of MIDDLEWARE and REST_FRAMEWORK['DEFAULT_PERMISSION_CLASSES'] and REST_FRAMEWORK['DEFAULT_AUTHENTICATION_CLASSES'] as well as, in the internals of Django/DRF the order those two things call each other/deferentially treat the other. ie: does the drf internals respect the user added from JWT? or does it prefer it's own authentication if a token is provided. And, with at least anecdotal evidence, it seems that drf prefers the user from the auth token, when provided, but will fall back to the django user from the middleware.

IAP_AUDIENCE = get_env("IAP_AUDIENCE")

# Don't send telemetry data to tele.labelstud.io
COLLECT_ANALYTICS = False

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Thank you for adding this!

@jimmykodes
Copy link
Collaborator Author

  1. Maybe useful to create a simple README to explain what this app does.
  2. Maybe it is worthwhile to document the expected upstream merging process. Separating the code out is very nice - but it also comes at a cost of some copy and pasta from the base. So we should be aware of any particular copy and paste area (e.g. settings) and ensure that we should check those when we merge upstream.

Good callouts here, I'll get some docs added inside the khan app

Copy link

@wwells wwells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! A few questions largely to help me grok as a django noob.

@@ -0,0 +1,23 @@
# Generated by Django 3.2.19 on 2023-07-26 18:29
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pardon my ignorance - what does this do and why should it be in version control?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a db migration, so it is responsible for creating the database tables/columns/etc. it is generated by manage.py makemigrations and applied using manage.py migrate the latter of which is done automagically when label-studio starts. (this is actually a label-studio-ism, not a django-ism, and I'm personally on the fence about it, but nobody asked me 😉 ). These are basically python DDL.

@jimmykodes jimmykodes merged commit 7ca0be4 into develop Aug 11, 2023
1 check passed
@jimmykodes jimmykodes deleted the khan_user_management branch August 11, 2023 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants