-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add user rbac and iap #1
Conversation
if Organization.objects.exists(): | ||
org = Organization.objects.first() | ||
org.add_user(user) | ||
else: | ||
org = Organization.create_organization( | ||
created_by=user, title='Label Studio') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was pulled from the existing user signup logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing! Good job on separating out the auth into a separate app - bravo! 🎉 🎉 🎉
(This PR has no reviewers assigned, but this looks too interesting not to jump in and check this out :) )
I do have some non-blocking questions about documentation.
- Maybe useful to create a simple README to explain what this app does.
- Maybe it is worthwhile to document the expected upstream merging process. Separating the code out is very nice - but it also comes at a cost of some copy and pasta from the base. So we should be aware of any particular copy and paste area (e.g.
settings
) and ensure that we should check those when we merge upstream.
certs_url=IAP_CERT_URL, | ||
) | ||
except ValueError as e: | ||
logger.warn("invalid token: %s", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: we might not want to log the exception if it contains token content? Though I guess it is only JWT signed, so no secret will be leaked, but this might still contain PII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
digging through the code, these ValueErrors
do not contain the contents of the token's themselves, so we should be good to keep logging the exceptions.
|
||
_roles: Dict[Role, Set[str]] = {} | ||
|
||
_roles[Role.LABELER] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
priase: love this! Thanks for working out so many of the required permissions.
Also love how the set union is work for composing the permissions!
from core.settings.base import * | ||
|
||
# Make sure our custom django app is installed | ||
INSTALLED_APPS.extend([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
django n00b wondering: No action required. I wonder if this is how we get the characteristic of that LS_TOKEN > iap-jwt
authentication, since both the rest_framework
and thelabel_manager
(?) app would run first?
[1] https://github.com/Khan/label-studio/blob/develop/label_studio/core/settings/base.py#L178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the order of installed apps doesn't really matter, what does matter is the order of MIDDLEWARE
and REST_FRAMEWORK['DEFAULT_PERMISSION_CLASSES']
and REST_FRAMEWORK['DEFAULT_AUTHENTICATION_CLASSES']
as well as, in the internals of Django/DRF the order those two things call each other/deferentially treat the other. ie: does the drf internals respect the user added from JWT? or does it prefer it's own authentication if a token is provided. And, with at least anecdotal evidence, it seems that drf prefers the user from the auth token, when provided, but will fall back to the django user from the middleware.
IAP_AUDIENCE = get_env("IAP_AUDIENCE") | ||
|
||
# Don't send telemetry data to tele.labelstud.io | ||
COLLECT_ANALYTICS = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Thank you for adding this!
Good callouts here, I'll get some docs added inside the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! A few questions largely to help me grok as a django noob.
@@ -0,0 +1,23 @@ | |||
# Generated by Django 3.2.19 on 2023-07-26 18:29 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pardon my ignorance - what does this do and why should it be in version control?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a db migration, so it is responsible for creating the database tables/columns/etc. it is generated by manage.py makemigrations
and applied using manage.py migrate
the latter of which is done automagically when label-studio starts. (this is actually a label-studio-ism, not a django-ism, and I'm personally on the fence about it, but nobody asked me 😉 ). These are basically python DDL.
Summary
This adds the IAP and RBAC functionality to label studio's backend.
IAP Middleware
The IAP middleware is responsible for converting IAP's JWT into a valid user, then creating that user if it doesn't exist, and finally, logging the user in.
This login step ensures a session cookie is set for the user. Because the IAP middleware only runs when a request.user is not set, and the middleware triggers after the existing auth middleware, the session cookie will take precedence over the JWT, so we won't have to decode and validate the jwt on every request.
RBAC
RBAC is achieved through two means, the first being a new django model, UserRole. This UserRole is a One-to-One field to a user, and contains an integer value that corresponds to their given role(which is an enum).
The second is a Rest Framework Permissions Class which is used to evaluate whether the authenticated user has permissions to the given endpoint. This leverages the existing permissions values on the api routes (ie
projects.view
,tasks.create
, etc.), and checks that the request's user's role has the required permission for the api route/method pair.Adding a separate model was chosen so that the label studio's User model did not need to be updated. This will make merging in upstream changes easier, as well as keeping DB migrations fully separate. I've opted to keep everything in its own Django app (aptly named
khan
) so that our code is almost fully siloed. With few exceptions (and only for good reasons) have I touched any code outside this new app.Additionally, new settings files were included, with a
base
file inherited by all, then one forlocal
dev, one fortest
and another forprod
. These can be set in the various places by defining theDJANGO_SETTINGS_MODULE
and pointing it tokhan.settings.(local|test|prod)
Issues: DI-605 DI-606 DI-607
Testing Plan
This was deployed into ml-infra-test to validate both the IAP and RBAC changes, see https://github.com/Khan/data-labeling/pull/14 for example python code used to test these changes for the headless interactions. Additionally, the ui permissions were verified by changing my user's permission level and clicking around the UI, ensuring permissions errors were raised where they should be, based on my user's role at any given moment.