-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add access monitoring rules to msteams plugins #47638
Conversation
process.SpawnCriticalJob(watcherJob) | ||
|
||
ok, err := watcherJob.WaitReady(ctx) | ||
if err != nil { | ||
return trace.Wrap(err) | ||
} | ||
if len(acceptedWatchKinds) == 0 { | ||
return trace.BadParameter("failed to initialize watcher for all the required resources: %+v", | ||
watchKinds) | ||
} | ||
// Check if KindAccessMonitoringRule resources are being watched, | ||
// the role the plugin is running as may not have access. | ||
if slices.Contains(acceptedWatchKinds, types.KindAccessMonitoringRule) { | ||
if err := a.accessMonitoringRules.InitAccessMonitoringRulesCache(ctx); err != nil { | ||
return trace.Wrap(err, "initializing Access Monitoring Rule cache") | ||
} | ||
} | ||
a.watcherJob = watcherJob | ||
a.watcherJob.SetReady(ok) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code contains two races:
- we start processing AR events while the AMR cache is not inited
- we start processing AMR events while the AMR cache is not inited (cache should be inited when receiving the watcher INIT event)
This will lead to two issues:
- ARs processed early after plugin startup are not sent to the correct recipients
- AMRs created/updated early after plugin startup are ignored by the plugin
As those issues also exist in the common AMR plugin implementation we might decide to merge the PR anyway and fix those later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting queuing up incoming ARs and AMRs for processing until the init is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the AR vs AMR race I'm not sure yet, I would tend to create the AMR watcher first, then based on the result, setup AR watching. I'm worried this approach could lead to out-of-order events but I'm not even sure Teleport protects against them in the first place with a dual-listener. I'll ask @tigrato and @espadolini if we have strong ordering guarantees in the auth event stream.
For the AMR init race, I would edit the watcherjob
logic to make it run the cache init login on the INIT
event. This is the proper way of initializing a cache in Teleport, but the watcherjob lib currently swallows the init event because it was not designed to maintain a local cache but only receive ARs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no protection against out of order events even with a single listener: watcher events are guaranteed to be delivered in order only with respects to each individual item's events, with no guarantee of ordering across items.
FWIW the integrations/lib/watcherjob.job
seems to take full advantage of that, sequentially applying events for the same kind and name but processing events for different resources concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, any replica fed by a watcher should open the watcher, wait for the OpInit
event which is guaranteed to be the first event yielded by the watcher, then fetch the current state, then apply events in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So plan for now is merge this then create a sperate issue and PR for the fixes to the access monitoring rule cache?
@hugoShaka
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all other plugins already have broken watchers so one more won't be an issue.
FetchRecipientCallback: func(ctx context.Context, name string) (*common.Recipient, error) { | ||
msTeamsRecipient, err := a.bot.FetchRecipient(ctx, name) | ||
if err != nil { | ||
return nil, trace.Wrap(err) | ||
} | ||
return &common.Recipient{ | ||
Name: name, | ||
ID: msTeamsRecipient.ID, | ||
Kind: string(msTeamsRecipient.Kind), | ||
}, nil | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plugin might be very slow with AMRs and we might want to increase the timeout to more than 5 seconds. Fetching recipients installs the app for every recipient, which can take several seconds and can fail.
We mitigated this by fetching every recipient on startup, this allowed to avoid timeouts and fail fast. A safer way to do this would be to tentatively fetch recipients when loading the AMR. We would still miss the fail fast, but at least we'd avoid the timeouts.
We can merge this as-is (with an increase timeout, I'd bump to at least 15 sec) and open an issue to improve the AMR logic later if that's still an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increased the timeout for now, would installing the app for all possible recipients when initializing the rule cache at the start and then again whenever a new one is found be what you mean for the potential fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock the hosted support for MsTeams, but we'll need to come back and fix the AMR cache logic.
Part of #24495
Adds access monitoring rule based routing for the Microsoft teams plugin
changelog: Add access monitoring rule based routing for the Microsoft teams plugin