
Add access monitoring rules to msteams plugins #47638

Merged — 9 commits merged into master on Oct 22, 2024

Conversation

EdwardDowling (Contributor) commented Oct 16, 2024

Part of #24495

Adds access monitoring rule based routing for the Microsoft teams plugin

changelog: Add access monitoring rule based routing for the Microsoft teams plugin

Comment on lines +240 to +258
process.SpawnCriticalJob(watcherJob)

ok, err := watcherJob.WaitReady(ctx)
if err != nil {
	return trace.Wrap(err)
}
if len(acceptedWatchKinds) == 0 {
	return trace.BadParameter("failed to initialize watcher for all the required resources: %+v",
		watchKinds)
}
// Check if KindAccessMonitoringRule resources are being watched,
// the role the plugin is running as may not have access.
if slices.Contains(acceptedWatchKinds, types.KindAccessMonitoringRule) {
	if err := a.accessMonitoringRules.InitAccessMonitoringRulesCache(ctx); err != nil {
		return trace.Wrap(err, "initializing Access Monitoring Rule cache")
	}
}
a.watcherJob = watcherJob
a.watcherJob.SetReady(ok)
Contributor

This code contains two races:

  • we start processing AR events while the AMR cache is not inited
  • we start processing AMR events while the AMR cache is not inited (cache should be inited when receiving the watcher INIT event)

This will lead to two issues:

  • ARs processed early after plugin startup are not sent to the correct recipients
  • AMRs created/updated early after plugin startup are ignored by the plugin

As those issues also exist in the common AMR plugin implementation, we might decide to merge the PR anyway and fix them later.

Contributor Author

Are you suggesting queuing up incoming ARs and AMRs for processing until the init is done?

hugoShaka (Contributor) commented Oct 21, 2024

For the AR vs AMR race I'm not sure yet; I would tend to create the AMR watcher first and then, based on the result, set up AR watching. I'm worried this approach could lead to out-of-order events, but I'm not even sure Teleport protects against them in the first place with a dual-listener. I'll ask @tigrato and @espadolini if we have strong ordering guarantees in the auth event stream.

For the AMR init race, I would edit the watcherjob logic to make it run the cache init logic on the INIT event. This is the proper way of initializing a cache in Teleport, but the watcherjob lib currently swallows the init event because it was not designed to maintain a local cache, only to receive ARs.

Contributor


There's no protection against out-of-order events even with a single listener: watcher events are guaranteed to be delivered in order only with respect to each individual item's events, with no guarantee of ordering across items.

FWIW the integrations/lib/watcherjob.job seems to take full advantage of that, sequentially applying events for the same kind and name but processing events for different resources concurrently.

Contributor


And yeah, any replica fed by a watcher should open the watcher, wait for the OpInit event which is guaranteed to be the first event yielded by the watcher, then fetch the current state, then apply events in order.

Contributor Author


So the plan for now is to merge this, then create a separate issue and PR for the fixes to the access monitoring rule cache?
@hugoShaka

Contributor


Yes, all other plugins already have broken watchers so one more won't be an issue.

Comment on lines +158 to +168
FetchRecipientCallback: func(ctx context.Context, name string) (*common.Recipient, error) {
	msTeamsRecipient, err := a.bot.FetchRecipient(ctx, name)
	if err != nil {
		return nil, trace.Wrap(err)
	}
	return &common.Recipient{
		Name: name,
		ID:   msTeamsRecipient.ID,
		Kind: string(msTeamsRecipient.Kind),
	}, nil
},
Contributor


This plugin might be very slow with AMRs and we might want to increase the timeout to more than 5 seconds. Fetching recipients installs the app for every recipient, which can take several seconds and can fail.

We mitigated this by fetching every recipient on startup, which allowed us to avoid timeouts and fail fast. A safer way to do this would be to tentatively fetch recipients when loading the AMR. We would still lose the fail-fast behavior, but at least we'd avoid the timeouts.

We can merge this as-is (with an increased timeout; I'd bump to at least 15 sec) and open an issue to improve the AMR logic later if that's still an issue.

Contributor Author


Increased the timeout for now. Would installing the app for all possible recipients when initializing the rule cache at startup, and then again whenever a new recipient is found, be what you mean for the potential fix?

hugoShaka (Contributor) left a comment


Approving to unblock the hosted support for MsTeams, but we'll need to come back and fix the AMR cache logic.

@EdwardDowling EdwardDowling added this pull request to the merge queue Oct 22, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 22, 2024
@EdwardDowling EdwardDowling added this pull request to the merge queue Oct 22, 2024
Merged via the queue into master with commit b993005 Oct 22, 2024
39 checks passed
@EdwardDowling EdwardDowling deleted the edwarddowling/msteams-amr branch October 22, 2024 20:58