Skip to content

Commit

Permalink
Fix opsgenie plugin to use correct annotation to determine whether or…
Browse files Browse the repository at this point in the history
… not to create alerts (#38435)

* Fix bug in opsgenie plugin mixing up notify-services and schedules

* Update docs for opsgenie plugin to use correct schedules

* Remove accidental change to opsgenie client

* Update opsgenie client tests

* Update opsgneie plugin behaviour for determining recipients

* Update docs with correct screenshot of role creation

* Stop Opsgenie plugin sending alerts when /schedules label is set

* Change rawRecipients initialisation in opsgenie bot

* Remove opsgenie add role example image from docs

* Update integrations/access/accessrequest/app.go

Co-authored-by: Roman Tkachenko <[email protected]>

* Update docs/pages/access-controls/access-request-plugins/opsgenie.mdx

Co-authored-by: Roman Tkachenko <[email protected]>

* Make variable Opsgenie plugin var names more consistent

* Update changelog with opsgenie breaking changes

* Rename ReqAnnotationSchedulesLabel for consistency

* Fix opsgenie getOnCall

* Add opsgenie approval tests + add approval user configuration

* Appease linter

* Add back in opsgenie client fix

* Appease linter

* Add in clarifying comment for alert request retry behaviour

* Add missing opsgenie endpoint mocks

---------

Co-authored-by: Roman Tkachenko <[email protected]>
Co-authored-by: Hugo Hervieux <[email protected]>
  • Loading branch information
3 people authored Mar 27, 2024
1 parent 4b4fa81 commit 0b01cf5
Show file tree
Hide file tree
Showing 16 changed files with 372 additions and 76 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
# Changelog

## 16.0.0 (xx/xx/xx)

### Breaking changes

#### Opsgenie plugin annotations

Opsgenie plugin users, role annotations must now contain
`teleport.dev/notify-services` to receive notification on Opsgenie.
`teleport.dev/schedules` is now the label used to determine auto approval flow.
See [the Opsgenie plugin documentation](docs/pages/access-controls/access-request-plugins/opsgenie.mdx)
for setup instructions.

## 15.0.0 (xx/xx/24)

### New features
Expand Down
8 changes: 4 additions & 4 deletions api/types/constants.go
Original file line number Diff line number Diff line change
Expand Up @@ -727,10 +727,10 @@ const (
// DiscoveryAppIgnore specifies if a Kubernetes service should be ignored by discovery service.
DiscoveryAppIgnore = TeleportNamespace + "/ignore"

// ReqAnnotationSchedulesLabel is the request annotation key at which schedules are stored for access plugins.
ReqAnnotationSchedulesLabel = "/schedules"
// ReqAnnotationNotifyServicesLabel is the request annotation key at which notify services are stored for access plugins.
ReqAnnotationNotifyServicesLabel = "/notify-services"
// ReqAnnotationApproveSchedulesLabel is the request annotation key at which schedules are stored for access plugins.
ReqAnnotationApproveSchedulesLabel = "/schedules"
// ReqAnnotationNotifySchedulesLabel is the request annotation key at which notify schedules are stored for access plugins.
ReqAnnotationNotifySchedulesLabel = "/notify-services"

// CloudAWS identifies that a resource was discovered in AWS.
CloudAWS = "AWS"
Expand Down
Binary file not shown.
11 changes: 6 additions & 5 deletions docs/pages/access-controls/access-request-plugins/opsgenie.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,6 @@ To create a user first navigate to Management -> Access -> Roles

Then select 'Create New Role' and create the requester role.

![Add user one](../../../img/enterprise/plugins/opsgenie/add-requester-role.png)

```
kind: role
version: v5
Expand All @@ -75,10 +73,13 @@ spec:
- approve: 1
deny: 1
annotations:
teleport.dev/schedules: ['teleport-access-request-notifications']
teleport.dev/notify-services: ['teleport-access-request-notifications']
teleport.dev/schedules: ['teleport-access-alert-schedules']
```

The `teleport.dev/schedules` annotation specifies the schedule the alert will be be created for.
The `teleport.dev/notify-services` annotation specifies the schedules the alert will be created for.
The `teleport.dev/schedules` annotation specifies the schedules the alert will check, and auto approve the
Access Request if the requesting user is on-call.

### Create a user who will request access

Expand Down Expand Up @@ -121,7 +122,7 @@ As the Teleport user `myuser`, create an Access Request for the `editor` role:

In Opsgenie, you will see a new alert containing information about the
Access Request in either the default schedule specified when enrolling the plugin,
or in the schedules specified by `teleport.dev/schedules` annotation in the requester's role.
or in the schedules specified by `teleport.dev/notify-services` annotation in the requester's role.

### Resolve the request

Expand Down
19 changes: 11 additions & 8 deletions integrations/access/accessrequest/app.go
Original file line number Diff line number Diff line change
Expand Up @@ -352,16 +352,19 @@ func (a *App) getMessageRecipients(ctx context.Context, req types.AccessRequest)
recipientSet.Add(common.Recipient{})
return recipientSet.ToSlice()
case types.PluginTypeOpsgenie:
if recipients, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationSchedulesLabel]; ok {
for _, recipient := range recipients {
rec, err := a.bot.FetchRecipient(ctx, recipient)
if err != nil {
log.Warning(err)
}
recipientSet.Add(*rec)
}
recipients, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationNotifySchedulesLabel]
if !ok {
return recipientSet.ToSlice()
}
for _, recipient := range recipients {
rec, err := a.bot.FetchRecipient(ctx, recipient)
if err != nil {
log.Warningf("Failed to fetch Opsgenie recipient: %v", err)
continue
}
recipientSet.Add(*rec)
}
return recipientSet.ToSlice()
}

validEmailSuggReviewers := []string{}
Expand Down
38 changes: 24 additions & 14 deletions integrations/access/opsgenie/app.go
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,6 @@ import (
tp "github.com/gravitational/teleport"
"github.com/gravitational/teleport/api/client/proto"
"github.com/gravitational/teleport/api/types"
"github.com/gravitational/teleport/integrations/access/common"
"github.com/gravitational/teleport/integrations/access/common/teleport"
"github.com/gravitational/teleport/integrations/lib"
"github.com/gravitational/teleport/integrations/lib/backoff"
Expand All @@ -45,9 +44,9 @@ const (
// minServerVersion is the minimal teleport version the plugin supports.
minServerVersion = "6.1.0"
// initTimeout is used to bound execution time of health check and teleport version check.
initTimeout = time.Second * 10
initTimeout = time.Second * 30
// handlerTimeout is used to bound the execution time of watcher event handler.
handlerTimeout = time.Second * 5
handlerTimeout = time.Second * 30
// modifyPluginDataBackoffBase is an initial (minimum) backoff value.
modifyPluginDataBackoffBase = time.Millisecond
// modifyPluginDataBackoffMax is a backoff threshold
Expand Down Expand Up @@ -141,10 +140,9 @@ func (a *App) init(ctx context.Context) error {
defer cancel()

var err error
if a.teleport == nil {
if a.teleport, err = common.GetTeleportClient(ctx, a.conf.Teleport); err != nil {
return trace.Wrap(err)
}
a.teleport, err = a.conf.GetTeleportClient(ctx)
if err != nil {
return trace.Wrap(err, "getting teleport client")
}

if _, err = a.checkTeleportVersion(ctx); err != nil {
Expand All @@ -155,6 +153,13 @@ func (a *App) init(ctx context.Context) error {
if err != nil {
return trace.Wrap(err)
}

log := logger.Get(ctx)
log.Debug("Starting API health check...")
if err = a.opsgenie.CheckHealth(ctx); err != nil {
return trace.Wrap(err, "API health check failed")
}
log.Debug("API health check finished ok")
return nil
}

Expand Down Expand Up @@ -269,15 +274,15 @@ func (a *App) onDeletedRequest(ctx context.Context, reqID string) error {
}

func (a *App) getNotifyServiceNames(req types.AccessRequest) ([]string, error) {
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationNotifyServicesLabel]
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationNotifySchedulesLabel]
if !ok {
return nil, trace.NotFound("notify services not specified")
}
return services, nil
}

func (a *App) getOnCallServiceNames(req types.AccessRequest) ([]string, error) {
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationSchedulesLabel]
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationApproveSchedulesLabel]
if !ok {
return nil, trace.NotFound("on-call schedules not specified")
}
Expand All @@ -294,11 +299,16 @@ func (a *App) tryNotifyService(ctx context.Context, req types.AccessRequest) (bo
}

reqID := req.GetName()
annotations := types.Labels{}
for k, v := range req.GetSystemAnnotations() {
annotations[k] = v
}
reqData := RequestData{
User: req.GetUser(),
Roles: req.GetRoles(),
Created: req.GetCreationTime(),
RequestReason: req.GetRequestReason(),
User: req.GetUser(),
Roles: req.GetRoles(),
Created: req.GetCreationTime(),
RequestReason: req.GetRequestReason(),
SystemAnnotations: annotations,
}

// Create plugin data if it didn't exist before.
Expand Down Expand Up @@ -429,7 +439,7 @@ func (a *App) tryApproveRequest(ctx context.Context, req types.AccessRequest) er
if _, err := a.teleport.SubmitAccessReview(ctx, types.AccessReviewSubmission{
RequestID: req.GetName(),
Review: types.AccessReview{
Author: tp.SystemAccessApproverUserName,
Author: a.conf.TeleportUserName,
ProposedState: types.RequestState_APPROVED,
Reason: fmt.Sprintf("Access requested by user %s who is on call on service(s) %s",
tp.SystemAccessApproverUserName,
Expand Down
19 changes: 12 additions & 7 deletions integrations/access/opsgenie/bot.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,13 +60,17 @@ func (b Bot) SendReviewReminders(ctx context.Context, recipients []common.Recipi
}

// BroadcastAccessRequestMessage creates an alert for the provided recipients (schedules)
func (b *Bot) BroadcastAccessRequestMessage(ctx context.Context, recipients []common.Recipient, reqID string, reqData pd.AccessRequestData) (data accessrequest.SentMessages, err error) {
schedules := []string{}
for _, recipient := range recipients {
schedules = append(schedules, recipient.Name)
func (b *Bot) BroadcastAccessRequestMessage(ctx context.Context, recipientSchedules []common.Recipient, reqID string, reqData pd.AccessRequestData) (data accessrequest.SentMessages, err error) {
notificationSchedules := make([]string, 0, len(recipientSchedules))
for _, notifySchedule := range recipientSchedules {
notificationSchedules = append(notificationSchedules, notifySchedule.Name)
}
if len(recipients) == 0 {
schedules = append(schedules, b.client.DefaultSchedules...)
autoApprovalSchedules := []string{}
if annotationAutoApprovalSchedules, ok := reqData.SystemAnnotations[types.TeleportNamespace+types.ReqAnnotationApproveSchedulesLabel]; ok {
autoApprovalSchedules = annotationAutoApprovalSchedules
}
if len(autoApprovalSchedules) == 0 {
autoApprovalSchedules = append(autoApprovalSchedules, b.client.DefaultSchedules...)
}
opsgenieReqData := RequestData{
User: reqData.User,
Expand All @@ -79,7 +83,8 @@ func (b *Bot) BroadcastAccessRequestMessage(ctx context.Context, recipients []co
Reason: reqData.ResolutionReason,
},
SystemAnnotations: types.Labels{
types.TeleportNamespace + types.ReqAnnotationSchedulesLabel: schedules,
types.TeleportNamespace + types.ReqAnnotationApproveSchedulesLabel: autoApprovalSchedules,
types.TeleportNamespace + types.ReqAnnotationNotifySchedulesLabel: notificationSchedules,
},
}
opsgenieData, err := b.client.CreateAlert(ctx, reqID, opsgenieReqData)
Expand Down
66 changes: 57 additions & 9 deletions integrations/access/opsgenie/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,17 +30,24 @@ import (
"github.com/aws/aws-sdk-go/aws/defaults"
"github.com/go-resty/resty/v2"
"github.com/gravitational/trace"
"github.com/jonboulle/clockwork"

"github.com/gravitational/teleport/api/types"
"github.com/gravitational/teleport/integrations/access/common"
"github.com/gravitational/teleport/integrations/lib"
"github.com/gravitational/teleport/integrations/lib/backoff"
"github.com/gravitational/teleport/integrations/lib/logger"
)

const (
// alertKeyPrefix is the prefix for Alert's alias field used when creating an Alert.
alertKeyPrefix = "teleport-access-request"
heartbeatName = "teleport-access-heartbeat"
alertKeyPrefix = "teleport-access-request"
heartbeatName = "teleport-access-heartbeat"
ResponderTypeSchedule = "schedule"
ResponderTypeUser = "user"

ResolveAlertRequestRetryInterval = time.Second * 10
ResolveAlertRequestRetryTimeout = time.Minute * 2
)

var alertBodyTemplate = template.Must(template.New("alert body").Parse(
Expand Down Expand Up @@ -135,11 +142,11 @@ func (og Client) CreateAlert(ctx context.Context, reqID string, reqData RequestD
Message: fmt.Sprintf("Access request from %s", reqData.User),
Alias: fmt.Sprintf("%s/%s", alertKeyPrefix, reqID),
Description: bodyDetails,
Responders: og.getResponders(reqData),
Responders: og.getScheduleResponders(reqData),
Priority: og.Priority,
}

var result AlertResult
var result CreateAlertResult
resp, err := og.client.NewRequest().
SetContext(ctx).
SetBody(body).
Expand All @@ -153,20 +160,60 @@ func (og Client) CreateAlert(ctx context.Context, reqID string, reqData RequestD
if resp.IsError() {
return OpsgenieData{}, errWrapper(resp.StatusCode(), string(resp.Body()))
}

// If this fails, Teleport request approval and auto-approval will still work,
// but incident in Opsgenie won't be auto-closed or updated as the alertID won't be available.
alertRequestResult, err := og.tryGetAlertRequestResult(ctx, result.RequestID)
if err != nil {
return OpsgenieData{}, trace.Wrap(err)
}

return OpsgenieData{
AlertID: result.Alert.ID,
AlertID: alertRequestResult.Data.AlertID,
}, nil
}

func (og Client) getResponders(reqData RequestData) []Responder {
func (og Client) tryGetAlertRequestResult(ctx context.Context, reqID string) (GetAlertRequestResult, error) {
backoff := backoff.NewDecorr(ResolveAlertRequestRetryInterval, ResolveAlertRequestRetryTimeout, clockwork.NewRealClock())
for {
alertRequestResult, err := og.getAlertRequestResult(ctx, reqID)
if err == nil {
logger.Get(ctx).Debugf("Got alert request result: %+v", alertRequestResult)
return alertRequestResult, nil
}
logger.Get(ctx).Debug("Failed to get alert request result:", err)
if err := backoff.Do(ctx); err != nil {
return GetAlertRequestResult{}, trace.Wrap(err)
}
}
}

func (og Client) getAlertRequestResult(ctx context.Context, reqID string) (GetAlertRequestResult, error) {
var result GetAlertRequestResult
resp, err := og.client.NewRequest().
SetContext(ctx).
SetResult(&result).
SetPathParams(map[string]string{"requestID": reqID}).
Get("v2/alerts/requests/{requestID}")
if err != nil {
return GetAlertRequestResult{}, trace.Wrap(err)
}
defer resp.RawResponse.Body.Close()
if resp.IsError() {
return GetAlertRequestResult{}, errWrapper(resp.StatusCode(), string(resp.Body()))
}
return result, nil
}

func (og Client) getScheduleResponders(reqData RequestData) []Responder {
schedules := og.DefaultSchedules
if reqSchedules, ok := reqData.SystemAnnotations[types.TeleportNamespace+types.ReqAnnotationSchedulesLabel]; ok {
if reqSchedules, ok := reqData.SystemAnnotations[types.TeleportNamespace+types.ReqAnnotationNotifySchedulesLabel]; ok {
schedules = reqSchedules
}
responders := make([]Responder, 0, len(schedules))
for _, s := range schedules {
responders = append(responders, Responder{
Type: "schedule",
Type: ResponderTypeSchedule,
ID: s,
})
}
Expand Down Expand Up @@ -231,12 +278,13 @@ func (og Client) GetOnCall(ctx context.Context, scheduleName string) (Responders
SetContext(ctx).
SetPathParams(map[string]string{"scheduleName": scheduleName}).
SetQueryParams(map[string]string{
// This is required to lookup schedules by name (as opposed to lookup by ID)
"scheduleIdentifierType": "name",
// When flat is enabled it returns the email addresses of on-call participants.
"flat": "true",
}).
SetResult(&result).
Post("v2/schedules/{scheduleName}/on-calls")
Get("v2/schedules/{scheduleName}/on-calls")
if err != nil {
return RespondersResult{}, trace.Wrap(err)
}
Expand Down
5 changes: 4 additions & 1 deletion integrations/access/opsgenie/client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@ import (
func TestCreateAlert(t *testing.T) {
recievedReq := ""
testServer := httptest.NewServer(http.HandlerFunc(func(res http.ResponseWriter, req *http.Request) {
if req.URL.Path != "/v2/alerts" {
return
}
bodyBytes, err := io.ReadAll(req.Body)
if err != nil {
log.Fatal(err)
Expand All @@ -56,7 +59,7 @@ func TestCreateAlert(t *testing.T) {
Roles: []string{"role1", "role2"},
RequestReason: "someReason",
SystemAnnotations: types.Labels{
types.TeleportNamespace + types.ReqAnnotationSchedulesLabel: {"[email protected]"},
types.TeleportNamespace + types.ReqAnnotationNotifySchedulesLabel: {"[email protected]"},
},
})
assert.NoError(t, err)
Expand Down
Loading

0 comments on commit 0b01cf5

Please sign in to comment.