Slack notification grouping (#1371)
* Slack notification grouping and summarization

* post-review fixes

* change the way grouping_summary_mode and grouping_enabled are initialized in SinkBase

* more post-review fixes

* more readable formatting of the interval in group summary messages

* more post-review fixes

* don't create Slack threads unless in summary mode; summary msg handling simplifications

* doc fixes, make ignore_first mandatory, change default interval

* fix time window tracking; oop refactor

* improve dependency specs
Robert Szefler authored May 4, 2024
1 parent 2aa1b2a commit 9665d9d
Showing 31 changed files with 622 additions and 52 deletions.
80 changes: 79 additions & 1 deletion docs/configuration/sinks/slack.rst
@@ -68,7 +68,7 @@ Example:
.. code-block:: yaml
sinks_config:
# slack integration params
# slack integration params, like slack_channel, api_key etc
- slack_sink:
name: main_slack_sink
api_key: xoxb-112...
@@ -115,6 +115,84 @@ If you'd like to automatically tag users on builtin alerts, please
`let us know <https://github.com/robusta-dev/robusta/issues/new?assignees=&labels=&template=feature_request.md&title=Tag%20Slack%20Users>`_.
We want to hear requirements.


Grouping and summarizing messages
-------------------------------------------------------------------

Large systems monitored by Robusta can generate a considerable number of
notifications that are very similar to each other (for example, ones that
report the same type of problem across part of the cluster). For such
cases, Robusta provides a mechanism that reduces clutter in Slack channels
by grouping notifications based on their properties and, optionally,
summarizing how many times each kind occurred.

The mechanism is enabled by the ``grouping`` section in the Slack sink
config. The parameters you can group on (listed under ``group_by``) are
essentially any values in the Kubernetes event payload, plus one special
addition: ``workload``, which holds the name of the top-level entity for
the event. Labels and annotations are also supported, as shown in the
example below. If you specify ``grouping`` without defining ``group_by``,
notifications are grouped by cluster name.
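One way to picture ``group_by`` is as a function that turns each event into a hashable key, so that events with equal keys share a group. The sketch below is illustrative only: the flat event dict with nested ``labels``/``annotations`` dicts and the ``group_key`` helper are hypothetical, not Robusta's internal event class or code. It does follow the documented rules: plain attribute names, ``labels``/``annotations`` sub-lists, and a fallback to the cluster name when ``group_by`` is empty.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical event shape: flat payload values plus nested "labels" and
# "annotations" dicts. This is NOT Robusta's real event class.
Event = Dict[str, Any]
# A group_by entry is a plain attribute name ("workload", "cluster", ...)
# or a one-key mapping such as {"labels": ["app"]}.
GroupBySpec = List[Union[str, Dict[str, List[str]]]]


def group_key(event: Event, group_by: GroupBySpec) -> Tuple[Tuple[str, Any], ...]:
    """Return a hashable key; events with equal keys land in the same group."""
    if not group_by:
        group_by = ["cluster"]  # documented default: group by cluster name
    parts = []
    for entry in group_by:
        if isinstance(entry, str):
            parts.append((entry, event.get(entry)))
            continue
        for section, names in entry.items():  # "labels" / "annotations"
            values = event.get(section) or {}
            for name in names:
                parts.append((f"{section}.{name}", values.get(name)))
    return tuple(parts)


spec = ["workload", {"labels": ["app"]}]
a = {"cluster": "prod", "workload": "api", "labels": {"app": "shop"}}
b = {"cluster": "prod", "workload": "api", "labels": {"app": "shop", "tier": "web"}}
print(group_key(a, spec) == group_key(b, spec))  # True: labels not listed are ignored
```

Only the attributes named in ``group_by`` contribute to the key, which is why two events that differ in an unlisted label still fall into the same group.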

The grouping mechanism supports the ``interval`` setting, which defines
the length, in seconds, of the window over which notifications are
aggregated. The window starts when the first notification belonging to
the group arrives and ends when the specified interval elapses. If you
don't specify ``interval``, it defaults to 15 minutes (900 seconds).
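The window rule above can be sketched as a small per-group tracker. This is an illustration under assumptions, not Robusta's implementation: ``GroupWindow`` is a hypothetical name, and the injectable ``clock`` exists only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional

DEFAULT_INTERVAL = 15 * 60  # documented default window: 15 minutes, in seconds


class GroupWindow:
    """Aggregation window for one group (illustrative sketch, not Robusta's code).

    The window opens when the group's first notification arrives and closes
    once `interval` seconds have elapsed; the next notification opens a new one.
    """

    def __init__(self, interval: float = DEFAULT_INTERVAL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.start: Optional[float] = None
        self.count = 0

    def register(self) -> bool:
        """Record one notification; return True if it opened a new window."""
        now = self.clock()
        if self.start is None or now - self.start >= self.interval:
            self.start, self.count = now, 1
            return True
        self.count += 1
        return False
```

With ``interval=60``, notifications at t=0 and t=30 share one window (count 2), while one at t=60 opens a fresh window with its count reset to 1.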

There are two modes for this functionality, selected by the
``notification_mode`` subsection. In ``regular`` mode, you must specify
the ``ignore_first`` value. This value determines the minimum number of
notifications in a group that have to occur within ``interval`` before
they are sent as Slack messages. This mode acts as a false-positive
filter: it only triggers the Slack sink if notifications arrive at a rate
above the configured threshold.
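A minimal sketch of ``regular`` mode as a per-group counter follows. It assumes one reading of ``ignore_first: N``: within a window, the first N notifications of a group are dropped and later ones are sent. That interpretation, and the ``IgnoreFirstFilter`` class itself, are assumptions made for illustration and are not taken from Robusta's source.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class IgnoreFirstFilter:
    """Sketch of `regular` mode, ASSUMING `ignore_first: N` means the first N
    notifications of a group within one window are dropped and the rest sent."""

    def __init__(self, ignore_first: int, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ignore_first = ignore_first
        self.interval = interval
        self.clock = clock
        # group key -> (window start time, notifications seen in that window)
        self._windows: Dict[Hashable, Tuple[float, int]] = {}

    def should_send(self, key: Hashable) -> bool:
        now = self.clock()
        start, seen = self._windows.get(key, (now, 0))
        if now - start >= self.interval:
            start, seen = now, 0  # window elapsed: this notification opens a new one
        seen += 1
        self._windows[key] = (start, seen)
        return seen > self.ignore_first
```

Because the counter resets when the window elapses, a slow trickle of notifications never reaches the threshold. That is the false-positive filtering effect described above.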

The ``summary`` mode summarizes the number of notifications succinctly.
The summary is sent to Slack as a single message that includes the total
number of notifications in the group, formatted as a table broken down by
the attributes listed under ``summary.by``. If ``summary.threaded`` is
``true``, all Slack notifications belonging to the group are appended in
a thread under this header message. If ``summary.threaded`` is ``false``,
the individual notifications are not sent to Slack at all, and only the
summary message appears.
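The table in the summary message is essentially a count per combination of the ``summary.by`` attributes. The sketch below shows that idea with ``collections.Counter``. The function names and the plain-text layout are hypothetical; Robusta's actual Slack formatting is not reproduced here.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summarize(notifications: Iterable[Dict[str, Any]], by: List[str]) -> Counter:
    """Count notifications per combination of the `summary.by` attributes."""
    return Counter(tuple(n.get(attr) for attr in by) for n in notifications)


def render_table(counts: Counter, by: List[str]) -> str:
    """Render counts as a plain-text table, most frequent rows first."""
    header = list(by) + ["count"]
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], [str(v) for v in kv[0]]))
    rows = [[str(v) for v in key] + [str(n)] for key, n in ordered]
    widths = [max(len(row[i]) for row in [header] + rows) for i in range(len(header))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in [header] + rows]
    lines.append(f"total: {sum(counts.values())}")
    return "\n".join(lines)


notes = [{"identifier": "CrashLoop", "severity": "HIGH"}] * 3 + \
        [{"identifier": "OOMKill", "severity": "HIGH"}]
print(render_table(summarize(notes, ["identifier", "severity"]), ["identifier", "severity"]))
```

Using a tuple of attribute values as the ``Counter`` key keeps the breakdown generic: adding an attribute to ``summary.by`` just adds a column.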

The counts in the summary message are updated dynamically as new
notifications in the group arrive, regardless of whether
``summary.threaded`` is enabled.
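The interplay between the header message, in-place updates and threading can be sketched against an in-memory fake client. In Slack's Web API these operations correspond to ``chat.postMessage`` (with ``thread_ts`` for replies) and ``chat.update``. ``FakeSlack`` and ``SummaryGroup`` are hypothetical names, and this is not how Robusta's Slack sink is actually structured.

```python
import itertools
from typing import Dict, List, Optional


class FakeSlack:
    """In-memory stand-in for a Slack client (hypothetical; no network calls).

    post() plays the role of chat.postMessage (optionally with thread_ts),
    update() the role of chat.update.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.messages: Dict[str, str] = {}        # ts -> top-level message text
        self.replies: Dict[str, List[str]] = {}   # parent ts -> thread replies

    def post(self, text: str, thread_ts: Optional[str] = None) -> str:
        ts = str(next(self._ids))
        if thread_ts is None:
            self.messages[ts] = text
        else:
            self.replies.setdefault(thread_ts, []).append(text)
        return ts

    def update(self, ts: str, text: str) -> None:
        self.messages[ts] = text


class SummaryGroup:
    """One group in summary mode: a header message kept up to date with the count."""

    def __init__(self, slack: FakeSlack, threaded: bool) -> None:
        self.slack = slack
        self.threaded = threaded
        self.count = 0
        self.header_ts: Optional[str] = None

    def handle(self, notification: str) -> None:
        self.count += 1
        header = f"{self.count} notification(s) in this group"
        if self.header_ts is None:
            self.header_ts = self.slack.post(header)      # first hit: create header
        else:
            self.slack.update(self.header_ts, header)     # later hits: refresh count
        if self.threaded:
            # threaded: true -> the notification itself goes into the header's thread
            self.slack.post(notification, thread_ts=self.header_ts)
```

The header is refreshed on every notification in both modes; ``threaded`` only decides whether the notification text is also posted as a reply.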

.. code-block:: yaml

    sinksConfig:
    - slack_sink:
        # slack integration params, like slack_channel, api_key etc
        grouping:
          group_by:
            - workload
            - labels:
                - app
            - annotations:
                - experimental_deployment
          interval: 1800  # group time window, seconds
          notification_mode:
            summary:
              threaded: true
              by:
                - identifier
                - severity


.. note::

   In the current, initial implementation of this mechanism, notification
   statistics are held in memory only and are not persisted, so when the
   Robusta runner crashes or restarts they are lost and counting starts
   anew.


Creating Custom Slack Apps
-------------------------------------------------------------------

2 changes: 1 addition & 1 deletion poetry.lock


6 changes: 5 additions & 1 deletion pyproject.toml
@@ -58,6 +58,11 @@ watchgod = "^0.7"
webexteamssdk = "^1.6.1"
bitmath = "^1.3.3.1"
croniter = "^1.3.15"
humanize = "^3.13.1"

# The following are added to speed up poetry dependency resolution
botocore = "1.31.72"
boto3 = "1.28.72"

# we're freezing a specific version here because the latest version doesn't have prebuilt wheels on pypi
# and therefore requires gcc to install which we'd like to avoid
@@ -88,7 +93,6 @@ sphinxcontrib-images = "^0.9.4"
jsonref = "^0.2"
Pillow = "^10.3.0"
sphinxcontrib-mermaid = "^0.7.1"
humanize = "^3.13.1"
cssselect = "^1.1.0"
pygal = "^3.0.0"
tinycss = "^0.4"
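The commit message mentions more readable formatting of the interval in group summary messages, and this hunk moves `humanize` into the main dependency list (it previously sat next to the Sphinx packages). Presumably `humanize` does that formatting in Robusta. The sketch below deliberately does not use it. It is a stdlib-only illustration of turning an interval in seconds (such as `interval: 1800` from the Slack docs) into readable text, and `format_interval` is a hypothetical name.

```python
def format_interval(seconds: int) -> str:
    """Render a whole number of seconds as e.g. '1 hour 30 minutes'."""
    units = (("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1))
    parts = []
    for name, size in units:
        value, seconds = divmod(seconds, size)
        if value:
            parts.append(f"{value} {name}{'' if value == 1 else 's'}")
    return " ".join(parts) or "0 seconds"


print(format_interval(1800))  # 30 minutes
```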
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/datadog/datadog_sink_params.py
@@ -5,6 +5,10 @@
class DataDogSinkParams(SinkBaseParams):
    api_key: str

    @classmethod
    def _get_sink_type(cls):
        return "datadog"


class DataDogSinkConfigWrapper(SinkConfigBase):
    datadog_sink: DataDogSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/discord/discord_sink_params.py
@@ -5,6 +5,10 @@
class DiscordSinkParams(SinkBaseParams):
    url: str

    @classmethod
    def _get_sink_type(cls):
        return "discord"


class DiscordSinkConfigWrapper(SinkConfigBase):
    discord_sink: DiscordSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/file/file_sink_params.py
@@ -7,6 +7,10 @@
class FileSinkParms(SinkBaseParams):
    file_name: str = None

    @classmethod
    def _get_sink_type(cls):
        return "file"


class FileSinkConfigWrapper(SinkConfigBase):
    file_sink: FileSinkParms
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/google_chat/google_chat_params.py
@@ -7,6 +7,10 @@
class GoogleChatSinkParams(SinkBaseParams):
    webhook_url: SecretStr

    @classmethod
    def _get_sink_type(cls):
        return "google_chat"


class GoogleChatSinkConfigWrapper(SinkConfigBase):
    google_chat_sink: GoogleChatSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/jira/jira_sink_params.py
@@ -19,6 +19,10 @@ class JiraSinkParams(SinkBaseParams):
    reopenStatusName: Optional[str] = "To Do"
    noReopenResolution: Optional[str] = ""

    @classmethod
    def _get_sink_type(cls):
        return "jira"


class JiraSinkConfigWrapper(SinkConfigBase):
    jira_sink: JiraSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/kafka/kafka_sink_params.py
@@ -7,6 +7,10 @@ class KafkaSinkParams(SinkBaseParams):
    topic: str
    auth: dict = {}

    @classmethod
    def _get_sink_type(cls):
        return "kafka"


class KafkaSinkConfigWrapper(SinkConfigBase):
    kafka_sink: KafkaSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/mail/mail_sink_params.py
@@ -7,6 +7,10 @@
class MailSinkParams(SinkBaseParams):
    mailto: str

    @classmethod
    def _get_sink_type(cls):
        return "mail"

    @validator("mailto")
    def validate_mailto(cls, mailto):
        # Make sure we only handle emails and exclude other schemes provided by apprise
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/mattermost/mattermost_sink_params.py
@@ -15,6 +15,10 @@ class MattermostSinkParams(SinkBaseParams):
    team: Optional[str]
    team_id: Optional[str]

    @classmethod
    def _get_sink_type(cls):
        return "mattermost"

    @validator("url")
    def set_http_schema(cls, url):
        parsed_url = urlparse(url)
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/msteams/msteams_sink_params.py
@@ -5,6 +5,10 @@
class MsTeamsSinkParams(SinkBaseParams):
    webhook_url: str

    @classmethod
    def _get_sink_type(cls):
        return "msteams"


class MsTeamsSinkConfigWrapper(SinkConfigBase):
    ms_teams_sink: MsTeamsSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/opsgenie/opsgenie_sink_params.py
@@ -10,6 +10,10 @@ class OpsGenieSinkParams(SinkBaseParams):
    tags: List[str] = []
    host: Optional[str] = None  # NOTE: If None, the default value will be used from opsgenie_sdk

    @classmethod
    def _get_sink_type(cls):
        return "opsgenie"


class OpsGenieSinkConfigWrapper(SinkConfigBase):
    opsgenie_sink: OpsGenieSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/pagerduty/pagerduty_sink_params.py
@@ -5,6 +5,10 @@
class PagerdutySinkParams(SinkBaseParams):
    api_key: str

    @classmethod
    def _get_sink_type(cls):
        return "pagerduty"


class PagerdutyConfigWrapper(SinkConfigBase):
    pagerduty_sink: PagerdutySinkParams
5 changes: 5 additions & 0 deletions src/robusta/core/sinks/pushover/pushover_sink_params.py
@@ -10,6 +10,11 @@ class PushoverSinkParams(SinkBaseParams):
    device: str = None
    pushover_url: str = "https://api.pushover.net/1/messages.json"

    @classmethod
    def _get_sink_type(cls):
        return "pushover"


class PushoverSinkConfigWrapper(SinkConfigBase):
    pushover_sink: PushoverSinkParams

4 changes: 4 additions & 0 deletions src/robusta/core/sinks/robusta/robusta_sink_params.py
@@ -17,6 +17,10 @@ class RobustaSinkParams(SinkBaseParams):
    ttl_hours: int = 4380  # Time before unactive cluster data is deleted. 6 Months default.
    persist_events: bool = False

    @classmethod
    def _get_sink_type(cls):
        return "robusta"


class RobustaSinkConfigWrapper(SinkConfigBase):
    robusta_sink: RobustaSinkParams
4 changes: 4 additions & 0 deletions src/robusta/core/sinks/rocketchat/rocketchat_sink_params.py
@@ -12,6 +12,10 @@ class RocketchatSinkParams(SinkBaseParams):
    user_id: str
    server_url: str

    @classmethod
    def _get_sink_type(cls):
        return "rocketchat"

    def get_rocketchat_channel(self) -> str:
        return self.channel

4 changes: 4 additions & 0 deletions src/robusta/core/sinks/servicenow/servicenow_sink_params.py
@@ -12,6 +12,10 @@ class ServiceNowSinkParams(SinkBaseParams):
    password: SecretStr
    caller_id: Optional[str]

    @classmethod
    def _get_sink_type(cls):
        return "servicenow"


class ServiceNowSinkConfigWrapper(SinkConfigBase):
    service_now_sink: ServiceNowSinkParams
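Every sink params class in this commit gains a `_get_sink_type` classmethod that returns a fixed type string. The diff does not show where that string is consumed. One plausible use, shown here purely as a hypothetical, is a registry keyed by sink type that fills itself as subclasses are defined. The `SinkBaseParams` below is a heavily simplified stand-in, not Robusta's real base class, and the `registry` attribute is invented for the sketch.

```python
from typing import ClassVar, Dict, Type


class SinkBaseParams:
    """Hypothetical, heavily simplified stand-in for the real SinkBaseParams;
    only the sink-type hook is modelled here."""

    # sink type string -> params class, filled in as subclasses are defined
    registry: ClassVar[Dict[str, Type["SinkBaseParams"]]] = {}

    @classmethod
    def _get_sink_type(cls) -> str:
        raise NotImplementedError

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        SinkBaseParams.registry[cls._get_sink_type()] = cls


class DataDogSinkParams(SinkBaseParams):
    @classmethod
    def _get_sink_type(cls):
        return "datadog"


class DiscordSinkParams(SinkBaseParams):
    @classmethod
    def _get_sink_type(cls):
        return "discord"


print(SinkBaseParams.registry["discord"].__name__)  # DiscordSinkParams
```

A classmethod fits this role because the type is a property of the class rather than of a configured instance, so it can be queried before any sink is built.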