Datadog application monitoring #100

Charl1996 · 2025-01-14T08:27:47Z

This PR:

- Adds Datadog to the codebase
- Sends a timed metric to Datadog

The way it works is that when you invoke statsd from the datadog package, it sends the metric to the Datadog agent running on the machine (assuming it's running), which in turn handles authentication etc. Since the Datadog agent is already present on the staging and production machines, this should just work, but I'll update here with my findings.
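For illustration, the hand-off to the local agent can be sketched with the standard library alone. This is a minimal sketch, not what the datadog package does internally: `format_timing` and `send_timing` are hypothetical helpers, the agent is assumed to listen for DogStatsD datagrams on UDP port 8125 on localhost, and the tag values are made up.

```python
import socket


def format_timing(metric, value, tags=()):
    """Build a DogStatsD-style timing datagram: name:value|ms|#tag1,tag2."""
    payload = f"{metric}:{value}|ms"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_timing(metric, value, tags=(), host="127.0.0.1", port=8125):
    """Send one datagram over UDP to the (assumed) local agent.

    UDP is fire-and-forget: the caller gets no confirmation that an
    agent received the datagram.
    """
    datagram = format_timing(metric, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
    return datagram


if __name__ == "__main__":
    # Illustrative value and tags only.
    print(format_timing("cca.dataset_change.timer", 12.5, ["env:staging"]))
```

The application never talks to the Datadog API directly here, which is why no API key appears in the sketch; in the flow described above, that is the agent's job.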

I considered simply adapting HQ's implementation of the metrics, but decided against it because we really don't need all that code at this stage and can get away with a much simpler approach.

The corresponding CCA Ansible change can be found here.

ajeety4 · 2025-01-15T07:05:07Z

superset_config.example.py

@@ -191,3 +191,5 @@ class CeleryConfig:

 USER_DOMAIN_ROLE_EXPIRY = 60 # minutes
 SKIP_DATASET_CHANGE_FOR_DOMAINS = []
+
+SERVER_ENVIRONMENT = 'changeme'  # staging, production, etc.


Assuming this would be set up by Ansible?

ajeety4

LGTM

mkangia · 2025-01-15T07:38:15Z

hq_superset/models.py

@@ -34,6 +35,10 @@ class DataSetChange:
    data: list[dict[str, Any]]

    def update_dataset(self):
+        with statsd.timed('cca.dataset_change.timer', tags=[self.data_source_id]):


I'm currently busy figuring that out, since this would potentially result in massive volumes.

I chatted to Graham and he mentioned that the volumes should not be a problem, since Datadog is good at handling that. All metrics we gather here are sent to the Datadog agent on the server, which talks to our Datadog account once every (I think) 15 seconds.

All in all I think we're good here.
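To make the volume argument concrete, here is a rough sketch of the idea that many local data points collapse into a few values per flush window before anything leaves the server. It is only an illustration under assumptions: `aggregate` is a hypothetical helper, the 15-second window is the unconfirmed figure from the comment above, and the real agent's aggregation may differ.

```python
from collections import defaultdict
from statistics import mean

# Taken from the comment above, where it is stated with some uncertainty.
FLUSH_INTERVAL_SECONDS = 15


def aggregate(points, interval=FLUSH_INTERVAL_SECONDS):
    """Group (timestamp, value) points into flush windows and summarise each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[int(timestamp // interval)].append(value)
    return {
        window: {"count": len(values), "avg": mean(values), "max": max(values)}
        for window, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Four timings recorded locally become two summaries to forward.
    print(aggregate([(0, 1.0), (5, 3.0), (14.9, 2.0), (15, 4.0)]))
```

However many times `update_dataset` runs inside a window, the number of summaries forwarded per window stays the same in this model.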

mkangia · 2025-01-15T07:41:40Z

hq_superset/models.py

@@ -35,7 +35,8 @@ class DataSetChange:
    data: list[dict[str, Any]]

    def update_dataset(self):
-        with statsd.timed('cca.dataset_change.timer', tags=[SERVER_ENVIRONMENT, self.data_source_id]):
+        env_tag = f"env:{SERVER_ENVIRONMENT}"
+        with statsd.timed('cca.dataset_change.timer', tags=[env_tag, f"datasource:{self.data_source_id}"]):


Can we add a domain tag? I guess that would be useful.

Will datasource show on the metric itself and env would be on the top in the dropdown? And if we add domain, would it also be at the top of the dashboard?

Adding the domain could be useful I suppose.

Will datasource show on the metric itself and env would be on the top in the dropdown?

What do you mean here?

I don't know how tags are used or made available on Datadog once they are added to the metrics here.
So far, I have seen "env" or "domain" available at the top of a Datadog dashboard (like here), and there are also metrics broken down by individual items within a graph (like here).
So, I was asking whether the domain and data source IDs will be available in the same way.

Ah, I see. I created a chart here to demonstrate that we can, yes.

(I'm not 100% sure if the timeseries is what we want long term, maybe the Top List chart works better?)

Love it, thank you!

Timeseries seems okay from the little I know; it gives me the data I think I want.
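For reference, the `key:value` tag convention discussed in this thread, including the suggested domain tag, could be sketched as below. `build_tags` is a hypothetical helper and not the function added in this PR; the `domain` tag name and the example values are assumptions.

```python
def build_tags(environment, data_source_id, domain=None, extra=()):
    """Return key:value tags, always carrying exactly one env tag."""
    # Drop any env tag passed in by the caller so it cannot conflict.
    tags = [tag for tag in extra if not tag.startswith("env:")]
    tags.insert(0, f"env:{environment}")
    tags.append(f"datasource:{data_source_id}")
    if domain:
        # Hypothetical tag name, following the suggestion in this thread.
        tags.append(f"domain:{domain}")
    return tags


if __name__ == "__main__":
    print(build_tags("staging", "abc123", domain="demo"))
```

Keeping every tag in `key:value` form is what lets a dashboard filter on one key (such as env) and group a graph by another (such as datasource).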

mkangia · 2025-01-15T07:42:43Z

Just love how straightforward this was, and +1 on keeping things simple and limited to what we need.

mkangia · 2025-01-15T07:43:57Z

Will wait for your findings before dropping an approval.

Charl1996 · 2025-01-17T08:55:00Z

I just want to touch up one or two things before I open this for review later today.

Charl1996 · 2025-01-20T09:26:20Z

Datadog has been tested on staging and the metrics pull through (see cca.dataset_change.timer in the metrics explorer).

Regarding costs:
This particular metric does not have high cardinality (there are currently 33 DataSource repeaters on production, including test ones, which would spawn 33 custom metrics for the production env). This puts us well below the "$5-per-100-custom-metrics" margin.

I've also spoken to Graham and his sentiment is that we don't have to worry too much about the volume of this metric (i.e. the number of potential data points per custom metric), because the Datadog agent handles that well.
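The cardinality estimate above can be reproduced with a small calculation. This is a sketch under assumptions: `count_custom_metrics` is a hypothetical helper that counts every combination of tag values as one custom metric, the datasource IDs are placeholders, and `aggregates_per_combination` is a hypothetical multiplier left at 1 to match the count in the comment (raise it if the agent turns out to emit several series per timer).

```python
def count_custom_metrics(tag_values, aggregates_per_combination=1):
    """Upper bound on custom metrics: product of distinct values per tag key."""
    combinations = 1
    for values in tag_values.values():
        combinations *= len(set(values))
    return combinations * aggregates_per_combination


if __name__ == "__main__":
    # 33 placeholder IDs, standing in for the production DataSource repeaters.
    datasources = [f"ds-{i}" for i in range(33)]
    production = count_custom_metrics(
        {"env": ["production"], "datasource": datasources}
    )
    print(production, "custom metrics; under 100:", production < 100)
```

Adding another tag multiplies the bound by that tag's number of distinct values only if the new tag varies independently of the existing ones.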

Charl1996 added 4 commits January 13, 2025 14:25

Install datadog python package

5a108db

Time dataset change update

aad8c94

Add SERVER_ENVIRONMENT setting and send with datadog metric

489770e

Format tags

22598f8

Charl1996 requested review from kaapstorm, mkangia, zandre-eng and ajeety4 January 14, 2025 08:27

Merge branch 'master' into cs/SC-4106-add-dd-application-monitoring

eb0de9f

kaapstorm approved these changes Jan 14, 2025

ajeety4 reviewed Jan 15, 2025

ajeety4 approved these changes Jan 15, 2025

mkangia reviewed Jan 15, 2025

Add function to ensure env is present in metric tags

09e611f

Charl1996 marked this pull request as ready for review January 17, 2025 10:49

Update config

fb7505e

Charl1996 mentioned this pull request Jan 20, 2025

Add server env config for DataDog dimagi/commcare-analytics-ansible#48

Open
