-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Datadog application monitoring #100
base: master
Are you sure you want to change the base?
Conversation
@@ -191,3 +191,5 @@ class CeleryConfig: | |||
|
|||
USER_DOMAIN_ROLE_EXPIRY = 60 # minutes | |||
SKIP_DATASET_CHANGE_FOR_DOMAINS = [] | |||
|
|||
SERVER_ENVIRONMENT = 'changeme' # staging, production, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this would be set up by ansible ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
hq_superset/models.py
Outdated
@@ -34,6 +35,10 @@ class DataSetChange: | |||
data: list[dict[str, Any]] | |||
|
|||
def update_dataset(self): | |||
with statsd.timed('cca.dataset_change.timer', tags=[self.data_source_id]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and 💰 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm currently busy figuring that out, since this would potentially result in massive volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chat to Graham and he mentioned that the volumes should not be a problem, since DataDog is good at handling that. All metrics we gather here is sent to the datadog agent on the server which talks to our datadog account once every (I think) 15 seconds.
All in all I think we're good here.
hq_superset/models.py
Outdated
@@ -35,7 +35,8 @@ class DataSetChange: | |||
data: list[dict[str, Any]] | |||
|
|||
def update_dataset(self): | |||
with statsd.timed('cca.dataset_change.timer', tags=[SERVER_ENVIRONMENT, self.data_source_id]): | |||
env_tag = f"env:{SERVER_ENVIRONMENT}" | |||
with statsd.timed('cca.dataset_change.timer', tags=[env_tag, f"datasource:{self.data_source_id}"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add domain tag ? I guess that would be useful?
Will datasource show on the metric itself and env would be on the top in the dropdown? and then if we add domain it would also be on the top of the dashboard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the domain could be useful I suppose.
Will datasource show on the metric itself and env would be on the top in the dropdown?
What do you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how tags are used/available on datadog once they are added here in metrics.
For now, I have seen "env" or "domain" being available on top of a datadog dashboard (like here) and then additionally there are metrics available for different items within a graph (like here)
So, I was asking if the domain and data source ids will be available like so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I created a chart here to demonstrate that we can, yes.
(I'm not 100% sure if the timeseries is what we want long term, maybe the Top List chart works better?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeseries seems okay from the little I know, it does give me the data I think I want.
Just love it how straight forward this was and +1 on keeping things simple and to what we need. |
Will wait for your findings for dropping an approval. |
I just want to touch up one or two things before I'll open this for review later today. |
Datadog has been tested on staging and the metrics pulls through (see cca.dataset_change.timer in metrics explorer). Regarding costs: I've also spoken to Graham and his sentiment is that we don't have to worry too much about the volume of this metric (ie. the amount of potential data points per custom metric), because the datadog agent handles that well. |
Ticket
This PR
datadog
to the codebaseThe way it works is, when you're invoking
statd
fromdatadog
it send the metric to thedatadog agent
running on the machine (assuming it's running), which in turn handles the authentication etc. Since the datadog agent is already present on thestaging
andproduction
machines this should just work, but I'll update here what my findings are.I've considered simply adapting HQ's implementation of the metrics, but decided against that because we really don't need all that code at this stage and can get away with a clearly much simpler approach.
The corresponding CCA Ansible change can be found here.