-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ADR template and metrics in testing ADR. #152
base: main
Are you sure you want to change the base?
Conversation
This commit adds a template for our Architecture Decision Records, using the same format as Firefox Accounts. It's more structured than Nygard's status-context-decision-consequences template, but explicitly enumerates the alternatives considered and the pros and cons of each, which has been helpful in other projects (FxA, Application Services, UniFFI) that use this format.
|
||
### D. Subclass `aiodogstatsd.Client` to suppress sending metrics to StatsD in tests. | ||
|
||
This option would change `metrics.Client` to be a subclass of `aiodogstatsd.Client`, instead of proxying calls to it. The subclass would override the `{gauge, increment, decrement, histogram, distribution, timing}()` methods to log metrics if the f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, looks like the paragraph was truncated or unfinished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😳 Oops, thanks for catching that!
|
||
## Decision Outcome | ||
|
||
TODO: Fill me in once we decide! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we identified any winner(s) or favorite(s) yet? Or we will fill that in at a later time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another related question: are we using the PR as a discussion where we come to a decision? or is the intent that we have a discussion and decision made elsewhere, and this PR is purely for recording the already made decision? The reason I ask this is because the way I might go about reviewing such a PR will be different based on whether I am meant to discuss and sway a decision, vs a review on its quality as technical documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second this. It's a little disorienting between the RFC list and an ADR PR. I'm fine either way, but ADR in my mind is a decision to be recorded whereas an RFC is an idea to be discussed. We should make whatever decision consistent across App Svcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great question—I'm curious what y'all think! 😄 Reading our RFC process guidelines, I got the impression that it's geared toward significant changes (new features, cross-team projects, design reviews, substantial refactors) seeking feedback from multiple stakeholders, and I wasn't sure that this counted as either.
The line between the RFC list and ADR PRs feels a little fuzzy to me, too—for example, @mhammond (hi Mark! 👋🏼) just put up mozilla/application-services#5302, which is a draft ADR with a chosen solution sent to the RFC list for feedback.
For this one specifically, @hackebrot and I started a discussion on the Jira ticket to talk through some of these options, and we're meeting this week to chat more about them and make a decision. After that, we can update the PR with the outcome. Do we feel like that's a good approach for these kinds of smaller discussions, or would going through the RFC process be valuable—for example, if other folks want to chime in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a bit messy - it's a classic "if the only tool you have is a hammer, everything looks like a nail". I did an ADR because app-services already has ADRs, but doesn't really have RFCs, and I didn't really want to invent a separate RFC process just for this. The ADR template is also a little too prescriptive IMO and doesn't really suit all such discussions, so my ADR kinda ignored the template a little. It was sent to the RFC list because that was suggested by karloff as the appropriate place to reach the other stakeholders for remote-settings in particular.
So yeah, my ADR was probably closer to an RFC :) I don't really think the distinction is particularly useful - in practice, someone writing an RFC intends making a decision, just like an ADR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the feedback! What I'm understanding from what has been mentioned is that we have a way to solicit feedback on an idea (the RFC process), and that we can link the ADR PR for discussion or some other document. For this particular PR, discussion on the decision is being made in JIRA so this ADR is not the place to discuss the decision. I think this process is fine, and I don't see a strong reason to change it. I think that ADR reviews tend to solicit another round of evaluating the decision, so knowing the process for decision making is important for not delaying actually having a decision made!
docs/adr/template.md
Outdated
## Decision Drivers <!-- optional --> | ||
|
||
* [driver 1, e.g., a force, facing concern, …] | ||
* [driver 2, e.g., a force, facing concern, …] | ||
* … <!-- numbers of drivers can vary --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decision drivers can sometimes be ranked, as they aren't all equally as important. I'm wondering if there's a way we can capture it here. 🤔 (or maybe we can have it justified in Decision Outcome
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, yes—maybe we can represent this as a numbered list, and expand the blurb in the outcome section about the relative importance of the decision drivers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah a simple numbered list sound like a good start! I was originally thinking of maybe adding some sort of ranking system where you add a number (from 1-5, for instance) next to each point to indicate importance, but that might be more complicated than it's worth. We can change this later if we find that a numbered list doesn't exactly gives us the nuance that we need for the ranking.
docs/adr/template.md
Outdated
* Good, because [argument a] | ||
* Good, because [argument b] | ||
* Bad, because [argument c] | ||
* … <!-- numbers of pros and cons can vary --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference nit: I would prefer that we format this with separate Pro/Cons list.
#### Pros:
* [argument a]
* [argument b]
#### Cons:
* [argument c]
I think the titles and visual separation helps me with scanning the document (quicker to find what I'm looking for). But if there's a standard that we're trying to adhere to, then that can override my personal preference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it! I copy-pasted this template from FxA/UniFFI, but think it's more important to have a structure that works for us, and I totally agree that it makes scanning a lot easier—and we can inspire changes the other way, too! 🚀
|
||
## Context and Problem Statement | ||
|
||
merino-py emits counts, histograms, and timing metrics as it runs. In production, these metrics are sent to Datadog, using an extension of the [StatsD](https://github.com/statsd/statsd) protocol called [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd). For local development, merino-py provides the `metrics.dev_logger` config option, which logs outgoing metrics to the console instead of sending them to Datadog. merino-py also has integration tests that cover recording and emitting metrics. How can we avoid having the `metrics.dev_logger` setting and integration tests rely on implementation details of the `aiodogstatsd` client library to suppress reporting to Datadog? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep the documentation more maintainable, I would suggest breaking down paragraphs with semantic linefeeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, I like that a lot—thank you!
Thanks for starting this! 🌟 The first ADR is definitely an interesting topic! |
* Rank the decision drivers in order of importance. * Separate sections for the pros and cons of each option.
* Conform to ADR template changes. * Use semantic linefeeds for maintainability. * Fix truncated description for option D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @linabutler! 👋🏻
Thank you for creating this ADR! 🚀
I've added a comment about the data flow of metrics in merino-py.
## Context and Problem Statement | ||
|
||
merino-py emits counts, histograms, and timing metrics as it runs. | ||
In production, these metrics are sent to Datadog, using an extension of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current understanding is that we do not send merino-py metrics to Datadog.
The data flow is as follows:
- merino-py uses aiodogstatsd to record and submit StatsD metrics
- the aiodogstatsd client is configured to send metrics to a telegraf agent
- the telegraf agent uses a StatsD Input and InfluxDB Output
- the telegraf agent sends metrics to InfluxDB
- Grafana queries InfluxDB and visualizes metrics
Does that sound accurate to you, @dlactin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, we use the aiodogstatd package because it supports the Datadog extensions to the StatsD protocol which are enabled on telegraf.
In production, these metrics are sent to Datadog, using an extension of the | ||
[StatsD](https://github.com/statsd/statsd) protocol called [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd). | ||
For local development, merino-py provides the `metrics.dev_logger` config option, | ||
which logs outgoing metrics to the console instead of sending them to Datadog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
which logs outgoing metrics to the console instead of sending them to Datadog. | ||
merino-py also has integration tests that cover recording and emitting metrics. | ||
How can we avoid relying on implementation details of the `aiodogstatsd` client library | ||
to suppress reporting to Datadog? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to suppress reporting to Datadog? | |
to suppress reporting to InfluxDB? |
Rendered view
This PR: