Resource monitoring and alerts #176
Conversation
This should reduce the size of the data sent over the network.
Sorry, hit merge a moment too late. We need tests.
            }
        })
    } catch (err) {
        response.err = err.message
How do these cases get handled by the sampleBuffer? As far as I can see, there's no filtering of them, so it will try to generate averages of the err property...
Given we know a locked-up Node-RED process will generate errors, this is a scenario we have to handle well.
If the sample has an error, it is not added to the average, which will skew the avg down.
Will think about that some more.
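For reference, a buffered sample would look roughly like this; only ts and err appear in the diff, so the resource field names here are assumptions:

```js
// A healthy sample: timestamp plus resource readings (field names assumed)
const healthySample = { ts: Date.now(), cpu: 0.42, memory: 120.5 }

// A sample taken while the Node-RED process is locked up or erroring:
// no resource readings, just the error message captured in the catch block
const erroredSample = { ts: Date.now(), err: 'Request timed out' }
```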
        const result = {}
        samples.forEach(sample => {
            for (const [key, value] of Object.entries(sample)) {
                if (key !== 'ts' && key !== 'err') {
Here we skip samples with errors, which is going to skew the average down. We could subtract 1 from the sample count for each skipped sample; given the current sample averaging period is a lot longer than the unhealthy timeout, this should work out.
    avgLastX (x) {
        const samples = this.lastX(x)
        const result = {}
Suggested change:
        const result = {}
        let skipped = 0
                }
            }
        })
        for (const [key, value] of Object.entries(result)) {
            result[key] = value / samples.length
        }
Suggested change:
                } else {
                    skipped++
                }
            }
        })
        for (const [key, value] of Object.entries(result)) {
            result[key] = value / (samples.length - skipped)
        }
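Pulling the two suggestions together, avgLastX would end up looking roughly like this. A sketch only: the lastX helper is assumed from the surrounding code, the per-key accumulation is assumed to be a simple sum, and errored samples are counted once per sample rather than per key:

```js
    // Sketch: average the last x samples, excluding errored samples from both
    // the sums and the divisor so a locked-up process doesn't drag the average down.
    avgLastX (x) {
        const samples = this.lastX(x)
        const result = {}
        let skipped = 0
        samples.forEach(sample => {
            if (sample.err) {
                // Errored sample: no resource readings, so don't count it at all
                skipped++
                return
            }
            for (const [key, value] of Object.entries(sample)) {
                if (key !== 'ts') {
                    // Accumulation assumed to be a simple per-key sum
                    result[key] = (result[key] || 0) + value
                }
            }
        })
        const count = samples.length - skipped
        for (const [key, value] of Object.entries(result)) {
            result[key] = count > 0 ? value / count : 0
        }
        return result
    }
```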
part of FlowFuse/flowfuse#2755
Description
Adds a Node-RED plugin that exposes a Prometheus metrics endpoint.
The nr-launcher then scrapes that endpoint to generate resource usage stats.
Alerts will be generated if thresholds are exceeded.
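As a rough illustration of that flow (the endpoint path, metric names, thresholds and sample period below are all assumptions, not the launcher's actual configuration):

```js
// Sketch of the launcher-side loop: scrape the plugin's metrics endpoint,
// parse the Prometheus text format, and warn when a threshold is crossed.
// Uses the global fetch available in Node.js 18+.
const METRICS_URL = 'http://127.0.0.1:1880/metrics'    // hypothetical endpoint
const THRESHOLDS = { cpu: 0.9, memory: 0.75 }           // hypothetical limits

async function scrapeAndCheck () {
    const res = await fetch(METRICS_URL)
    const text = await res.text()
    // Minimal parser for the simple "name value" lines of the text format
    const metrics = {}
    for (const line of text.split('\n')) {
        if (!line || line.startsWith('#')) continue
        const [name, value] = line.trim().split(/\s+/)
        metrics[name] = Number(value)
    }
    for (const [key, limit] of Object.entries(THRESHOLDS)) {
        if (metrics[key] !== undefined && metrics[key] > limit) {
            // The real launcher would raise an alert here; this just logs
            console.warn(`resource alert: ${key}=${metrics[key]} exceeds ${limit}`)
        }
    }
}

setInterval(() => {
    scrapeAndCheck().catch(err => console.warn('metrics scrape failed:', err.message))
}, 30 * 1000)
```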
Related Issue(s)
FlowFuse/flowfuse#2755
Checklist
- Changes flowforge.yml?
  - FlowFuse/helm to update ConfigMap Template
  - FlowFuse/CloudProject to update values for Staging/Production
Labels
- backport label
- area:migration label