Add heartbeat/monitoring dashboard for inference system #88
Comments
Since we ultimately need to monitor across a range of different platforms, we will need a push-based system (as opposed to a pull/scrape-based system like raw Prometheus).
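To make the push-based idea concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes each component pushes a heartbeat to a central collector, which then flags anything that has gone quiet. The class name `HeartbeatRegistry`, the component names, and the timeout value are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatRegistry:
    """Collects heartbeats pushed by components and flags stale ones.

    Hypothetical sketch: components call record() on their own schedule,
    so the collector never needs network access to scrape them.
    """

    timeout_seconds: float = 300.0
    last_seen: dict = field(default_factory=dict)

    def record(self, component, timestamp=None):
        # Store the most recent push time for this component.
        self.last_seen[component] = time.time() if timestamp is None else timestamp

    def stale(self, now=None):
        # A component is stale if its last push is older than the timeout.
        now = time.time() if now is None else now
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


registry = HeartbeatRegistry(timeout_seconds=60)
registry.record("inference-node-a", timestamp=1000.0)  # hypothetical name
registry.record("inference-node-b", timestamp=1090.0)  # hypothetical name
print(registry.stale(now=1100.0))
```

In a real deployment the `record` call would sit behind an HTTP endpoint or a message queue; the point here is only that the monitored side initiates the connection.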
Hey @micya, I noticed the Canadian Integrated Ocean Observing System has an uptime monitor that is based on open source code: https://github.com/upptime/upptime. It might not be able to help with the instances, but it could help ensure we know when any of these sites are not available:
Hey @micya -- Just noting a couple of recent thoughts on possible tools, integrations, and/or data sources for an overarching dashboard (i.e. maybe not only for the Azure-based realtime inference system, but for the whole emerging ecosystem of Orcasound apps, APIs, and data layers):
A sub-feature of a Cosmos DB read line chart that I would find interesting: the number of API requests from "outsiders" -- a possible metric for measuring the value of our open labeled data to external collaborators, e.g. ML developers or bioacousticians.
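A rough sketch of how such a metric could be computed, assuming request records are available as dictionaries with a caller identifier and an ISO 8601 timestamp. The record fields, the `INTERNAL_CALLERS` set, and the caller names are all hypothetical; nothing here reflects the actual database schema.

```python
from collections import Counter

# Hypothetical set of callers considered "insiders"; everything else is external.
INTERNAL_CALLERS = {"internal-frontend", "internal-pipeline"}


def external_requests_per_day(records):
    """Count requests per calendar day from callers not in INTERNAL_CALLERS."""
    counts = Counter()
    for record in records:
        if record["caller"] not in INTERNAL_CALLERS:
            # The first 10 characters of an ISO 8601 timestamp are YYYY-MM-DD.
            counts[record["timestamp"][:10]] += 1
    return dict(sorted(counts.items()))


sample = [
    {"caller": "internal-frontend", "timestamp": "2024-05-01T08:00:00Z"},
    {"caller": "external-researcher", "timestamp": "2024-05-01T09:30:00Z"},
    {"caller": "external-researcher", "timestamp": "2024-05-02T10:15:00Z"},
]
print(external_requests_per_day(sample))
```

The per-day counts could then feed the line chart directly.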
We (@xilin22 and I) looked into setting up Prometheus and Grafana for a health dashboard, but determined that Grafana doesn't let individuals with personal accounts access the dashboard; a work or school account is required. (See following error:) We are now looking into using Azure Workbooks for data visualization instead, which is newer and may solve some of the pain points that were called out in 2022.
As for the alerting, we can add more Azure Functions to monitor service and resource health. Since Azure Managed Grafana does not allow personal accounts to log in to an Azure Managed Grafana instance
@micya @scottveirs We may be able to get Azure Managed Grafana to work if we create our own organizational domain. It might be worth a shot if there is little to no cost in creating one. Maybe then Azure won't view it as a personal account.
We already have an organization. If you create a user in our AAD tenant, that should work. Though we would then need to track the username/password for the new user.
That makes sense. I don't have permissions to create one. Maybe either of you, @micya or @scottveirs, can create one and send me the credentials?
@xilin22 - granted "User Administrator" on AAD tenant. Let me know if that doesn't work.
There are a few thoughts on computed latency KPIs that could be valuable in a high-level heartbeat dashboard here -- #157
Additional thoughts in orcasound/orcanode-monitor#19 (comment)
Historically, troubleshooting inference system/notification system failures has involved manual steps to identify what failed. A past hackathon focused on using Azure Dashboards to surface some metrics from Log Analytics. However, Azure Dashboards is difficult for non-technical observers to use.
I'd like to look into setting up something separate from Azure for monitoring purposes. It can either be a self-developed application or an existing monitoring solution (Prometheus?). It should show at minimum: