
Monitor hubot uptime and logs #8

Closed
patcon opened this issue Feb 20, 2014 · 4 comments

Comments

@patcon
Contributor

patcon commented Feb 20, 2014

We could use uptimerobot (free) to monitor an HTTP endpoint on hubot.

We could also use Loggly's Heroku add-on (also free) to monitor hubot logs.

It would be great to use this as an opportunity to start a PagerDuty workflow for Gittip, sending notifications when hubot goes down. After all, if no one can trust that it will be available, no one will want to depend on it :)

Psssst @m3matta :)

@patcon
Contributor Author

patcon commented Feb 20, 2014

I tried the free Loggly add-on, but it was pretty terrible, and there was no way to hook it up to notification systems.

I set up a new uptimerobot account, and it's now pinging a custom endpoint on our hubot every 5 minutes (the shortest interval their service allows, since it works by polling). If I set up a PagerDuty or OpsGenie account, uptimerobot can send alert emails to that service.
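uptimerobot does the polling for us, but for reference, a single check boils down to something like the Python sketch below: fetch the endpoint with HTTP basic auth and call hubot "up" only if the request succeeds and the body contains the expected keyword. The names `check`, `is_up` and `EXPECTED_KEYWORD`, and the 10-second timeout, are hypothetical; this is not uptimerobot's implementation.

```python
import base64
import urllib.request

# Hypothetical: the keyword we expect the hubot ping endpoint to return when healthy.
EXPECTED_KEYWORD = "PONG"


def is_up(status: int, body: str) -> bool:
    """Treat the bot as up only for an HTTP 200 whose body contains the keyword."""
    return status == 200 and EXPECTED_KEYWORD in body


def check(url: str, user: str, password: str, timeout: float = 10.0) -> bool:
    """Fetch `url` once with HTTP basic auth and report whether hubot looks healthy."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_up(resp.status, body)
    except OSError:
        # urllib's URLError/HTTPError, refused connections and socket timeouts
        # are all OSError subclasses; any of them counts as "down".
        return False
```

A poller (here, uptimerobot) would call something like `check` every 300 seconds and alert whenever it returns `False`.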

Uptimerobot only allows a single user/email per login, so perhaps we could create a new alias that forwards to a few people, maybe [email protected]? Otherwise, I can keep the accounts I'm testing under my own email, but I'd rather spread the wealth :) cc: @whit537

@chadwhitacre
Copy link
Contributor

@patcon I've been using [email protected] for accounts like that. It's a shared account in SupportBee. I've added you there. :-)

@patcon
Contributor Author

patcon commented Feb 21, 2014

Thanks! I like that approach. Changing it now.

@patcon patcon self-assigned this Feb 22, 2014
@patcon
Contributor Author

patcon commented Feb 22, 2014

OK, officially have this working. Here's what we have:

  1. Uptimerobot is hitting the endpoint https://user:[email protected]/hubot/ping (which reminded me to open Start monitoring brute-force login attempts on roobot endpoint #10) every 5 minutes (the minimum possible interval) and alerting if the response is not "PONG". 87e0dbb
  2. On alert (no PONG), uptimerobot emails our dedicated PagerDuty inbox for the "roobot-prod" service.
  3. That service raises an incident for any email from [email protected] whose subject line doesn't match /^Monitor is UP/ (since uptimerobot sends mail for both down and up events, among other things).
  4. I'm the only one on the escalation policy for now (since roobot doesn't do much yet), but pagerduty will send me a msg via their app (or text, or phone call) if an incident happens. If anyone else wants on, just lemme know. It's not too exciting for now :)
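The filtering rule in step 3 is configured in PagerDuty's email settings rather than in code, but its logic amounts to one anchored regex test on the subject line. A minimal Python sketch follows; the function name and the sample subject lines are hypothetical and not uptimerobot's exact wording.

```python
import re

# Same pattern as the "roobot-prod" rule: uptimerobot mails both "down" and
# "up" notices, so anything that does NOT start with "Monitor is UP" pages someone.
UP_PATTERN = re.compile(r"^Monitor is UP")


def should_trigger_incident(subject: str) -> bool:
    """Return True if an uptimerobot email subject should open an incident."""
    return UP_PATTERN.match(subject) is None


# Hypothetical subject lines, for illustration only.
for subject in ["Monitor is DOWN: roobot", "Monitor is UP: roobot"]:
    print(subject, "->", should_trigger_incident(subject))
# prints:
# Monitor is DOWN: roobot -> True
# Monitor is UP: roobot -> False
```

Because the rule is "page on anything that isn't an UP notice" rather than "page only on DOWN notices", any unexpected message type from uptimerobot errs on the side of alerting.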

So long story short, we're checking every 5 minutes that hubot is up, and I get an alert if he's not :)

Note: Realized we should eventually have a way to know if someone is trying to brute-force our hubot: #10
