Statistical Alerting
David Hoelzer edited this page Mar 12, 2015 · 4 revisions
This job takes advantage of some monkey patching we've done to the Math class to add mean, variance, and standard deviation functions. Building on these, the System model provides event rate functions, and a separate Statistics model handles statistics logging. Together these let us produce alerts when a system's event rate goes "out of range": more than 1.5 or 2.5 standard deviations above or below its average, or no events at all in the past hour.
    hosts = System.where(:monitor => true)
    hosts.each do |host|
      # Event count for the past hour, compared against this host's hourly statistics.
      events_last_hour = host.events_since(1.hour.ago)
      event_stats = host.hourly_stats
      average = event_stats[0]
      standard_deviation = event_stats[2]
      # Check the wider 2.5 sigma band before the 1.5 sigma band so the more severe alert wins.
      if events_last_hour > (average + (standard_deviation * 2.5))
        Alert.genericAlert(system_id: host.id, description: "#{host.display_name}: Unusually high event rate detected! Expected something less than #{average + standard_deviation * 2.5} but detected #{events_last_hour}.", short_description: "#{host.display_name}: Unusually high event rate!", criticality: 3)
      elsif events_last_hour > (average + (standard_deviation * 1.5))
        Alert.genericAlert(system_id: host.id, description: "#{host.display_name}: Somewhat high event rate detected! Expected something less than #{average + standard_deviation * 1.5} but detected #{events_last_hour}.", short_description: "#{host.display_name}: Somewhat high event rate", criticality: 1)
      elsif events_last_hour < (average - (standard_deviation * 2.5))
        Alert.genericAlert(system_id: host.id, description: "#{host.display_name}: Unusually low event rate detected! Expected something more than #{average - standard_deviation * 2.5} but detected #{events_last_hour}.", short_description: "#{host.display_name}: Unusually low event rate!", criticality: 3)
      elsif events_last_hour < (average - (standard_deviation * 1.5))
        Alert.genericAlert(system_id: host.id, description: "#{host.display_name}: Somewhat low event rate detected! Expected something more than #{average - standard_deviation * 1.5} but detected #{events_last_hour}.", short_description: "#{host.display_name}: Somewhat low event rate", criticality: 1)
      end
      # A completely silent host raises its own high-criticality alert, in addition to any low-rate alert above.
      if events_last_hour == 0
        Alert.genericAlert(system_id: host.id, description: "#{host.display_name}: No events logged recently! Expected something near #{average} but detected #{events_last_hour} in the past hour.", short_description: "#{host.display_name}: Log server offline??", criticality: 5)
      end
    end
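
For readers without the rest of the codebase at hand, here is a minimal, self-contained sketch of the statistics the job depends on and of the band check it performs. It is not the project's actual Math patch: the `RateStats` module, its method names, the `classify` helper, and the sample counts are all hypothetical, and it assumes population variance, which may differ from what the real helpers compute.

```ruby
# Hypothetical stand-ins for the Math helpers the job relies on. The
# project's real monkey patch may differ (for example, sample rather
# than population variance).
module RateStats
  module_function

  # Arithmetic mean of a non-empty array of numbers.
  def mean(values)
    values.sum.to_f / values.size
  end

  # Population variance: the average squared distance from the mean.
  def variance(values)
    m = mean(values)
    values.sum { |v| (v - m)**2 } / values.size
  end

  # Standard deviation is the square root of the variance.
  def standard_deviation(values)
    Math.sqrt(variance(values))
  end

  # Classify a count against the same 1.5 and 2.5 sigma bands the job uses.
  # The wider band is checked first so the more severe label wins.
  def classify(count, average, standard_deviation)
    deviation = count - average
    return :unusually_high if deviation > standard_deviation * 2.5
    return :somewhat_high if deviation > standard_deviation * 1.5
    return :unusually_low if deviation < -standard_deviation * 2.5
    return :somewhat_low if deviation < -standard_deviation * 1.5
    :normal
  end
end

hourly_counts = [90, 110, 100, 95, 105] # made-up sample data
avg = RateStats.mean(hourly_counts)
sd = RateStats.standard_deviation(hourly_counts)
puts RateStats.classify(130, avg, sd) # prints unusually_high
```

Returning a label instead of raising the alert directly keeps the threshold logic in one place, so the alert messages can be built from the same multiplier that triggered them.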