In January 2017, Mux announced support for error-rate alerts in our analytics service. One of our goals has been to calculate error-rate alert thresholds automatically, because fixed alerting thresholds are notoriously difficult to select and maintain over time.
We used statistical methods to calculate thresholds that represent highly unusual error rates, tailored to each error type across all Mux customers. These adaptive thresholds change over time as error types become more or less frequent.
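To make the idea of an adaptive threshold concrete, here is a minimal sketch of one way it could work. It is an illustration only, not a description of what Mux runs in production: the rule (rolling mean plus `k` standard deviations), the function names `adaptive_threshold` and `is_unusual`, and the parameters `window` and `k` are all assumptions made for this example.

```python
from statistics import mean, pstdev


def adaptive_threshold(rates, window=24, k=3.0):
    """Threshold = mean + k * std-dev of the most recent `window` error rates.

    Hypothetical rule for illustration; `window` and `k` are assumed knobs.
    """
    recent = rates[-window:]
    if not recent:
        raise ValueError("need at least one historical error rate")
    return mean(recent) + k * pstdev(recent)


def is_unusual(rates, current, window=24, k=3.0):
    """True when the current error rate exceeds the adaptive threshold."""
    return current > adaptive_threshold(rates, window, k)
```

Because the threshold is recomputed from the most recent window, it drifts upward when an error type becomes more common and back down when it becomes rare. That drift is the "adaptive" part, and it is also why a purely statistical rule can fire on rare errors that touch only a handful of viewers.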
But here’s the rub: things that are statistically unusual aren’t necessarily worth being alerted on. Many alerts fired at very low thresholds, for errors that affected few users and were not actionable. Discussions with customers confirmed our suspicion that we needed to improve our alerting algorithm to reduce alert fatigue.
We responded by using alert incident details from the last 7 months to train a binary classifier that identifies important alert conditions, meaning those that match the characteristics of historical alerts that affected large numbers of viewers. All alerts created since mid-August 2017 make use of this feature. This has greatly reduced the volume of alerts and boosted the visibility of actionable, important alerts.
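For readers curious what such a classifier might look like, here is a small, self-contained sketch. It is not Mux's model: this post does not say which algorithm or features were used, so the choice of logistic regression, the two features (error rate and log10 of affected viewers), the names `train_logistic` and `is_important`, and the tiny `HISTORY` data set are all hypothetical and exist only to show the shape of the approach.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.1, epochs=5000):
    """Fit logistic regression with plain batch gradient descent."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def is_important(alert, weights, bias, cutoff=0.5):
    """True when the predicted probability of 'important' reaches `cutoff`."""
    z = sum(w * xi for w, xi in zip(weights, alert)) + bias
    return sigmoid(z) >= cutoff


# Synthetic, made-up history: (error_rate, log10(affected_viewers)).
# Label 1 = alert affected many viewers, 0 = it did not.
HISTORY = [
    (0.05, 0.3), (0.08, 0.5), (0.02, 1.0), (0.09, 0.0),
    (0.03, 3.0), (0.06, 3.5), (0.025, 2.8), (0.07, 4.0),
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

WEIGHTS, BIAS = train_logistic(HISTORY, LABELS)
```

With a model like this, a new alert condition is only surfaced when `is_important(...)` returns true for it. In this toy data the number of affected viewers dominates the decision, which mirrors the goal above: an unusual error rate alone is not enough to justify an alert.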
We encourage you to try the alerts feature in the Mux analytics service. Customer feedback is always appreciated, so please don’t hesitate to let us know how we might improve your experience!