Continuous development and improvement are a big part of our monitoring strategy at BlueDot. Monitoring thousands of servers that support hundreds of technologies across many industries has given us the experience and insight needed to provide unparalleled monitoring services for our clients.
One significant vertical we support is manufacturing: we provide managed services to 24x7 manufacturing clients who rely on ERP systems to build and ship their products. Any downtime delays product going out the door, and every hour of downtime can cost tens of thousands of dollars in lost revenue.
Because of this, it is extremely important that we are notified when services fail and that we can recover those services quickly. Modern ERP systems typically consist of web, application, and database tiers. Complex ERP systems such as Oracle EBS and SAP often run many applications across these tiers, working together to provide ERP, CRM, batch processing, and workflow services, all of which must perform properly and quickly to deliver the complete service.
As part of our managed IT services, we monitor everything under support, from Wi-Fi access point bandwidth to logging in to your application to confirm it is performing well and returning no errors. We do this with Zabbix, an enterprise-class open source monitoring system that offers fully customizable service checks, escalations, and recovery options.
When a service fails on a Windows, Linux, or Unix server, our monitoring detects it and can automatically restart the service through the Zabbix agent. If the service recovers, the downtime caused by the failure is minimal, typically less than one minute. If it does not recover, Zabbix's escalation feature notifies our engineers and opens a ticket letting us know the restart did not recover the service. If that ticket is not updated within 15 minutes, our standard critical response time SLA, notifications are sent to BlueDot's leadership.
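The restart-then-escalate flow above can be modeled as a small decision function. The sketch below is a hypothetical Python illustration, not Zabbix configuration or BlueDot's actual tooling: the `Stage` names and the `escalation_stage` function are invented for this example, and the only figure taken from the text is the 15-minute critical response SLA.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


# Hypothetical stage names for illustration; these are not Zabbix identifiers.
class Stage(Enum):
    RECOVERED = "recovered"             # automatic restart brought the service back
    ENGINEERS_NOTIFIED = "engineers"    # restart failed; ticket opened for engineers
    LEADERSHIP_NOTIFIED = "leadership"  # ticket not updated within the SLA window


# Standard critical response time SLA described in the text.
CRITICAL_SLA = timedelta(minutes=15)


def escalation_stage(
    restart_succeeded: bool,
    ticket_opened: Optional[datetime],
    first_ticket_update: Optional[datetime],
    now: datetime,
) -> Stage:
    """Return how far a failed-service incident has escalated at time `now`."""
    if restart_succeeded:
        return Stage.RECOVERED
    if ticket_opened is None:
        raise ValueError("a failed restart should always open a ticket")
    deadline = ticket_opened + CRITICAL_SLA
    # The ticket counts as handled only if it was updated by the SLA deadline.
    responded_in_time = (
        first_ticket_update is not None and first_ticket_update <= deadline
    )
    if now >= deadline and not responded_in_time:
        return Stage.LEADERSHIP_NOTIFIED
    return Stage.ENGINEERS_NOTIFIED
```

For example, a ticket opened at 08:00 with no update by 08:15 reaches `Stage.LEADERSHIP_NOTIFIED`. In Zabbix itself, timed escalations like this are typically configured as steps in an action's operations rather than written as custom code.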
There is a lot more we can do than restart a service. We can remove temporary files to free up disk space, run SQL database queries, or manage cluster failover. Essentially, anything you would run on the command line to recover a failed service or resolve chronic performance issues can be managed automatically from Zabbix.
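As an illustration of the kind of command-line remediation described above, here is a minimal Python sketch of a temporary-file cleanup that a monitoring system could trigger when disk space runs low. The function name `purge_old_files`, the target directory, and the age threshold are all hypothetical. It deletes only regular files at the top level of a single directory, which keeps the scope of an automated action small.

```python
import time
from pathlib import Path
from typing import Optional


def purge_old_files(directory: str, max_age_seconds: float,
                    now: Optional[float] = None) -> int:
    """Delete regular files in `directory` older than `max_age_seconds`.

    Hypothetical remediation helper; returns the number of bytes freed.
    """
    cutoff = (time.time() if now is None else now) - max_age_seconds
    freed = 0
    for entry in Path(directory).iterdir():
        # Only plain files at the top level; skip symlinks and subdirectories.
        if entry.is_symlink() or not entry.is_file():
            continue
        stat = entry.stat()
        if stat.st_mtime < cutoff:
            try:
                entry.unlink()
            except OSError:
                continue  # file vanished or is locked; leave it alone
            freed += stat.st_size
    return freed
```

How such a script is invoked (for example, as a Zabbix remote command tied to a low-disk-space trigger) depends on the agent configuration and is not shown here.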
This monitoring, escalation, and recovery capability gives our clients minimal downtime, insight into every valuable statistic in their application infrastructure, and the data they need to forecast growth properly.
Check out our video, Zabbix Alert Escalations, where we demonstrate the recovery of an Apache web service on a FreeBSD server.
See also: BlueDot Becomes Zabbix Reseller