Maintaining Uptime with a Log Management Platform
Analyzing log data through a log management platform is a great way to maintain uptime of critical systems. At Jungle Disk, we recently migrated away from our previous log management platform and have begun using a new system called Logentries.
Logentries has many of the same features as our previous log management platform; however, it differentiates itself in a few major ways. To start, Logentries recognizes key:value pairs in logs. This means that if your apps can emit JSON output, you can search specifically for the fields that signal red flags. For example, if a disk deletion logs a “level”:“fatal” error, we can quickly use a compound where clause to identify the name of each object that had an error and the reason it failed. If your app cannot provide its logs in JSON, Logentries also supports regex.
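To illustrate why key:value logging pays off, here is a minimal Python sketch of the same idea outside Logentries. It parses each line as JSON when possible, falls back to a regex for plain `key=value` text, and applies a compound condition. The field names (`level`, `event`, `object`, `reason`) and the sample lines are hypothetical, and this is not Logentries’ query language.

```python
import json
import re

# Hypothetical sample logs: JSON lines, one plain key=value line, one free-text line.
LOG_LINES = [
    '{"level": "info", "event": "disk_delete", "object": "disk-01", "reason": null}',
    '{"level": "fatal", "event": "disk_delete", "object": "disk-02", "reason": "permission denied"}',
    '{"level": "fatal", "event": "disk_create", "object": "disk-03", "reason": "quota exceeded"}',
    'level=fatal event=disk_delete object=disk-04 reason=timeout',
    'free-text line with no structure',
]

# Regex fallback for non-JSON lines: capture simple key=value tokens.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_kv(line):
    """Return a line's key:value pairs as a dict, or None if none are found."""
    try:
        record = json.loads(line)
        if isinstance(record, dict):
            return record
    except json.JSONDecodeError:
        pass  # Not JSON; try the regex fallback below.
    pairs = dict(KV_PATTERN.findall(line))
    return pairs or None


def fatal_disk_deletes(lines):
    """Compound filter: level is fatal AND event is disk_delete; report object and reason."""
    hits = []
    for line in lines:
        record = parse_kv(line)
        if record is None:
            continue
        if record.get("level") == "fatal" and record.get("event") == "disk_delete":
            hits.append((record.get("object"), record.get("reason")))
    return hits


print(fatal_disk_deletes(LOG_LINES))
# prints [('disk-02', 'permission denied'), ('disk-04', 'timeout')]
```

Structured fields make the filter a plain equality check; the regex fallback works too, but only as well as the pattern matches your log format.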
We use these searches most frequently in conjunction with Logentries’ “Alerts” system. Logentries offers three types of alerts: Basic Tag, Anomaly, and Inactivity.
Basic Tag Alerts let you set alerts based on a pattern you specify. They generate an alert immediately when the pattern is matched, or when it is matched a specified number of times within a specified time period.
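The Basic Tag logic can be sketched as a sliding-window counter. This is an illustrative Python approximation under our own assumptions (the `TagAlert` class, the pattern, and the thresholds are hypothetical), not how Logentries implements it. A threshold of 1 corresponds to the “immediate alert” case.

```python
import re
from collections import deque


class TagAlert:
    """Hypothetical sketch: fire when `pattern` matches at least `threshold`
    times within the last `window_seconds` seconds."""

    def __init__(self, pattern, threshold=1, window_seconds=60):
        self.pattern = re.compile(pattern)
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.hits = deque()  # timestamps of recent matches, oldest first

    def observe(self, timestamp, line):
        """Feed one log line (timestamps in seconds, non-decreasing);
        return True if the alert fires on this line."""
        if not self.pattern.search(line):
            return False
        self.hits.append(timestamp)
        # Discard matches that have slid out of the time window.
        while timestamp - self.hits[0] > self.window_seconds:
            self.hits.popleft()
        return len(self.hits) >= self.threshold


alert = TagAlert(r"status=500", threshold=3, window_seconds=60)
events = [
    (0, "GET /api status=500"),
    (10, "GET /api status=200"),
    (20, "GET /api status=500"),
    (40, "GET /api status=500"),
    (120, "GET /api status=500"),
]
print([alert.observe(t, line) for t, line in events])
# prints [False, False, False, True, False]
```

The third match within 60 seconds fires the alert; by t=120 the earlier matches have aged out of the window, so the count resets.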
Anomaly Alerts also take a matched pattern, but add a scope and an offset. The scope is the period in which occurrences of your pattern are counted, whereas the offset determines the earlier period you are using as your comparison. For instance, if (status=500) occurs 50 more times within an hour than it did in the corresponding hour last week, you can receive an alert warning you of this anomalous behavior.
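The scope/offset comparison can be sketched as follows. The hypothetical `anomaly` function counts matches in the scope window ending now and in a window of the same length shifted back by the offset, then fires when the increase reaches a threshold. It uses an absolute difference to mirror the 50-occurrence example above; Logentries’ own calculation may differ.

```python
from datetime import datetime, timedelta


def count_between(timestamps, start, end):
    """Count match timestamps in the half-open interval [start, end)."""
    return sum(1 for ts in timestamps if start <= ts < end)


def anomaly(timestamps, now, scope=timedelta(hours=1),
            offset=timedelta(weeks=1), min_increase=50):
    """Compare matches in the scope window ending at `now` with the same-length
    window shifted back by `offset`. Returns (fired, current, baseline)."""
    current = count_between(timestamps, now - scope, now)
    baseline = count_between(timestamps, now - offset - scope, now - offset)
    return current - baseline >= min_increase, current, baseline


# Hypothetical data: 10 matches of status=500 in the hour a week ago, 70 in the last hour.
now = datetime(2016, 1, 8, 12, 0)
last_week = [now - timedelta(weeks=1, minutes=m) for m in range(1, 11)]
this_hour = [now - timedelta(seconds=s) for s in range(1, 71)]
print(anomaly(last_week + this_hour, now))  # prints (True, 70, 10)
```

Comparing against the same hour a week earlier helps absorb normal daily and weekly traffic cycles, so only unusual spikes trigger the alert.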
Inactivity Alerts are exactly what they sound like: if no new logs arrive within a period you specify, the inactivity triggers an alert. I find these the least useful, since you will likely have other ways to tell that something is wrong besides a log file not updating; however, they can’t hurt.
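An inactivity check comes down to comparing each source’s last-seen timestamp against a silence limit. Here is a minimal sketch, assuming a hypothetical `last_seen` mapping from log source name to the time of its most recent entry.

```python
from datetime import datetime, timedelta


def inactive_sources(last_seen, now, max_silence):
    """Return the sorted names of log sources whose newest entry is older than max_silence."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_silence)


# Hypothetical sources and the timestamps of their most recent log entries.
now = datetime(2016, 1, 8, 12, 0)
last_seen = {
    "web-01": now - timedelta(minutes=2),
    "web-02": now - timedelta(minutes=9),
    "db-01": now - timedelta(minutes=25),
}
print(inactive_sources(last_seen, now, max_silence=timedelta(minutes=10)))
# prints ['db-01']
```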
Alerts generated in Logentries can be sent by e-mail, but I don’t find e-mail to be a method that helps you respond to alerts proactively. We already use PagerDuty to notify us of any issues with servers, and, fortunately, Logentries hooks right into it. Now, whenever an alert is triggered, we get a phone call letting us know. If you don’t use PagerDuty, you can receive alerts through several other channels, such as Campfire, HipChat, Slack, or a custom webhook. Being able to get log-based alerts this way really helps our team stay on top of what’s going on in our environment.
Given its powerful alerting capabilities and integrations, Logentries has become much more than a log aggregator and filter for our team. We see it as a crucial part of maintaining uptime for our business.