Fast-tracking server errors to a log aggregator on S3

by Morgan Shorter on Tue, 29 Jun 2021

Setting up log monitoring with DjaoDjin is now easier than ever thanks to the latest logging feature. Integrating alerts into your app configuration will be supplanted by a more cleanly separated plug-and-play design that handles log backups, log rotation, and monitoring for server errors without developers having to worry about setting up and testing email logins.

Until now, server errors in production, i.e., those generating a 500(+) response code, have triggered email alerts to our engineering team. This setup is effective and straightforward to implement in Django apps, but has some notable drawbacks:

  • It requires customers to do extra work integrating and maintaining email alerts directly in their apps and containers. This is a bit of an anti-pattern.
  • Our inbox has ballooned to a ridiculous amount of emails. Reading individual email alerts has reached the limit of it's scalability.
  • Once we get an alert, someone has to log into the server to get a current copy of the logs before we can start diagnosing the issue.

We decided we could do better.

Previously, logs coming from djaoapp and customers' apps were uploaded to our S3 bucket following a daily rotation schedule. With this patch, we've added a fast track for logs to be uploaded to S3 as soon as a server error occurs. Once uploaded, they can be automatically triaged by bots. We've kept the daily rotation in place for a separate set of analytics tools to identify trends over time and provide insights to guide our development efforts.

We needed this "fast track" feature to be flexible, and to not make more work for ourselves or our customers; it needed to "just work" even on existing systems, and it does. As long as a customer's app is properly forwarding logs to Docker, the new fast track tool will be able to monitor and upload it's logs appropriately; no email required.

Keeping in mind that our platform and tooling would evolve over time, we needed to make this logging tool as simple, portable, and generic as possible. It needs to be able to survive added features and OS migrations without causing headaches for us or our users. To this end, we implemented the whole thing in common Unix shell and logging utilities.

This article will give you a whirlwind tour of how the new feature works. Some details have been left out for the sake of brevity, so following along with the patch itself is recommended for those who want the full picture. If you're curious about implementing something like this yourself, I also recommend looking through our related articles Logrotate, S3 storage, and AWStats and Django, Gunicorn and Syslog-ng for some useful context.

The Big Picture

  • Docker reads data from file descriptors 1 and 2 of the apps in its containers, The docker daemon then sends that data to syslog-ng as syslog formatted log-lines.
  • A set of filter rules in syslog-ng separates the docker logs from all the other logs and puts them in a separate file. Another filter rule further filters that data, sending only logs that matches a `5XX` HTTP response code to a Unix pipe.
  • On the other end of that pipe is a one-line shell script that will call logrotate after a predetermined amount of time of not receiving data.
  • Logrotate calls a shell script that takes care of compression, timestamps, and uploading. Logrotate then reloads syslog-ng.
  • A custom SELinux module allows processes and signals to be shared between the syslog and logrotate domains.

In general, an effort was make to keep things simple, portable, and generic where possible.

Docker

Docker Community Edition currently supports three logging drivers: The native json-file, which is exactly what it sounds like, and comes with its own log management tools; journald, the systemd "system journal" logging behemoth, supporting arbitrary log formats, fast searching of its binary logs, compressed ramfs storage, and other neat tricks; and syslog, the venerable system logger daemon that has been a mainstay of Unix userland for well over thirty years.

Our priorities in choosing a logging driver were:

  • The logs had to be in human-readable plain text so we could audit them with our eyeballs. No fancy binary formats.
  • The logs had to be stored on-disk, so that in case the whole server goes down, we lose as little data as possible
  • Its configuration needed to be extensible, flexible, and obvious.

Previously, we had been using journald and relying on `docker logs` to get our logs when we needed them. Since that setup accomplished none of our new priorities, it was easy to eliminate.

json-file was an attractive option at first since it had Tier-1 support from Docker and reduced the number of dependencies for our application logging system. Unfortunately, the configuration options available would have forced us to create one clumsy hack on top of another to avoid data loss. We had to move on to our last option.

Normally, the right way to change the logging driver is by editing the daemon.json file, but for whatever reason, docker seemed to ignore all the settings I put into it. Since we couldn't get that to work, we made our changes in /etc/sysconfig/docker by adding --log-driver syslog to the OPTIONS variable. Attempts were made to add more granular logging configuration options to docker in this manner, but only the logging driver seemed to stick. Your mileage may vary.

It's worth mentioning here that docker provides an option to adjust how log streams are buffered. The default is to not buffer the stream at all, and that's what we chose. In particular, when dealing with errors, it is important to have as much relevant data as possible for diagnosis. If the logs are buffered, that data could end up split between a log file on disk, and log data stuck in the buffer. Some low-traffic applications might take so long to flush the buffer that it is no longer useful. Additionally, if the server goes down without flushing that buffer, the data in that buffer is lost forever.

Syslog-ng filters

Something I had to solve before I even had a chance to test any of my clever filters was figuring out which source to use. It turns out that docker wasn't using the TCP port I opened up (514), but is passing logs to... somewhere else. In any case I got it working by using source(s_sys) which includes system() and internal() sources. The syslog-ng dev team evidently realized that setting up sources by yourself would be a major pain, and so have included some extra cleverness in that area that is documented here.

As you can see from the below excerpt from /etc/syslog-ng/conf.d/docker.conf, the f_docker filter copies individual log entries into docker.log based on their syslog metadata, and the f_5xxERR-hook filter matches any logs containing HTTP response codes that indicate server errors (500 - 599) and pipes those logs to a shell script.

filter f_docker { program("docker") or tags("docker"); };
filter f_5xxERR-hook { filter(f_docker) and message("HTTP\/.{3,20}[[:space:]]5[[:digit:]]{2}[[:space:]]"); };

destination d_docker { file("/var/log/docker.log"); };
destination d_5xxERR-hook { program("/usr/local/bin/logrotatehook-500error.sh"); };

log { source(s_sys); filter(f_docker); destination(d_docker); };
log { source(s_sys); filter(f_5xxERR-hook); destination(d_5xxERR-hook); };

Readers who are particularly regex savvy might have noticed the conspicuously long gap--up to twenty characters--between matching "HTTP" and the accompanying response code. The reason for this is that we can't predict what customers will put in their docker containers, nor what data those containers might send back. Twenty characters might be overkill, but in this case, dealing with false positives is much less stressful than dealing with false negatives.

One-liner shell script

Now, on to the masterpiece. The shell script is a single line, not counting comments. It waits until the stream of data from syslog-ng goes quiet for at least 30 seconds, then calls logrotate, and finally sends any error messages that arise during its own execution to a log file.

Originally, I attempted to put the entire shell script in the string argument to the program() directive in syslog-ng's config file instead of calling a separate shell script. Mainly this was to avoid managing yet another code asset. This resulted in significant mental acrobatics to work with the constraints I put around myself.

Although the documentation for the program() directive is disappointingly vague in places, it does at least exist. Through scouring the documentation and Stack Overflow, it was discovered that program() just forks, but doesn't invoke a real shell.

That meant my dream of cramming my carefully crafted shell haiku into a config file would be impossible. Out of this frustrating exercise though, came a better, more efficient script than I would have otherwise written.

Lets break it down: right away it checks the _mtime_ of our stream of errors from syslog-ng and stores that as a Unix timestamp in a variable.

tt=`stat -c %Y -`;

Then it starts an infinite loop which contains two statements.

while :;do

This first imperative statement inside the loop contains a comparison to see if the stream is less than 30 seconds old, as well as assigning a null value to a variable. By using stat to check the mtime of the file descriptor for the stream instead of reading the timestamps in the stream itself, several syscalls are avoided, and the time and space complexity of the script is greatly reduced.

[ $((`date +%s` - `stat -c %Y -`)) -le 30 ] && nn="";

The second statement is a set of evaluations chained together with binary operators, which starts by evaluating the variable that was just set. This evaluation ensures that logrotate is only run once per error event, since we're stuck in an infinite loop.

The next two evaluations in the chain check to see if any data was sent on the stream yet, and if so, if the most recent data is more than 30 seconds old.

Thirty seconds was a somewhat arbitrary length of time that was chosen to allow cascading errors to end up on the same log rather than splitting logs smack in the middle and losing context.

If all of the evaluations return true, the variable that was defined way back in the beginning gets set to "1" to indicate that we have already run logrotate and should therefore skip the whole evaluation chain next time around. Last, we run logrotate with its verbose and force options. Force ensures that logrotate disregards its rotation schedule and does its job anyway, and verbose allows us to have a record of what happened in the event that it fails.

[ -z $nn ] && [ $tt != `stat -c %Y -` ] && [ $((`date +%s` - `stat -c %Y -`)) -ge 30 ] && \
  nn="1" && /sbin/logrotate -vf /etc/logrotate.d/docker;

To finish off the script, we redirect all the output from the script, including the output from logrotate, into a separate file for post-mortem in the event of some error in syslog-ng, logrotate, or the script itself:

sleep 1;done

Lastly, we record all the output from the script into a separate file for a post-mortem in the event of some error in syslog-ng, logrotate, or the script itself:

>>/var/log/logrotatehook-500error-sh.log 2>&1

Logrotate

Logrotate is pretty handy and ubiquitous, but has some shortcomings:

  • No support for monotonic timestamps in file names (seriously guys?) which breaks everything when you actually use the `dateext` directive. If you need monotonic timestamps, the compression and other builtin features are rendered useless. (During the course of revising this article, it came to my attention that Logrotate may support monotonic timestamps after all. I haven't had chance to test this yet, but am looking forward to switching over to Logrotate's built-in tools if and when feasible.)
  • Everything depends on matching filenames that you are almost guaranteed to need to change during rotation.
  • Limited options for rotation criteria with no way to specify a custom criteria.
  • No way to define an arbitrary naming convention without breaking everything.

If logrotate has a saving grace, it is the ability to open a shell and run arbitrary code for various hooks. In order to work around logrotate's deficiencies, we used those shell hooks to take matters into our own hands.

Postrotate

In Logrotate, S3 storage, and AWStats, we used the lastaction hook to upload rotated logs using dcopylogs to our S3 bucket. We've now moved that code into postrotate. In addition, postrotate now picks out the most recently rotated log file--which, in this case, should always be docker.log.1--and change its filename to include the mtime of the file in Unix format.

ROTATEDFILE=`ls -t /var/log/docker.log* | head -n3 | grep -v '\.log\$' | grep -v '\.gz\$' | head -n1`
TIMESTAMP=`stat -c %Y $ROTATEDFILE`
mv -nv $ROTATEDFILE ${ROTATEDFILE/\.log*/.log-$TIMESTAMP}

While that takes care of having proper timestamps, logrotate now has no idea how to find the file, as its name has changed. As a result, any of logrotate's builtin file operations from here on will not work. So, we use the standalone implementation of gzip to finish up the job.

/bin/gzip -v9 ${ROTATEDFILE/\.log*/.log-$TIMESTAMP}

Lastaction

The `lastaction` hook contains a single line which is interesting in its own right, but before we dive into that, lets rewind for a second. We moved our compression and upload code from `lastaction` to `postrotate`, and did just the opposite for the code that restarts syslog-ng. Why?

As I was testing this feature to squash all those pesky bugs, I kept running into an error that eluded all of our attempts to collect data on it: sometimes logrotate just didn't finish what it was doing, but there was no clear pattern to it.

Finally, I remembered that syslog-ng was still the parent process, and anything that kills syslog-ng would also kill logrotate without warning. I moved the code that restarted syslog-ng into `lastaction` and the issue went away.

Now on to that interesting line of code:

/bin/sh -c 'syslog-ng-ctl reopen 2>/dev/null || kill -HUP `pgrep syslog-ng 2>/dev/null` 2>/dev/null || true'

First, it attempts to run syslog-ng's builtin reopen command. Failing that, it will do things the old fashioned way with a hangup signal.

Using `syslog-ng-ctl reopen` is the fastest, safest way to get syslog-ng to write data to the _new_ `docker.log` that logrotate created for us. It also reloads all if its configuration files in-place and restarts child processes, which conveniently includes our shell script.

Regrettably, on some operating systems that have truly glacial release cycles, syslog-ng-ctl reload might not be available yet as it was added upstream in 2018. To some, that may sound like an absurdly long amount of time to not update your packages, but as an avid Slackware user I can assure you that it is not hypothetical.

This would be unimportant were it not for a memory leak present in syslog-ng. When syslog-ng receives a hangup signal (SIGHUP), it reloads. Somewhere along the way though, it forgets to free some memory. The amount of memory leaked by older versions of syslog-ng is significant enough to cause production systems to hang in some edge cases. Today, after much bug fixing, the maintainers now have this memory leak down to a respectable 10KB, but that's still 10KB per reload which I'm happy to avoid if I can help it.

In any event, should the neat and tidy syslog-ng-ctl reopen solution fail or be unavailable for whatever reason, we can rely on kill to get the job done.

SELinux

If you're running a setup like what I've described in this article for the first time, it might be right about now that you run into an issue: logrotate works fine when you call it from a real user shell, but does absolutely nothing when called from syslog-ng's child process. If that is happening, SELinux may be to blame.

To test if this is the case, try the following as root:

ausearch -ts today -i | grep logrotate | tail -n 100 | audit2why

If that returns error messages that say something like `avc: denied { execute }`, SELinux is preventing the constituent parts from working together. The easiest way to remedy this is apply the SELinux module we created for this specific application, or modify it to fit your needs. If you get `command not found` errors, you'll need to install the packages that contain the programs in the above statement, then try running the commands again.

SELinux policy (`.te`) files tend to be fairly self-descriptive, so rather than pick it apart line by line, I'll just give you the whole file:

module syslog-domaintrans-logrotate 2.0;

require {
	type syslogd_t;
	type logrotate_exec_t;
	type logrotate_t;
	class process { transition signal sigchld };
	class file { execute getattr open read };
	class fifo_file { read ioctl getattr};
}

#============= syslogd_t ==============

allow syslogd_t logrotate_t:process { transition signal };
allow syslogd_t logrotate_exec_t:file { execute getattr open read };
type_transition syslogd_t logrotate_exec_t : process logrotate_t;

#============= logrotate_t ==============
allow logrotate_t syslogd_t:process sigchld;
allow logrotate_t syslogd_t:fifo_file { getattr read ioctl };

In SELinux, all files belong to domains, which are sets of rules governing what those files are allowed to do in various circumstances. In this module we use permissions in the process class to enable the syslog-ng and logrotate domains to start, and send signals to, each-other's processes. We use the file class to allow both domains to read and execute one anothers' binary and config files. And finally, we use the fifo_file class to enable the domains to use Unix pipes to communicate.

And there you have it! By leveraging common system administration tools, we now have a plug-and-play fast track for logs to be backed up, analyzed, and reported in the event of a server error.

More to read

You might also like to read:

More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a SaaS application hosting platform.

by Morgan Shorter on Tue, 29 Jun 2021


Bring fully-featured SaaS products to production faster.

Follow us on