logrotate
or newsyslog
in this century.
logrotate
and newsyslog
are a design that should be consigned to history.
They are examples of how incremental changes to the way that things are done can have quite poor end results.
Having the log-rotater program separate from the log-writer program results in several unavoidable potential problems.
This design was superseded by better designs in the middle 1990s.
The original way of logging the output of a Unix program was very simple. One just redirects its standard output to a file, or through a pipe. (For ease of comprehension, these are in more modern shell language.)
my-program > myprogram.log my-program | tee myprogram.log
The invention of standard error for Sixth Edition slightly complicates this, but not by very much.
my-program > myprogram.log 2>&1 my-program 2>&1 | tee myprogram.log
The problem is that this file grows forever.
There's nothing to be gained from deleting it; Unix keeps the i-node around because the program's standard output is pointed to it, and so one just ends up with an inaccessible deleted file with a zero link count.
There's little to be gained from truncating it, even, as the file pointer for the output file description is not repositioned to the beginning of the file and one ends up with a (sparse) file that has a large hole of zero(-fill)ed blocks at the start.
A bunch of NUL
s is not pleasant for Unix text processing programs written in the C language.
Hence the arrival of newsyslog
in the BSD world, and logrotate
in the GNU world.
These programs are run by some form of periodic scheduler, such as cron
.
At regular intervals, the log files on the system are checked to see whether they have exceeded some size limit, and "rotated" if they have.
Rotation involves keeping a number of generations of old log files, deleting older generations as necessary to make way for newer ones.
This "rotation" can take two forms.
The current file can be truncated in place, in which case the log-writing program needs to have its standard output in always-append mode. Before truncation, the log-rotating program copies the current file to a new file that is the first generation archive.
my-program >> myprogram.log 2>&1 my-program 2>&1 | tee -a myprogram.log
The current file file can be renamed to the first generation archive file, and the log-rotating program then tells the log-writing program to create and to switch over to a fresh current log file.
To the naïve observer it appears that everything is good; logs do not grow indefinitely, their previous generations are limited, truncation does not result in (sparse file) NUL
-filled "holes", and it even seems that this dovetails nicely with existing tools.
It appears that this ties up nicely with two "follow" modes of the tail
command, which can therefore be used to monitor such logs as they are written to.
tail -f
tail
watches a single, open, file for size changes.
It can either grow larger by having fresh log entries appended, or reset to zero when truncated.
This would seem to exactly match the copy-and-truncate-in-place form of log rotation.
tail -F
tail
does everything as for -f
, but also watches for the file being renamed or deleted.
If it is, it attempts to re-open the current log file afresh, opening the new file that is now at that name.
This would seem to exactly match the rename-away-and-create-anew form of log rotation.
However, this analysis is superficial; and this is not in fact the case.
There are two classes of problems with this.
Because the log-rotator is separate from the log-writer there is a window between the log-writer exceeding the size limit of a log file and the log-rotator's next scheduled run. Even were the log-rotator run every minute, which is far more frequently than these programs are run in the wild, that's enough, at modern computer speeds, for a runaway log writer to write gigabinarybytes of log entries and blast right through the size limits on log files that the log-rotator is supposed to be enforcing.
A problem in the same class is that in the copy-and-truncate-in-place model, the copy is racing against the log-writer that is still writing to the log as it is being copied. The copier cannot reliably know when it has read all of the way to the end of the file, and there is an unavoidable window between the copy being considered done and the log being truncated where the log-writer can butt in and append more log entries, that are never copied and in fact immediately lost from the log. (Rainer Gerhards, author of rsyslog, once called this "a very, very, very bad choice" that is "inherently racy and using it is almost a guarantee that you will lose log data".)
tail -F
cannot deal with all eventualities
Whatever tail -F
does, it cannot cope with all of the possible things that can happen.
As a result, it ends up skipping log entries, and possibly even entire log files.
(Bryan Cantrill gave a talk about BSD tail
and log files at Surge 2013, but he is far from the only person to have hit these problems; and what he explains about BSD tail
holds true for GNU tail
as well, which gives up, with comments about this in the code, handling the problem situations for much the same reason that Bryan Cantrill did.)
If the log-writer is pumping out data hard enough, which it all too often is in situations where one needs log file processing to be at its most robust, the system can get into a state where tail -F
is more than one generation of log file behind.
But tail -F
doesn't know about the log file generations; it only knows the primary log file name.
So when it has finished reading the older generation log file, it skips the next older one and goes straight back to the current file.
Partway through a rename-away-and-create-anew rotation, there is a window where the system is in the state where the file has been renamed by the log-rotator, but the log-writer has not yet been informed of this or acted upon that information.
The way that log-rotators tell log-writers to switch over is messy and unreliable, usually involving sending signals using PID files.
But setting the dreadful mess that is the idea of PID files aside, what happens is that the log-writer keeps writing to the old file, now renamed, but not necessarily immediately.
In the meantime, the log-rotator has created a new current log file, and tail -F
has noticed the rename, but because the timing was just exactly unfortunate, missed the further log entries that the log-writer wrote after a little while into the old log file, and instead already switched to the new current file, well ahead of the log-writer doing so.
This problem is actually called out in a comment in the code for GNU tail
.
If the file is truncated and then immediately overwritten with an equal amount of data, tail -F
's truncation detection code, which checks for the size decreasing, will not consider this a truncation and not re-read the file from the beginning.
This can happen when (say) there was only one log entry in the file and it is truncated and a single new log entry immediately added.
Ironically, the very operation of tools like logrotate
and newsyslog
can make this happen.
When newsyslog
rotates a file, it adds a new log entry to the top of the current log saying that newsyslog
rotated the log.
If nothing is written to the new log and a forced rotation is triggered, it does this again; truncating the log and then immediately writing out a message that is pretty much the same size as the one before.
Get the timing of tail -F
right, and it does not notice.
Worse, if it is overwritten with a greater amount of data, tail -F
does not think that a truncation has occurred at all, and keeps going from its current position in the file.
This may now be partway through a log entry, and it of course means that tail -F
ignores a whole lot of new log entries.
This problem is also called out in a comment in the code for GNU tail
.
These problems were addressed in the mid 1990s, by going back to the original mechanism with the following principle in mind:
The log rotation and the log writing must be done by the same code, so that it knows exactly when to do a rotation, does it exactly then, and does not race with a parallel process doing the other half of the task.
One can see this in the designs of the "next generation" syslog utilities that came about a few years later, which made a point of advertising that they don't need external log rotation tools. But one can also see this in a design also from the mid 1990s.
It is important to realize that Doug McIlroy's "garden hosepipe" design still applies.
The log rotation and log writing code do not have to be in the program generating the log entries.
They can be attached to that program, with an ordinary Unix pipe, just like tee
was further up this page.
The realization then dawns that what is simply needed is (in another instance of Doug McIlroy's "garden hosepipe" metaphor) a better tee
program.
And this is what came about with Daniel J. Bernstein's daemontools, which began with a program named cyclog
and progressed to a program named multilog
.
There is now a whole suite of logging tools in the daemontools family of toolsets from which one can select according to one's needs, from cyclog
to s6-log
.
Apart from one that didn't have a size limit mechanism, these all share the core semantics of being "tee
plus".
They write to one or more files; but they monitor the sizes of the files that they have written, and before they would exceed a size limit they stop writing and perform a rotation, switching to the fresh current file immediately and before recommencing writing, once the rotation is complete.
(For rotating by date, rather than in order to impose a cap on overall log file disc space consumption, there is also Andrew Ford's cronolog
adapted from an Apache tool, which has the same design of one program acting like tee
and doing both the writing and the rotating.)
This fixes most of the problems. (It has benefits in its own right, nothing to do with rotation, too. Some filesystems and storage devices give better results if files are only written to once, sequentially. And the notion that once renamed away a old generation log file is never supposed to be written to, can be both relied upon, by programs that can assume that old log files are static snapshots, and enforced, by programs that do things like sign logs.) It fixes all of the problems that are caused by the log-writer not being in lockstep with the log-rotator. It uses only rename-away-and-create-anew rotation, so there is no difficulty with having to accommodate the problems relating to copying and truncation.
Fixing tail -F
is slightly harder, because tail -F
is a general purpose program that does not understand the log rotation scheme.
To fix the problem with skipping over old log files when throughput is high, tail -F
would have to know the names of all of the old generation log files and scan through all of them (that it had not yet processed) in the correct order before switching back to processing the currently active log file again.
But tail -F
has no such hardwired knowledge, which is specialized to the one case of log files.
(This becomes clearer if one looks and the disparity between how one tells the likes of multilog
to write a log and how one tells tail -F
to read it.
multilog
is told the name of the entire log directory, and expected to understand its structure.
tail -F
is only told the name of one individual file in the log directory, and isn't even aware that there is a daemontools-style log directory.)
Thus came about tools such as follow-log-directories
in the nosh toolset.
This does have particular knowledge of the log rotation system employed in daemontools-family loggers, and of TAI64N timestamps, and so can know where to look for old files that it needs to process before it can proceed with processing the current file.
(It is told about entire log directories, and expected to understand their structures.)
Prasad J. Pandit took Daniel J. Bernstein's djbdns in 2012 and modified it into a tool suite that xe named ndjbdns.
One of the first things that xe did was to remove the use of multilog
and go back to using a separate log rotator.
Obviously enough, this retrogressive step, that undid nearly two decades of learning better than to have a separate log rotator, brought back the aforementioned problems; as M. Pandit eventually discovered a year or so later.
This was never fixed, though, and remains a problem in ndjbdns.