Some improvements to Grosskurth `redo`

Alan Grosskurth's 2007 implementation of Daniel J. Bernstein's redo package was targeted at the Bourne shell and was slightly simplistic. This package comprises modified versions of Grosskurth's original scripts, with the following improvements:

The scripts now use the Bourne Again shell rather than the Bourne shell. This allows the use of BASHisms such as the printf builtin and $() expansion.
The interpreter that runs .do files is not fixed to only the Bourne shell. .do script files must have a "hash-bang" at the start of the file, and must have execute permission (--x--x--x) turned on.
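As a rough illustration of what this requirement implies, a runner can simply execute the .do file and let its "hash-bang" line select the interpreter. The function name run_do and its argument convention are hypothetical, chosen for this sketch rather than taken from the scripts themselves.

```shell
# Hypothetical helper (not from the actual scripts): run a .do file
# through whatever interpreter its own "#!" line names.
run_do() {
    local dofile="$1"
    shift
    if [ ! -x "$dofile" ]; then
        printf 'redo: %s lacks execute permission\n' "$dofile" >&2
        return 1
    fi
    # Executing the file directly, rather than passing it to a fixed
    # shell, leaves the choice of interpreter to the "#!" line.
    case "$dofile" in
        */*) "$dofile" "$@" ;;
        *)   "./$dofile" "$@" ;;
    esac
}
```

With this shape, a .do file that starts with #!/usr/bin/perl or #!/bin/bash runs under that interpreter, and one without execute permission is rejected before anything is run.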
The MD5 data are now stored in the right place in the .redo database. As Grosskurth pointed out, he took the shortcut of recording each target's MD5 hash in the database entry for that target. This unfortunately means that an interrupted build loses information about which dependents remain built with the older revision of a rebuilt dependency. If A and B both depend on target T, then interrupting a build after T and B were both built would lose the information that A was built with an older T with a different MD5 hash value.

These modified scripts store each dependency's MD5 hash value in the database entry for the dependent. Interrupting a build after T and B were both built retains the old MD5 hash for T in the database entry for A, which causes A to continue to be regarded as out of date with respect to T.
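The A, B and T scenario can be sketched as follows. The database layout (.redo/DEPENDENT.deps/DEPENDENCY.md5) and the function names are assumptions made for this illustration, not the layout the scripts actually use, and the sketch only handles dependency names without directory components.

```shell
# Sketch only: directory layout and function names are hypothetical.
REDO_DB=${REDO_DB:-.redo}

md5_of() {
    md5sum < "$1" | cut -d' ' -f1
}

# record_dep DEPENDENT DEPENDENCY
# Store, in the dependent's own entry, the hash of the dependency
# that the dependent has just been built against.
record_dep() {
    mkdir -p "$REDO_DB/$1.deps"
    md5_of "$2" > "$REDO_DB/$1.deps/$2.md5"
}

# dep_changed DEPENDENT DEPENDENCY
# Succeeds when the dependent is out of date with respect to the
# dependency: no hash recorded, or the recorded hash differs.
dep_changed() {
    local saved="$REDO_DB/$1.deps/$2.md5"
    [ ! -f "$saved" ] || [ "$(cat "$saved")" != "$(md5_of "$2")" ]
}
```

If T is rebuilt and then only B is rebuilt before an interruption, dep_changed A T still succeeds, because A's entry holds the hash of the older T, while dep_changed B T fails.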
Both MD5 data and last modification timestamps are used. It's quite expensive to run md5sum over every target and source file in the dependency tree just to check whether their hashes have changed. These modified scripts store the last modification timestamp as well, and only recalculate the MD5 hash for files whose last modification timestamps have changed. The principle is that, notwithstanding any intentional mucking around with timestamps by the user, a file whose last modification timestamp hasn't changed hasn't been written to, and so its MD5 hash cannot have changed.

The scripts use stat from GNU fileutils for this.
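A minimal sketch of the timestamp-then-hash check, assuming GNU stat and md5sum are on the PATH. The function name file_changed and the one-line "mtime md5" stamp file format are hypothetical, used here only to show the idea.

```shell
# file_changed FILE STAMPFILE   (hypothetical name and stamp format)
# STAMPFILE holds "mtime md5" as recorded by the previous check.
# Succeeds only when FILE's content hash differs from the recorded one.
file_changed() {
    local file="$1" stamp="$2"
    local old_mtime="" old_md5="" mtime md5
    if [ -f "$stamp" ]; then
        read -r old_mtime old_md5 < "$stamp"
    fi
    # GNU stat: %Y is the last modification time in whole seconds.
    mtime=$(stat -c %Y "$file")
    if [ "$mtime" = "$old_mtime" ]; then
        return 1    # same timestamp: assume same content, skip md5sum
    fi
    md5=$(md5sum < "$file" | cut -d' ' -f1)
    printf '%s %s\n' "$mtime" "$md5" > "$stamp"
    [ "$md5" != "$old_md5" ]
}
```

A file that has merely been touched costs one md5sum run and is then reported as unchanged; a file whose timestamp is untouched costs only the stat call.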
Special files don't confuse things. With some judicious use of [ -f ] instead of [ -e ], these scripts avoid problems with targets that are symbolic links, named pipes, Unix domain sockets, or block or character device files. Such files do not have their MD5 hash taken (imagine using redo for building /dev the old MAKEDEV way and MD5 hashing /dev/zero!) and are always taken to be "changed" when rebuilding.
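The distinction can be sketched like this. The helper names are hypothetical. Note that [ -f ] follows symbolic links, so this sketch adds an explicit [ -L ] test to exclude them; whether the actual scripts do the same is an assumption.

```shell
# Hypothetical helpers: decide whether a path is safe to hash.
is_hashable() {
    # -f is true only for regular files (after following symlinks),
    # so the extra -L test rejects symbolic links as well.
    [ -f "$1" ] && [ ! -L "$1" ]
}

# change_token FILE
# Prints the MD5 hash of a regular file, or the fixed word "changed"
# for anything else, which is never run through md5sum.
change_token() {
    if is_hashable "$1"; then
        md5sum < "$1" | cut -d' ' -f1
    else
        echo changed
    fi
}
```

Calling change_token /dev/zero returns immediately with "changed" instead of hashing an endless stream of zero bytes.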
The .redo database has less extraneous stuff. It turns out that the following are simply unnecessary:
- target.type files — An existing file that doesn't have a database entry is by definition a source file, as a target (even if it doesn't exist) will always have a database entry. There's no need to record anything in the database to mark source files. As a side-effect of this, source files outwith the working directory tree (such as, say, headers in /usr/include) don't end up having useless database entries.
- target.result files — This needlessly duplicates the uptodate information.
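The rule that replaces target.type files can be sketched as a single lookup. The function name kind_of and the entry name NAME.prereqs are hypothetical stand-ins for whatever the database entry is actually called.

```shell
# Sketch only: the entry name "NAME.prereqs" is a hypothetical
# stand-in for a target's database entry.
REDO_DB=${REDO_DB:-.redo}

# kind_of PATH
# Prints "target" if PATH has a database entry, "source" if it merely
# exists as a file, and "missing" otherwise.
kind_of() {
    if [ -e "$REDO_DB/$1.prereqs" ]; then
        echo target
    elif [ -e "$1" ]; then
        echo source
    else
        echo missing
    fi
}
```

Under this rule a header such as /usr/include/stdio.h is classified as a source without anything ever being written to the database for it.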

These modified shell scripts are not intended to provide the extra functionality of Avery Pennarun's Python scripts. Their target use is systems without Python, or without SQLite, and they provide the minimal necessary functionality for a working redo build system without extra things — not strictly necessary for the basic task of building — such as parallel building, complex search rules for .do files, debug trace capability, and redo-always.

© Copyright 2012 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp information is preserved.
