redo

redo is a package originally designed by Daniel J. Bernstein in 2003.
Its purpose is to provide a build system for software packages that does incremental builds: if the package is built and some of its source files are then changed, the build system re-runs only the parts of the build procedure that are necessary to re-build the changed parts of the package.
Although Bernstein apparently did write the tools, as traces of them can be found in the build systems of some of the packages that he did release, he never got around to properly packaging up and releasing them. Two people picked up the ball and ran with it, however:
Alan Grosskurth implemented the various commands in redo as a set of Bourne Shell scripts, in 2007, for his university thesis Purely top-down software rebuilding. The script code is included as an appendix to the thesis, and is also available on the World Wide Web as redo, redo-ifcreate, and redo-ifchange.
In 2012 I improved Grosskurth's scripts, retargetting them at the Bourne Again shell and GNU fileutils.
Avery Pennarun implemented the various commands in redo as a set of Python scripts, in 2010. They are available on GitHub and can be installed via MacPorts.
Later in 2012 I implemented the various commands in redo from the ground up, as compiled C++ programs that run on FreeBSD, PC-BSD, OpenBSD, and Debian Linux (and probably on all BSDs and Linux flavours).
In 2013, Chris Forno implemented parts of redo in Haskell, as a training exercise. It omitted redo-ifcreate.
In 2014, Nils Dagsson Moskopp re-implemented Pennarun redo, retargetting it at the Almquist shell and BusyBox. (A number of non-POSIX shell constructs, from the Bourne Again shell and from Debian's, BusyBox's, and other people's additions to the Almquist shell, ended up in the scripts; as of 2017, M. Moskopp is looking into eliminating them.)
If you want to see some of the aforementioned traces, look at Bernstein's ptyget package, published in 1996. It has .do files that run (undocumented and unpublished) programs named dependon, dependcc, formake, and directtarget.
Whilst this is evidently not the redo system as envisaged five years later, given that formake seems to be a tool for generating Makefiles a snippet at a time, dependon is a clear precursor to redo-ifchange.
I have a consolidation of Bernstein packages where not only have I replaced the .do files for ptyget with ones that invoke the redo tools as we know them, but I have also made all of the rest of the packages build with redo too.
Both the Grosskurth and Pennarun implementations share a set of fundamental concepts, from Bernstein's design:
Files in a project whose build is controlled by redo are either source files or target files. Together, they form a dependency graph.
Source files are always leaf nodes on the dependency graph. They depend from no other files. Attempts to "re-do" (i.e. build) them are either no-ops or errors.
Target files are either leaf or interior nodes on the dependency graph. They depend from zero or more other files, which may in their turn be either source files or further target files. It is target files that are "re-done" (i.e. built).
Target files may be out of date with respect to what they depend from.
This information is not stored in the files themselves; rather, it is stored in a dependency database.
Bernstein specified that this database is named .redo and resides in the root-level directory of the project being built, but he didn't give any more detail.
In the Grosskurth implementation of the package, .redo is a directory. In that directory, each target file has a subdirectory. Within that subdirectory are zero or more files, with the file names being keys and the file contents being the values for those keys.
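As a sketch, with illustrative target, key, and value names (Grosskurth's actual key names are not reproduced here), such a database might look like this:

% ls .redo/                          # one subdirectory per target
example.exe/    example.o/
% ls .redo/example.o/                # file names are the keys
contents.md5    result
% cat .redo/example.o/contents.md5   # file contents are the values
5d41402abc4b2a76b9719d911017c592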
In the Pennarun implementation of the package, .redo is a directory containing a SQLite database file, with all of the targets, keys, and values in it, and zero or more (session-local) lock files used to avoid loops and to enable parallel building.
The commands to build a target from its dependencies are stored in a "do" file. Either a target has its own individual "do" file, named the same as the target filename with .do appended, or it employs a default "do" file, named default.target-extension.do.
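For a target named example.o, for example, the search would try these names in order (a sketch; exact search rules, such as whether parent directories are also consulted, vary between implementations):

example.o.do        # the target's own individual "do" file, preferred
default.o.do        # the default "do" file for the .o extension, otherwise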
Once redo has decided that a target is out of date, it searches for an appropriate "do" file to run, and runs it.
"do" files are ordinary scripts, and can be written in any scripting language
as long as that language is capable of running two external commands:
redo-ifchange
and redo-ifcreate
.
Scripts are provided with the desired target filename that they are expected to generate, and they are expected to both regenerate that output file and, at the same time, create/update the target's entry in the dependency database with the redo-ifchange and redo-ifcreate commands.
Targets can be out of date with respect to their dependencies in two ways, signalled by the following commands in the "do" files:
redo-ifchange dependency …
This adds an entry to the dependency database for the current target (the one being built by the "do" file) saying that if the existing file dependency changes, the target should be considered out of date. What constitutes "changing" varies:
In the Grosskurth implementation of the package, a file "changes" if it is deleted or if the MD5 hash of its contents changes from what it was at the time that redo-ifchange was run.
In the Pennarun implementation of the package, a file "changes" if it is deleted or if its timestamp changes from what it was at the time that redo-ifchange was run.
redo-ifcreate dependency …
This adds an entry to the dependency database for the current target (the one being built by the "do" file) saying that if the non-existent file dependency is created, the target should be considered out of date.
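A minimal "do" file exercising both commands might look like this (a sketch with hypothetical file names; the convention that $3 names the output file to be generated follows the examples later on this page):

# greeting.do
# Re-do the target if either input file changes, and also if a
# local override file, not currently present, ever appears.
redo-ifchange hello.txt world.txt
redo-ifcreate override.txt
cat hello.txt world.txt > "$3"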
The system is recursive. When a "do" script executes redo-ifchange, the given dependencies are themselves processed. redo-ifchange uses the contents of the dependency database to determine whether each dependency is itself out of date. If it is, redo-ifchange searches for and runs the relevant "do" script. This in turn may result in further redo-ifchange calls, and further nested "do" scripts being run, for second-level dependencies. And so forth.
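As a sketch, with hypothetical file names, the recursion for a two-level build unfolds along these lines:

redo-ifchange example.exe            # run from a top-level "do" script
  example.exe.do runs, and calls:
    redo-ifchange example.o          # first-level dependency
      default.o.do runs, and calls:
        redo-ifchange example.cpp    # second-level dependency; a source
                                     # file, so the recursion stops here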
The top of the system is one command: redo. This is like redo-ifchange with two exceptions. It assumes that all targets are unconditionally out of date, and so skips querying the dependency database and proceeds directly to running the "do" scripts. And it doesn't add anything to the dependency database, because of course it isn't invoked within a "do" script and so doesn't conceptually have a "current target" entry in the database to refer to.
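A whole build is therefore typically started with a single unconditional command, naming a top-level pseudo-target such as the all described below:

% redo all        # always runs all.do, whatever the database says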
Several maxims have driven the design and the implementations of redo:
If a target does not exist, it is by definition out of date. It does not, therefore, matter what it is in its entry in the dependency database, or even whether it has an entry at all.
A "do" file can thus build both a target and its list of dependencies at
the same time, in a single pass.
It's not necessary, in redo
, to build the dependencies first and
then build the target in a second pass.
Moreover, the list of dependencies can be built after the target is
itself built.
(An example of this is compiling a C or C++ source file to make an object file target. GCC's -MD option can be used, and the contents of the resulting .d file post-processed, after GCC has run, and passed to redo-ifchange.)
If a file exists but has no entry in the dependency database, it is by definition a source file. Building a target creates its entry in the dependency database at the same time. A target cannot therefore exist without a dependency database entry.
A target is only marked up to date if a "do" script completes successfully. The entry in the dependency database contains the result of the "do" script. If it was a failure, the target is considered not up to date.
Dependencies include every relevant existing/non-existent file.
This includes the "do" files themselves. The package handles this
automatically. When searching for what "do" file to run, after it has decided
that it needs to (re-)build an out-of-date target, every "do" file that is
searched for but not found is automatically added (as a dependency to the
target's entry in the dependency database) as if by
redo-ifcreate
, and the "do" file that (finally) is found and run
is automatically added as if by redo-ifchange
. Thus a target
becomes out of date if the "do" file used to build it is changed or deleted,
or if any higher-priority "do" file that would have taken precendence in the
search for "do" files is created.
"do" files are expected to do the same,
although in practice the redo-ifcreate
part is hard.
For example: if a C or C++ compiler is run to generate an object file target, the list of dependencies should include a redo-ifchange for every header and source file that was included by an #include directive in the source code, and furthermore should include a redo-ifcreate for every place that a header or source file was searched for but not found.
However, no known C/C++ compiler at the time of writing provides support for telling a build system what files it depended from the non-existence of as it was compiling.
This makes C/C++ compilers an imperfect fit to the redo way of working.
If an included header is superseded by a higher-priority one, placed earlier on the compiler's include search path, redo won't know to check for this, because the C/C++ compiler hasn't provided the information, and the target won't be rebuilt as it should be.
Since the redo command is unconditional, and will always result in a "do" file being run, targets at the very top level of a build system are effectively pseudo-targets, and can be given the common pseudo-target names such as all:

# all.do
redo-ifchange example.exe
Note that there's little use for redo-ifcreate in a pseudo-target.
By making the commands for compiling and linking into scripts, importing the compiler options from short text files, one can ensure that whenever the compiler options, or the compiler command itself, change, redo will flag all appropriate targets as out of date and re-build them with the new compiler/compiler options:
# example.exe.do
objects="example.o wibble.o"
redo-ifchange ./link ${objects}
./link $3 ${objects}
#!/bin/sh -e
# link
redo-ifchange ./cxx ./cxxflags ./ldflags
read -r cxx < ./cxx
read -r cxxflags < ./cxxflags
read -r ldflags < ./ldflags
${cxx} ${cxxflags} ${ldflags} -o "$@"
There are several possible variants of the above, of course. The link script could be part of the "do" script. The command options could be embedded directly in the "do" script as well. The aforegiven arrangement allows for fan-in: multiple target executables can share a single link script, and multiple link scripts can share a single cxx file (whilst, say, having different ldflags files listing different sets of link libraries for differing groups of linked executables).
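The fan-in is simple to arrange; a second (hypothetical) executable target reuses the link script as-is:

# other.exe.do
objects="other.o wobble.o"
redo-ifchange ./link ${objects}
./link $3 ${objects}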
Although, as previously mentioned, C/C++ compilers don't generate proper dependency information for files that they have relied upon the non-existence of, generation of the half of the dependency information relating to existing files is fairly trivial:
# default.o.do
redo-ifchange ./compile $1.cpp ./convert-depend
./compile $3 $1.cpp
./convert-depend $3 $1.d | xargs redo-ifchange
#!/bin/sh -e
# compile
redo-ifchange ./cxx ./cxxflags ./cppflags
read -r cxx < ./cxx
read -r cxxflags < ./cxxflags
read -r cppflags < ./cppflags
${cxx} ${cxxflags} ${cppflags} -MMD -c "$2" -o "$1"
#!/bin/sh -e
# convert-depend
# Strip the "target:" prefix and the backslash line continuations
# from the compiler-generated .d file, leaving just the file names.
# (Single quotes, so that sed, not the shell, sees the \\ escape.)
exec sed -e "s/^$1://" -e 's/\\//g' "$2"
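To illustrate, given a .d file of the typical form that GCC writes (the temporary output file name here is hypothetical), convert-depend produces a plain whitespace-separated list that xargs can hand to redo-ifchange:

# example.o.d as written by GCC (illustrative):
#   example.o.tmp: example.cpp foo.h \
#    bar.h
# after convert-depend (prefix and backslashes removed):
#   example.cpp foo.h
#   bar.h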
As noted earlier, no C or C++ compiler currently generates any redo-ifcreate dependency information, only the redo-ifchange dependency information. This is a deficiency of the compilers rather than a deficiency of redo, though.
That the introduction of a new higher-precedence header earlier on the include path will affect recompilation is a fact that almost all C/C++ build systems fail to account for.
I have written, but not yet released, a C++ tool that is capable of generating both redo-ifchange information for included files and redo-ifcreate information for the places where included files were searched for but didn't exist, and thus where adding new (different) included files would change the output:
/package/admin/nosh % /package/prog/cc/command/cpp --MMD build/system-control.cpp >/dev/null 2>&1
/package/admin/nosh % cat build/system-control.cpp.d
redo-ifcreate utils.h fdutils.h service-manager-client.h runtime-dir.h home-dir.h popt.h
redo-ifchange build/utils.h build/fdutils.h build/service-manager-client.h build/runtime-dir.h build/home-dir.h build/popt.h
/package/admin/nosh %
redo handles four common incremental changes to an already-built system easily:
If a new file created since the build hides another, lower-priority, one on a search path list, redo detects this because the requirement of the non-existence of the file has been recorded by redo-ifcreate during the previous build.
If compiler options are changed, redo detects this because the files containing those options (the "do" script itself, or a link script, or a cxxflags file as aforegiven) have been recorded by redo-ifchange during the previous build.
If a file found by wildcard expansion disappears, redo detects this because the file has been recorded by redo-ifchange, passed the list generated by wildcard expansion, during the previous build.
If rebuilding generates exactly the same output, redo detects this because the MD5 hashes in its dependency database do not change.
Like make, redo assumes a 1:M relationship between targets and dependencies. And like make, it is quite difficult to deal with a project where one build instruction generates multiple targets all together in one go. (An example of this is a parser generator, which generates multiple .cpp and header files from a single grammar file.)
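A common workaround, sketched here with a hypothetical yacc grammar, is to hide the several outputs behind a single stamp target; but then the real outputs have no dependency database entries of their own:

# parser.stamp.do
redo-ifchange grammar.y
yacc -d grammar.y        # writes both y.tab.c and y.tab.h in one go
mv y.tab.c parser.cpp
mv y.tab.h parser.h
: > "$3"                 # the stamp target itself is empty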
In addition to this problem of the source:target ratio, redo further assumes a fairly basic arrangement of the "do" files and target files with respect to the source file tree, which makes for good examples, but has difficulties with the more advanced structures that occur in real-world development, where things are more complex:
People sometimes have read-only source file trees. Sometimes they are held on CD-ROM, or mounted read-only, or held in common directories where only one user has write/create permission.
To use redo with such a system, either one has a shedload of symbolic links, which of course have to be themselves built somehow, effectively mirroring the source file tree into the target file tree; or one has a mechanism telling the build system that the source directory tree is rooted in one place and the target file directory tree in another.
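A sketch of the symbolic link approach, assuming a hypothetical ./srcdir file recording where the read-only source tree is rooted:

# default.cpp.do
# "Build" a .cpp file in the target tree as a symbolic link to its
# counterpart in the read-only source tree.
redo-ifchange ./srcdir
read -r srcdir < ./srcdir
ln -s "${srcdir}/$1.cpp" "$3"

Of course, this default.cpp.do must itself already exist in the (otherwise empty) target tree, which is exactly the chicken-and-egg problem described next.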
redo has a chicken-and-egg problem with creating the symbolic links (because a totally empty target tree doesn't have any "do" files), and neither existing redo implementation provides a mechanism for separate source and target directory trees.
Furthermore, the symbolic links, which are considered target files in such a system and so need not exist at any given point, create "dishonest dependencies" in a redo system. The .redo dependency database, and indeed the debugging information generated by a compiler, points to what were thought to be the source files, but which actually aren't.
Real-world example: the build process for the Clang compiler. The source resides in one tree, which is only ever read by the build process, never written; and the makefiles (generated by autotools) and target files reside in a parallel ../build tree.
People sometimes have multiple active target file trees. One might want to build with multiple compilers, or for multiple target platforms and processor architectures, or even just for debug/release, in parallel.
redo has severe difficulties with such a system. It has all of the aforementioned problems of the symbolic-link approach, with an additional problem layered on top of them to further complicate matters: that of needing different "do" files (or compile/link scripts, or even sets of cxx/cxxflags/ldflags files) for each target compiler/platform/debug type.
It also needs a different .redo database for each target compiler/platform/debug type. On one system a target might depend from /usr/include/w32api/windows.h, whilst on another it might depend from /usr/include/unistd.h, and from $watcom/h/nt/windows.h on a third. Each of these dependencies is active, simultaneously, for the same target file at the same place in the target directory structure.
Confidentiality agreements make this seem a rare situation to the naïve if-it-isn't-on-the-WWW-in-a-source-repository-it-doesn't-exist thinker. In the real world, practically every cross-platform (e.g. Windows+Linux+MacOS+OS/2+…) commercial product has this requirement, as those who've done real-world development of such products will attest.
What redo specifically lacks is the concept of three separate directory trees, for source files, "do" files, and target files, allowing multiplatform and multicompiler development, read-only source with honest dependencies and debugging information that points to the right filenames, and side-by-side debug/release builds.
Avery Pennarun's redo implementation forks processes excessively and duplicates the kernel's, and indeed the shell's, #! processing. This is an unfortunate byproduct of its assuming that "do" files are Bourne shell scripts, and only requiring #! when they are not. It passes all "do" scripts directly to the Bourne shell, and thus has to check for #! and do its own #! processing, duplicating, not necessarily exactly or correctly, what the kernel does.
In particular, the #! implementation in Pennarun redo as of 2012 has several of the well-known and widely documented #! security holes that have long since been closed in actual operating system kernel implementations of the same mechanism, including the script filename race condition. It also has an incorrect magic number check, and an additional PATH security hole caused by how it attempts to run the Bourne shell.
A better approach would have been to use the Unix philosophy as it stood: a program to be run is just run, with execve(), as-is; "do" scripts are expected all to have #! to identify their particular interpreters; the system relies on the operating system kernel's processing of #! and doesn't roll its own one badly; and there is no hardwiring of one particular shell.
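Under that approach every "do" file simply identifies its own interpreter, and the all.do from earlier merely gains an explicit first line:

#!/bin/sh -e
# all.do
# The kernel's execve() picks /bin/sh from the #! line; the redo
# implementation itself hardwires no particular shell.
redo-ifchange example.exe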