The known problems with System 5 `rc`

It is often said that System 5 rc is a "traditional" Unix and Linux system. It is not. As the name says, it only dates back to the release of AT&T Unix System 5 in the 1980s. Indeed, what most people think of as System 5 rc isn't even that. The Linux subsystem by that name is a clone, written by Miquel van Smoorenburg in 1992, not the Unix original.

A traditional system is the old /etc/rc: a single shell script that sourced a couple of others. This dates back to 1979 and Unix Version 7, and continued in use until 2011, when its remaining mainstream user, OpenBSD, finally switched to a replacement in version 5.0.

Of course, the original system in 1st Edition Unix didn't have rc at all, hardwiring everything into the code of init (as restored and made available by Warren Toomey et al.), including respawning 12 getty processes, performing 3 hardwired filesystem mounts, and running a program from the home directory of a user named mel.

You've come to this page as a result of a question similar to the following:

What are the known problems with System 5 rc? I've not heard of any until now.

This is the Frequently Given Answer to that question.

You're almost three decades late.

The problems with System 5 rc were remarked upon in its first decade, not least from the BSD world, which resisted adopting the idea. Many replacements for it have come along in the decades since, and one can see from their designs the perceived flaws that they attempted to tackle. IBM AIX had the System Resource Controller (SRC) in 1990 with AIX version 3.1. It ran under System 5 init but replaced System 5 rc (don't get the twain mixed up) with a service manager program (srcmstr) that handled starting/stopping/supervising system dæmons. Daniel J. Bernstein released daemontools in 1997, which as the name suggests was a toolset for dæmon supervision that boiled the actual dæmons down to very simple run programs with little (and sometimes no) shell programming code in them at all. The Linux file-rc came along the same year; it replaced System 5 rc and its system of numbered symbolic links in numbered subdirectories with a single database file named runlevel.conf that one (carefully) edited in the same manner as (say) /etc/passwd.

In 1999, Luke Mewburn worked on replacing the /etc/rc system in NetBSD. netbsd.tech.userlevel mailing list discussions from the time show several criticisms of the System 5 rc and System 5 init systems, and encouragement not to repeat their mistakes in the BSD world. The resultant rc.d system was roughly contemporary with Daniel Robbins producing OpenRC, another System 5 rc replacement that replaced the (Bourne/Bourne Again) shell with a different script interpreter, nowadays named /sbin/openrc, that provided a whole lot of standard service management functionality as pre-supplied functions. The NetBSD rc.d system likewise reduced rc.d scripts to a few variable assignments and function calls (in about two thirds of cases).

M. Mewburn was one of a few people who actually wrote papers on the subject explaining the problems of existing systems that he was attempting to address. Others were the designers of Solaris' Service Management Facility (SMF), Richard Gooch, and (to a far lesser extent because they left as an exercise to the reader figuring out why the use cases that they laid out were actually problems for System 5 rc) the authors of upstart. These are highly recommended reading.

Luke Mewburn (2001). The Design and Implementation of the NetBSD rc.d system (Author's copy) Proceedings of the 2001 Annual Technical Conference. Usenix. pp. 69–80.
Richard Gooch (2002-11-21). Linux Boot Scripts. safe-mbox.com.
Jonathan Adams, David Bustos, Stephen Hahn, David Powell, and Liane Praza (2005). Solaris Service Management Facility: Modern System Startup and Administration. Proceedings of the 19th Large Installation System Administration Conference (LISA ’05). Usenix. pp. 225–236.
Scott James Remnant, Erik Troan, Thomas Hood, et al. (2006-06-23). ReplacementInit. Ubuntu wiki.

Here are some of the salient points.

The mapping between services and `init.d` scripts is not 1:1.

The old joke "fork Fedora" WWW page made an apples-to-oranges comparison between a systemd service unit and a System 5 init.d script. The problem with the comparison exemplifies the point at hand, here. The init.d script actually starts and stops two Sendmail services, not one.

This is one of the problems with System 5 rc: The individual init.d scripts do not necessarily have a 1:1 correspondence with the services being managed. This has various ramifications:

Determining what script controls what service isn't as straightforward as just listing directories.
With multiple-service scripts, it is impossible to control the individual services independently.
Even just getting a list of the enabled services, let alone which ones are running, is non-trivial.
Partial failure to start in multi-service scripts can sometimes jam the service management.

There is excessive coupling between services that notionally could operate in parallel.

Development of tools such as startpar attempted to address this within the Linux System 5 rc clone. Many of the other systems addressed it as well. The original subsystem proceeded serially, executing init.d scripts one after another. This fails to take advantage of one of the core features of a multi-process multi-tasking operating system.

There are no dependency and ordering capabilities.

[insserv] was introduced to correct broken boot and shutdown ordering using a system wide reordering of every scripts at once. This was needed as it proved impossible to get every package maintainer involved in a boot sequence reordering to lift together in a finite amount of time, where the last one in a sequence had to change its number, before the second to last changed its number, and so on and so forth. There were lots of incorrect sequence numbers in the boot and shutdown sequence before we introduced dependency based boot ordering, and this was solved when we introduced dependency based boot ordering.

— Petter Reinholdtsen, sa6y3azp408.fsf@meta.reinholdtsen.name, 2018-10-15

Again, this is something that tools such as insserv attempted to address within the Linux System 5 rc clone, calculating a statically configured startup order whenever a service was added or removed. The NetBSD rc.d subsystem likewise has a tool named rcorder which recalculates a total script execution order at each bootstrap.

The original subsystem had no notion of one service requiring the operation of another, and thus depending upon it being started first and stopped last. It had no notion of the start/stop of one service implying the start/stop of other necessary/conflicting services.
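To make the missing notion concrete, here is a minimal sketch of dependency-based ordering, assuming Python 3.9 or later for the standard graphlib module. It is not how insserv or rcorder is implemented; the service names and the REQUIRES table are hypothetical. It computes "waves" of services whose requirements are already satisfied (and which could therefore start in parallel), and a stop order that is the reverse of a valid start order.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each maps to the set of services that it
# requires to have been started first.
REQUIRES = {
    "syslog": set(),
    "network": set(),
    "sshd": {"network", "syslog"},
    "nfs-client": {"network"},
    "cron": {"syslog"},
}


def start_waves(requires):
    """Group services into waves; everything within one wave has all of
    its requirements met and could be started in parallel."""
    sorter = TopologicalSorter(requires)
    sorter.prepare()  # raises graphlib.CycleError on circular requirements
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def stop_order(requires):
    """Reverse a valid start order, so that dependants stop before the
    services that they depend upon."""
    return list(reversed(list(TopologicalSorter(requires).static_order())))


if __name__ == "__main__":
    for number, wave in enumerate(start_waves(REQUIRES), start=1):
        print(number, wave)
    print(stop_order(REQUIRES))
```

Nothing like the REQUIRES table existed in the original subsystem; the only ordering information was the number embedded in each link name.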

Services had ordinal numbers for their start and kill scripts, but as Debian found out this did not scale at all. There was no overall system for assigning such numbers; they were just picked largely at whim by whoever it was that wrote the rc script. With a large number of independently developed software packages, the problems of manually assigning an ordering, and then changing it later, became intractable.
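The renumbering problem can be sketched in a few lines. This assumes the common convention of link names made from S or K, a two-digit number, and the script name; the helper functions and the service names are hypothetical, for illustration only.

```python
def link_name(kind, number, script):
    """Build a name in the S/K + two-digit number + script name style."""
    return f"{kind}{number:02d}{script}"


def free_slot(before, after):
    """Return a sequence number strictly between two existing ones,
    or None when the neighbours leave no gap."""
    if after - before > 1:
        return (before + after) // 2
    return None


if __name__ == "__main__":
    existing = [link_name("S", 21, "sshd"), link_name("S", 20, "network")]
    # Scripts run in the lexical order of their link names.
    print(sorted(existing))
    # A new (hypothetical) firewall service must start between the two,
    # but the whimsically chosen numbers 20 and 21 leave no room.
    print(free_slot(20, 21))
```

When free_slot() finds no gap, some other package's number has to change first, which is exactly the chain of coordinated renumbering that the quotation above describes.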

There is excessive coupling between scripts.

There is a non-trivial set of init.d scripts (and to a lesser extent rc.d scripts on the NetBSD/FreeBSD systems) which are either used by other scripts internally, or designed to be used by the system administrator as Swiss Army Knives that do things other than actual service management. The former internal interfaces make things opaque and fragile. The latter tend to complicate the scripts and hide the actual service administration parts.

Networking and peripheral device management scripts are some of the sinners in this regard. But there are all sorts of cases in odd corners to be discovered. As noted, this problem is not limited solely to the System 5 rc world; witness things such as the FreeBSD remote filesystem mounting calling into the "cleanvar" service, and FreeBSD's power_profile and serial scripts, neither of which actually does any service management or implements any bootstrap start/stop mechanism at all.

The system relies upon the dæmonization fallacy for manual dæmon start/stop.

There is a fallacy that holds that manual service control is clean because one can safely, cleanly, and securely "dæmonize" from an interactive login session; one cannot, it isn't, and system administrators tell war stories about the results. The System 5 rc mechanism for manually starting and stopping services on a running system is mis-designed based upon this fallacy.

Dæmons are not supervised.

The System 5 rc system is an entirely passive system. Pace the considerations of the dæmonization fallacy, where an interactive login session could have set up an alternative child process "subreaper" (creating yet another way in which spawning services from such a session differs unexpectedly from spawning them at bootstrap), running services have process #1 as their parent process. Process #1 has no knowledge of what the individual dæmon processes are. Nor would a subreaper have. Nothing that explicitly knows about the services is informed when they die from crashes, or just exit.

The IBM AIX SRC not only monitored the state of service processes, but had kernel extensions peculiar to AIX that enabled the srcmstr process to recognize services that were not its own children but that had been started by an earlier incarnation of srcmstr. daemontools famously brought into the hobbyist mainstream the idea that dæmons could be auto-restarted after they had crashed/exited. This idea was extended by other members of the daemontools family, with toolsets like nosh, runit, and perp adding "restart" configuration mechanisms for fine-grained control of if, when, and how often services get restarted when they crash or exit.
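The essence of supervision is that the thing that knows about the service is also its parent process, and so is told when it exits. What follows is a minimal sketch of that idea only; it is not the implementation of daemontools or of any of the toolsets named above, and the supervise() function and its parameters are hypothetical. A real supervisor runs indefinitely and applies a restart policy rather than a simple count.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.0):
    """Run argv as a child process and restart it each time it exits,
    up to max_restarts times.

    Returns the list of exit statuses observed, oldest first.
    """
    statuses = []
    for attempt in range(max_restarts + 1):
        child = subprocess.Popen(argv)  # the supervisor is the direct parent
        statuses.append(child.wait())   # so it learns of every exit or crash
        if attempt < max_restarts:
            time.sleep(delay)           # crude stand-in for a restart policy
    return statuses


if __name__ == "__main__":
    # A stand-in "dæmon" that exits immediately with status 3.
    crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(crashing, max_restarts=2))
```

Contrast this with a dæmon that has forked away from its init.d script: its exit status goes to process #1, which has no idea which service the process belonged to.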

There is no standard.

POSIX famously avoided standardizing anything to do with superuser-level system administration, system startup, and system shutdown. The System V Interface Definition also does not cover either init (the program for process #1, not the utility command for communicating with that program) or System 5 rc.

It was in 1998 that it first came to light that this lack of any reference standard had grown into a mess. Roland Rosenfeld and Martin Schulze, who had derived file-rc from a system named r2d2 written by Winfried Trümper, together with Miquel van Smoorenburg, who wrote the Linux System 5 rc clone, discovered that r2d2, file-rc, and the System 5 rc clone all had subtly different ideas about when and on what scripts to run stop and start actions, and that there was a counterintuitive swap of "stop" and "start" specific to two runlevels. The only reference doco that anyone could point to was the Debian Policy Manual, which of course had been written based upon the behaviour of the Linux System 5 rc clone in the first place.

Another effect of the lack of standardization is that there was no agreement as to what the run-levels, that controlled which set of scripts the System 5 rc system executed, actually were. In some Linux operating systems, there were distinct run levels for single-user mode, plain multi-user mode, "server" multi-user mode, and multi-user mode with everything (servers and graphical UI). In other Linux operating systems, one or other of "server" multi-user mode and plain multi-user mode was absent. In yet others, and in some of the proprietary System 5 based Unices, yet further variations existed, such as different run levels for different graphical UIs and variations on single-user mode. IBM AIX replaced System 5 rc but generalized System 5 init, going so far as to provide 6 extra run levels (7 to 9 and a to c in AIX 7.1). At the same time, though, it gave only one run level a defined cross-system meaning: level 2. Of course, there was also disagreement amongst all of these as to which mode was assigned which number or letter. The upshot was that system administrators had no portable idea of what (for example) the init 2 command would cause in terms of System 5 rc actions.

A cornucopia of bugs

Most of the aforementioned problems are design problems; now to the implementation problems. In theory, System 5 rc scripts are "just shell", and easy to write and to maintain. The evidence of nigh on three decades' worth of history shows that they're actually quite difficult to implement right.

System 5 rc scripts have a specific interface to obey. But it has several subtleties, which authors are often unaware of or get wrong even if they are aware of them:

The "LSB headers", information about script startup and ordering placed in comments at the start of scripts, are nowadays vital for many Linux operating systems. But there are varying degrees of necessity; some Linux operating systems require more header information than others. Moreover, these headers were not part of the original subsystem, but later additions to enable things such as service ordering and letting the system know automatically which runlevels a service should operate in. So a lot of scripts in the wild are in varying degrees of conformance with (comparatively) recent requirements added to the system over the past couple of decades.
The result code of the script actually has a defined meaning on several Linux operating systems. Just letting the script return the exit status of the last command that happens to run is frequently not the right thing.
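For readers who have not seen one, here is what such a header looks like, together with a minimal sketch of reading it. The sample script text and the "exampled" service are hypothetical, and parse_lsb_header() is an illustration only, not the parser used by insserv or by any particular operating system; it ignores continuation lines, for instance.

```python
# A hypothetical init.d script preamble carrying an LSB comment header.
SAMPLE_SCRIPT = """\
#!/bin/sh
### BEGIN INIT INFO
# Provides:          exampled
# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Hypothetical example daemon
### END INIT INFO
"""


def parse_lsb_header(text):
    """Return {keyword: value} for the comment lines that lie between
    the BEGIN INIT INFO and END INIT INFO markers."""
    fields = {}
    inside = False
    for line in text.splitlines():
        if line.startswith("### BEGIN INIT INFO"):
            inside = True
        elif line.startswith("### END INIT INFO"):
            break
        elif inside and line.startswith("#") and ":" in line:
            key, _, value = line[1:].partition(":")
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_lsb_header(SAMPLE_SCRIPT).items():
        print(key, "=", value)
```

Note that all of the ordering and runlevel information lives in comments, which the shell itself ignores; a script with a missing or wrong header still runs, and simply runs at the wrong time.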

System 5 rc scripts have a specific job to do when managing dæmons, including ensuring that the right processes are killed, ensuring that at most one instance of a service runs at any time, and writing little coloured "[OK]" messages to a terminal. But it is a job that is hard to do, and hard to write, in shell script.

Linux operating systems offer various helper programs and shell function libraries for handling the mechanics of much of this. But the helpers vary wildly from system to system. On CentOS one had status and killproc; on Debian one had start-stop-daemon and status_of_proc; in the Linux From Scratch world one had statusproc. Many widely ported applications have to come with three or more sets of script files, just to account for these differences in helper program names, and slight but important differences in their functionality, across Linux operating system families.

Those that attempted to do grand, all Linuxes in one, scripts ended up with scripts where practically every step of the script is a case "$system" in redhat) … suse) … debian) … gentoo) … lfs) … esac block. Needless to say, these are maintenance nightmares.
PID files are, by their nature, inherently broken as a means of tracking running processes.
Some people try to create "subsystem lock files" in subdirectories of /var/lock (a.k.a. /run/lock), with all of the problems that ensue from things like stale lockfiles and read-only/unmounted filesystems.
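The PID file problem is easy to demonstrate. The following is a minimal sketch, for POSIX systems only, of the sort of naive "is it running?" check that PID files invite; the function name and the exampled.pid file are hypothetical. The demonstration simulates PID reuse by writing the PID of an unrelated live process (the Python interpreter itself) into a stale file.

```python
import os
import tempfile


def pid_file_says_running(path):
    """Naive PID file check: is there any process with the recorded PID?"""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or garbled PID file
    try:
        os.kill(pid, 0)  # signal 0 only probes; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # some process has that PID, owned by another user
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        stale = os.path.join(directory, "exampled.pid")
        with open(stale, "w") as handle:
            # Simulate PID reuse: record the PID of an unrelated process.
            handle.write(f"{os.getpid()}\n")
        # Reports a running service, although no such dæmon exists.
        print(pid_file_says_running(stale))
```

The check can only ever say that some process has the recorded number. It cannot say that the process is the dæmon, and a "stop" action built on top of it will cheerfully signal whatever happens to hold that PID.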
