How operating systems determine the location of the boot volume when bootstrapped

You've come to this page because you've asked a question similar to the following:

How does an operating system decide which volume is the "boot" volume (a.k.a. "boot partition", "root volume", "root partition", or "root filesystem") when it is bootstrapped?

This is the Frequently Given Answer to such questions. (Note that how operating systems determine the location of the system volume is the subject of a different Frequently Given Answer.)

When an operating system is bootstrapped it must decide which volume is to be the boot volume — i.e. the volume wherein are found all of the operating system utilities, run-time configuration files, libraries, and system data. There are three basic ways in which operating systems do this:

The operating system requires that the system administrator explicitly specify the location of the boot volume using the standard firmware-supplied configuration settings that are provided to all operating systems for this purpose.
The operating system requires that the system administrator explicitly specify the location of the boot volume using some non-standard configuration settings that are specific to one particular operating system boot loader program.
The operating system defaults to abusing (or at least attempting to abuse) the system volume as the boot volume.

Using a standard firmware-supplied configuration setting

This is the approach to locating the boot volume that is employed by modern operating system designs. The operating system boot loader, which is invoked by the machine firmware, consults configuration information provided by the machine firmware that specifies the locations of the operating system kernel program image file and of the boot volume. The operation of locating the boot volume is thus a relatively simple one.

What precise form this configuration information takes depends on the machine firmware.

The configuration information is created and written by the operating system installation utility when the operating system is installed. Modern machine firmwares provide facilities to allow system administrators to edit the configuration information without needing to actually bootstrap an operating system in order to run a configuration editing utility.

Standard configuration settings supplied by Extensible Firmware Interface (EFI) firmwares

On machines with EFI firmwares, the location of the boot volume is determined by the value of a machine firmware variable that is stored in non-volatile RAM. Each entry on the EFI Boot Manager menu is defined by the value of a single NVRAM variable, named BootXXXX where XXXX is a number. Each variable's value comprises a rich binary data structure (the EFI_LOAD_OPTION structure) that comprises the whole definition.

This binary data structure contains, amongst other things, an array of EFI Device Paths. (An EFI Device Path is EFI firmware's standard general-purpose mechanism for specifying hardware devices, disc volumes, and files.) The first device path specifies the location of the operating system boot loader program image file itself. This is the program image file that the EFI Boot Manager loads and invokes when the entry is selected from the menu. The second and subsequent paths are for use by the operating system boot loader. They specify the device path to the kernel program image file, the device path of the boot volume, and so forth.

One thus specifies which volume is the boot volume by editing the appropriate device path in the boot menu entry. The operating system boot loader takes the device path and passes it to the operating system kernel, translating it if necessary into whatever internal naming scheme the operating system kernel itself uses for specifying disc volumes. The operating system kernel in turn mounts the designated volume as the boot volume.
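
The layout just described can be made concrete with a short parsing sketch. It assumes the EFI_LOAD_OPTION layout given in the UEFI specification: a 32-bit attributes word, a 16-bit length of the device path list, a NUL-terminated UTF-16 description, the packed device paths, and then optional data. The function names (parse_load_option, make_node) are illustrative only and are not part of any firmware or library API. Device paths made of several instances are not treated specially here.

```python
import struct

END_TYPE = 0x7F    # node type that marks the end of a device path
END_ENTIRE = 0xFF  # sub-type: end of the entire device path


def make_node(node_type: int, sub_type: int, payload: bytes = b"") -> bytes:
    """Build one device path node: type, sub-type, 16-bit total length, payload."""
    return struct.pack("<BBH", node_type, sub_type, 4 + len(payload)) + payload


def parse_load_option(blob: bytes):
    """Split a BootXXXX value into (attributes, description, paths, optional data)."""
    attributes, path_list_length = struct.unpack_from("<IH", blob, 0)
    # The description starts after the 6-byte header and ends at a UTF-16 NUL.
    pos = 6
    while blob[pos:pos + 2] != b"\x00\x00":
        if pos + 2 > len(blob):
            raise ValueError("unterminated description")
        pos += 2
    description = blob[6:pos].decode("utf-16-le")
    pos += 2
    end_of_paths = pos + path_list_length
    paths, nodes = [], []
    while pos < end_of_paths:
        node_type, sub_type, length = struct.unpack_from("<BBH", blob, pos)
        if length < 4:
            raise ValueError("corrupt device path node")
        if node_type == END_TYPE and sub_type == END_ENTIRE:
            paths.append(nodes)  # one complete device path has been read
            nodes = []
        else:
            nodes.append((node_type, sub_type, blob[pos + 4:pos + length]))
        pos += length
    return attributes, description, paths, blob[end_of_paths:]
```

In the terms used above, paths[0] would be the device path of the boot loader program image file, and paths[1:] the device paths that the boot loader interprets for itself, such as that of the boot volume.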

Standard configuration settings supplied by Advanced RISC Computing (ARC) firmwares

ARC firmwares are very similar to EFI firmwares. Like EFI firmwares, they have a native general-purpose naming mechanism, ARC Paths, for specifying devices, disc volumes, and files. Like EFI firmwares, they have variables stored in non-volatile RAM that name the program image files of operating system boot loaders and that contain configuration information to pass to those boot loaders, such as the ARC Path of the boot volume.

The only significant difference is in the implementation detail. Whilst EFI firmwares store everything in the value of a single variable, ARC firmwares store the ARC Paths of the operating system boot loader, the kernel program image file that it loads, and the boot volume, in the values of three separate variables: OSLoader, OSLoadFilename, and OSLoadPartition, respectively. Thus the location of the boot volume is the value of the ARC firmware's OSLoadPartition variable, held in NVRAM.

As with EFI firmwares, the operating system boot loader has to translate the ARC Path for the boot volume into whatever naming scheme the operating system kernel itself uses for specifying disc volumes.

Windows NT needs no translation at all. The Windows NT kernel directly understands ARC Paths, and Windows NT uses ARC Paths as its mechanism for communicating between boot loader and kernel. NTLDR, the Windows NT operating system boot loader, simply passes the ARC Path taken from OSLoadPartition directly to the kernel, which in turn decodes the ARC Path to determine the actual volume that is the boot volume.
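
To make the three-variable arrangement concrete, here is a minimal sketch of decoding an ARC Path into its components. The NVRAM contents shown are hypothetical values chosen for illustration, and parse_arc_path is an invented name; this is not how NTLDR or the Windows NT kernel is actually written.

```python
import re

# One ARC Path component: a name followed by a number in parentheses.
_COMPONENT = re.compile(r"([A-Za-z]+)\((\d+)\)")


def parse_arc_path(path: str):
    """Return ([(name, number), ...], trailing file path) for an ARC Path."""
    components, pos = [], 0
    while pos < len(path) and path[pos] != "\\":
        match = _COMPONENT.match(path, pos)
        if match is None:
            raise ValueError(f"not an ARC Path component at offset {pos}: {path!r}")
        components.append((match.group(1), int(match.group(2))))
        pos = match.end()
    # Whatever remains (possibly nothing) is a file or directory path.
    return components, path[pos:]


# Hypothetical NVRAM variable values, for illustration only.
nvram = {
    "OSLoader": r"multi(0)disk(0)rdisk(0)partition(1)\os\nt\osloader.exe",
    "OSLoadFilename": r"\WINNT",
    "OSLoadPartition": "multi(0)disk(0)rdisk(0)partition(2)",
}

# The boot volume is whatever OSLoadPartition names.
boot_volume, _ = parse_arc_path(nvram["OSLoadPartition"])
```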

Standard configuration settings supplied by old PC/AT and PC98 firmwares

On machines with old PC/AT and PC98 firmwares, there is no service available from the firmware for specifying the location of the boot volume to the operating system boot loader. Operating systems thus simply pretend that they are using more capable firmwares, employing shim layers that sit between the operating system boot loader proper and the PC/AT or PC98 firmware.

This is the approach taken by Windows NT.

In Windows NT up to and including version 5.2, the boot loader and the kernel behave as if they are running on ARC firmware, and a shim layer is added to NTLDR to emulate the services provided by and the behaviour of such firmware. In particular, NTLDR contains a shim that presents a menu of boot options to the user (which the firmware does itself on real ARC systems) reading boot configuration data from the /boot.ini file on the system volume (rather than from NVRAM as on real ARC systems). NTLDR also contains a shim that switches the machine into protected mode before running the boot loader proper, and that provides disc volume and console I/O services in protected mode — which are provided by the firmwares themselves on ARC firmware and EFI firmware systems.

The /boot.ini file on the system volume provides only persistent storage for the boot configuration data that would be held in non-volatile RAM by ARC firmwares, rather than a full general-purpose NVRAM variable data storage service. Each line in /boot.ini comprises the ARC Path of the boot volume, the ARC Path of the kernel image directory, and the kernel command line options: exactly what would be stored in the OSLoadPartition, OSLoadFilename, and OSLoadOptions variables on an ARC firmware system.

In Windows NT from version 6.0 onwards, the boot loader and the kernel behave as if they are running on EFI firmware, and a shim layer is added to provide a Poor Man's emulation of the services provided by and the behaviour of such firmware. The Microsoft Boot Manager is a Poor Man's version of the EFI Boot Manager. It presents a menu of boot options to the user (which the firmware does itself on real EFI systems) reading boot configuration data from the /BCD file on the system volume (rather than from NVRAM as on real EFI systems). Microsoft's Boot Manager also contains a shim that switches the machine into protected mode before running the Windows boot loader proper (winload.exe).

The /BCD file on the system volume (strictly, on Microsoft's Poor Man's equivalent to the system volume for old PC/AT and PC98 systems: the System Reserved Partition) provides a little more of a general purpose data storage service than the /boot.ini database. "BCD" stands for Boot Configuration Database. But again, it is limited to the information that specifies what operating system loader programs to run, what kernel and HAL files they are to use, various other kernel options, and where the boot volume is — the very same information that would be stored in a BootXXXX NVRAM variable on a proper EFI firmware system.
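
For the older of the two schemes above, the correspondence between one /boot.ini entry and the ARC firmware variables can be sketched as follows. The sample entry used in the usage note is illustrative, the function name is invented, and the splitting rules are a simplification rather than NTLDR's actual parser.

```python
def split_boot_ini_entry(line: str) -> dict:
    """Split one [operating systems] entry into the ARC-variable equivalents."""
    # Everything before the first "=" says where the operating system lives.
    location, _, rest = line.partition("=")
    # The ARC Path ends at the first backslash; the directory follows it.
    arc_path, separator, directory = location.strip().partition("\\")
    rest = rest.strip()
    description = ""
    if rest.startswith('"'):
        # A quoted menu description precedes the kernel options.
        description, _, rest = rest[1:].partition('"')
    return {
        "OSLoadPartition": arc_path,
        "OSLoadFilename": separator + directory,
        "description": description,
        "OSLoadOptions": rest.strip(),
    }
```

Given an entry such as multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect, the sketch yields the boot volume's ARC Path, the directory \WINDOWS, the menu text, and the option /fastdetect as four separate values.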

The problems with this approach are akin to the problems with the roll-one's-own explicit configuration mechanism described next.

Because the firmware that the system is pretending to be there isn't actually there, but is merely a shim layer beneath the boot loader proper, some of the functionality of a system with the real firmware is lacking.

For example: The shim layer in NTLDR on IBM PC compatible firmware systems provides no maintenance utility that allows entries on the boot manager menu to be added, modified, and erased before bootstrapping an operating system. On an IBM PC compatible firmware system one is as a consequence faced with the chicken-and-egg problem of not being able to edit /boot.ini without being able to boot the operating system so that one can run an editing tool, and not being able to boot the operating system without adjusting the ARC Paths in /boot.ini to compensate for the changes in device names caused by moving discs around.

In contrast, on a real EFI firmware or ARC firmware system it is possible to adjust the device paths in the boot manager menu entries to cope with moving disc units from one place to another, or with changing their IDs, because the firmware itself comprises a built-in Boot Manager Maintenance utility for doing so.

Because the data storage provided by the emulated firmware is being supplied by the shim using a private data storage area, only tools that know the private data storage area's location and format can alter the boot configuration data.

In contrast, on a real EFI firmware or ARC firmware system, the locations and formats of the boot configuration data are standardized, and can be manipulated both by the Boot Manager Maintenance utility and by any tools for EFI/ARC firmware configuration on any operating system.

Rolling one's own explicit configuration mechanisms

Some operating systems make no attempt whatever to use the standard facilities that the machine firmware provides, but instead roll their own non-standard configuration mechanisms for explicitly specifying the location of the boot volume.

Linux is a major sinner in this regard. Linux kernels, for example, make no effort to use the kernel image file and root filesystem device path information that is supplied for the purpose by ARC firmwares and EFI firmwares. Instead, Linux kernels built without the CONFIG_EDD option have the location of the boot volume hardwired into the in-memory image of the operating system kernel itself, as a pair of major+minor device numbers. (Things get worse, not better, with the CONFIG_EDD option, which switches to a system of abusing (or at least attempting to abuse) the system volume as the boot volume, more on which later.)

This pair of values in the kernel image is set manually by the system administrator in one of two ways:

For kernel images that are written directly to a volume, sector-for-sector, the system administrator is required to run the rdev configuration utility after writing the kernel image to the volume. This writes a given major+minor device number pair to the correct place in the kernel image on the volume.
For kernel images that are written to a file on a volume, and loaded indirectly via a boot loader such as LILO or GRUB, the boot loader modifies the kernel image on the fly as it is loading it into memory, taking the major+minor device number pair to use from the boot loader's own configuration file (e.g. lilo.conf).
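
What both routes amount to can be sketched in a few lines. The sketch assumes the x86 Linux boot protocol, in which the default root device is a 16-bit field (root_dev) at offset 0x1FC of the kernel image, and the traditional encoding with the major number in the high byte and the minor number in the low byte. It illustrates the idea only; it is not the source of rdev, LILO, or GRUB, and the function names are invented.

```python
import struct

ROOT_DEV_OFFSET = 0x1FC  # assumed position of the 16-bit root_dev field


def read_root_dev(image: bytes):
    """Return the (major, minor) pair currently stored in a kernel image."""
    (value,) = struct.unpack_from("<H", image, ROOT_DEV_OFFSET)
    return value >> 8, value & 0xFF


def patch_root_dev(image: bytes, major: int, minor: int) -> bytes:
    """Return a copy of the image with a new (major, minor) pair written in."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in one byte")
    patched = bytearray(image)
    struct.pack_into("<H", patched, ROOT_DEV_OFFSET, (major << 8) | minor)
    return bytes(patched)
```

A tool in the style of rdev would apply the patch to the image on the volume; a boot loader would apply the same patch to the copy that it has just loaded into memory.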

The problems of this approach are severalfold:

Changing disc IDs, moving disc units from one bus to another, or moving discs between machines, results in a chicken-and-egg situation: The operating system cannot be bootstrapped because the non-standard configuration information needs to be adjusted to reflect the altered disc unit number for the boot volume, but the configuration information cannot be updated because in order to run the configuration update tool one must first bootstrap the operating system.

Several Linux boot loaders allow the major+minor device number for the boot volume to be specified interactively at boot time. However, that does not wholly solve the chicken-and-egg problem. They require that the system administrator supply the major+minor device numbers that will be assigned to the boot volume. But those device numbers are not necessarily known by the system administrator until after the operating system has booted.

Changing disc IDs and moving discs are not the only ways to cause this problem. Creating or deleting partitions on the disc unit containing the boot volume creates the same chicken-and-egg situation: The operating system cannot be bootstrapped because the non-standard configuration information needs to be adjusted to reflect the altered partition number for the boot volume, but the configuration information cannot be updated because in order to run the configuration update tool one must first bootstrap the operating system.

Creating or deleting a partition changes the assignments of minor device numbers to disc partitions. So whenever a disc partition is created or deleted (on the disc unit containing the boot volume), the kernel image's major+minor device number pair must be manually updated by the system administrator (either directly with rdev or indirectly by re-running LILO's or GRUB's configuration writing utility), otherwise the kernel will use the wrong disc partition as the boot volume.

In contrast, the boot configuration data on EFI firmware and ARC firmware systems have standardized locations and formats, and can be edited when such chicken-and-egg situations occur using either the maintenance utility that is built in to the firmware itself or any EFI/ARC firmware configuration tool from any operating system.

The volume naming scheme is non-standard and peculiar to each operating system. Where multiple such operating systems are involved, the multiplicity of naming schemes can easily become confusing. Further confusion can result when boot loaders introduce their own naming schemes, too.

For example: Whilst to the Linux kernel proper, the boot partition may be known as "major device number 8, minor device number 1", Linux system administration utilities and Linux boot loaders all use "user-friendly" names instead. Whilst supposedly relieving the system administrator of the burden of remembering the actual device numbers by introducing an additional layer of indirection, they introduce problems by dint of the fact that there are at least two different schemes for such "user-friendly" names.

Whilst the device file /dev/sda1 may represent that volume in the filesystem once the operating system is running, and be the way to refer to it with system administration commands such as dd, to the GRUB boot loader it is named (hd0,0). To further add to the confusion, other operating systems may use user-friendly names for the volume that are similar but not quite the same, such as /dev/sd0s1.

In contrast, ARC Paths and EFI Device Paths are standardized parts of the machine firmware and are the same across all operating systems.
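
The three names for one and the same volume in the example above can be related by a short sketch. It assumes the conventional Linux numbering for sd discs (major number 8, sixteen minor numbers per disc) and the zero-based (hdN,M) names of the original GRUB. It also assumes, purely for illustration, that the firmware enumerates discs in the same order that Linux does, which is exactly what cannot be relied upon in practice. All function names are invented.

```python
import re

SD_MAJOR = 8          # major device number of the first sixteen sd discs
MINORS_PER_DISC = 16  # minors per disc: the whole disc plus 15 partitions

_SD_NAME = re.compile(r"/dev/sd([a-p])(\d*)$")


def _split(name: str):
    """Return (zero-based disc index, partition number or 0 for the whole disc)."""
    match = _SD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a first-sixteen sd device name: {name!r}")
    disc = ord(match.group(1)) - ord("a")
    partition = int(match.group(2) or 0)
    if partition >= MINORS_PER_DISC:
        raise ValueError("partition number does not fit in this minor range")
    return disc, partition


def linux_device_numbers(name: str):
    """Map a device file name to the kernel's (major, minor) pair."""
    disc, partition = _split(name)
    return SD_MAJOR, disc * MINORS_PER_DISC + partition


def grub_legacy_name(name: str) -> str:
    """Map a device file name to a GRUB-style name (ASSUMES identical disc order)."""
    disc, partition = _split(name)
    if partition == 0:
        return f"(hd{disc})"
    return f"(hd{disc},{partition - 1})"
```

Note that the partition counts from one in the device file name and from zero in the boot loader's name, which is precisely the sort of discrepancy that causes confusion.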

Abusing the system volume for the boot volume

Some old operating systems simply do not separate the concepts of "system volume" and "boot volume" at all. They obtain the location of the system volume and they use that as the location of the boot volume. Other operating systems do have separate concepts of system and boot volumes, but simply abuse the former for the latter.

This approach is rarely used on EFI firmware or ARC firmware systems, simply because those systems provide well-defined and simple mechanisms for specifying the location of the boot volume, making it daft to do anything but use those mechanisms.

This approach is commonly used on PC/AT and PC98 compatible firmware, however. Such firmwares provide no services to operating system boot loaders for locating the boot volume, but they do provide services for locating the system volume. Such services are an implicit part of the bootstrap process. The firmware disc unit number of the system volume is passed to the operating system boot loader in a documented CPU register, and the start and length of the system partition on that disc are passed in undocumented fields of the BIOS Parameter Block. The boot loader in turn passes them on to the operating system kernel.
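
The information that is handed over in this way can be sketched as follows. Which BIOS Parameter Block fields are meant is not stated above; the sketch assumes the hidden-sectors field at offset 0x1C and the total-sectors fields at offsets 0x13 and 0x20 of a DOS-style boot sector, and it takes the firmware disc unit number as an ordinary argument because a CPU register cannot be shown in Python. The function name is invented.

```python
import struct


def system_volume_from_boot_sector(unit_number: int, boot_sector: bytes) -> dict:
    """Collect what a boot loader could pass on to a kernel (assumed fields)."""
    (bytes_per_sector,) = struct.unpack_from("<H", boot_sector, 0x0B)
    (total_16,) = struct.unpack_from("<H", boot_sector, 0x13)
    hidden, total_32 = struct.unpack_from("<II", boot_sector, 0x1C)
    return {
        "unit_number": unit_number,         # e.g. 0x80 for the first hard disc
        "bytes_per_sector": bytes_per_sector,
        "start_sector": hidden,             # sectors preceding the volume
        # The 16-bit count is used when non-zero, else the 32-bit count.
        "length_sectors": total_16 or total_32,
    }
```

Everything in the result is expressed in the firmware's terms: a firmware unit number and sector counts on that unit. Nothing in it names a device in the operating system's own scheme.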

The major problem with this mechanism for determining the boot volume is that it requires that the operating system's idea of disc unit numbers exactly matches the machine firmware's idea of disc unit numbers. Thus either the operating system must use the machine firmware for all disc I/O (as operating systems such as PC-DOS, MS-DOS, DR-DOS, and FreeDOS all do), or the operating system's own disc device drivers must exactly mirror the machine firmware's disc device drivers. In particular:

The operating system must not have device drivers for disc units that the machine firmware does not understand (unless they are assigned disc unit numbers that are unused by the machine firmware).

For example: If the machine firmware only understands how to access ATA discs, but the operating system has disc device drivers for both ATA and SCSI discs, the device driver initialization order must be such that the operating system assigns the same disc unit numbers to the ATA discs that the machine firmware does. If the machine firmware assigns the first disc unit number to an ATA disc, but the operating system disc driver initialization order causes the first disc unit number to be assigned to a SCSI disc, the operating system will attempt to read the boot volume from entirely the wrong disc unit.

Even if the operating system and the machine firmware understand the same kinds of disc, the operating system's device drivers must initialize in the same order that the machine firmware's device drivers assign disc unit numbers.

For example: Many ROM firmware extensions for SCSI Host Adapter cards have the configurable option of assigning disc unit numbers to SCSI hard discs either ahead of or following the unit numbers assigned to ATA hard discs. Therefore whenever this option is changed, the operating system's SCSI and ATA device drivers must be re-ordered to initialize in the same order that the machine firmware has been configured.
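
The ordering problem in the two examples above can be simulated with a toy model. It assumes only that hard disc unit numbers are handed out consecutively from 0x80 in the order in which controllers are enumerated; the controller and disc names are made up for the illustration.

```python
FIRST_HARD_DISC_UNIT = 0x80  # first hard disc unit number on PC firmware


def assign_units(controller_order, discs_by_controller):
    """Hand out consecutive unit numbers, one controller after another."""
    units = {}
    unit = FIRST_HARD_DISC_UNIT
    for controller in controller_order:
        for disc in discs_by_controller[controller]:
            units[unit] = disc
            unit += 1
    return units


def mismatched_units(firmware_units, os_units):
    """List the unit numbers that the two parties attach to different discs."""
    return sorted(
        unit for unit in firmware_units
        if os_units.get(unit) != firmware_units[unit]
    )


# Hypothetical machine: one ATA disc and one SCSI disc.
discs = {"ata": ["ata-disc-0"], "scsi": ["scsi-disc-0"]}
firmware_view = assign_units(["ata", "scsi"], discs)  # firmware: ATA first
os_view = assign_units(["scsi", "ata"], discs)        # drivers: SCSI first
```

With the orders reversed as shown, unit 0x80 means the ATA disc to the firmware but the SCSI disc to the operating system, so a boot volume located by unit number alone would be sought on the wrong disc.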

This scheme does not work at all for operating systems that simply do not employ the same disc unit numbering system as the machine's firmware does, but have a very different native disc device naming scheme. This is the case for Linux, for example. There is no straightforward mapping between IBM PC firmware disc unit numbers and the major+minor device number pairs that Linux uses to identify disc volumes.

In order to determine which volume known to the operating system kernel corresponds to the disc unit number supplied by the machine firmware, such operating systems employ a bodge: The operating system boot loader reads data from the system volume, via the machine firmware, and then the operating system kernel reads data from each volume in turn, via its own device drivers, until it finds a volume with matching data. This volume is then designated the boot volume.

This bodge is the mechanism that is employed by Linux kernels built with the CONFIG_EDD option.
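
The general shape of the bodge can be sketched as follows. For the sake of a concrete example the data compared is assumed to be the 32-bit disc signature held at offset 0x1B8 of sector 0 on discs partitioned with the MBR scheme; the function names are invented, and this is a model of the technique, not the Linux kernel's code.

```python
import struct

MBR_SIGNATURE_OFFSET = 0x1B8  # 32-bit disc signature within sector 0


def make_sector0(signature: int) -> bytes:
    """Build a blank sector 0 carrying the given disc signature (test helper)."""
    sector = bytearray(512)
    struct.pack_into("<I", sector, MBR_SIGNATURE_OFFSET, signature)
    return bytes(sector)


def mbr_signature(sector0: bytes) -> int:
    """Extract the 32-bit disc signature from a copy of sector 0."""
    return struct.unpack_from("<I", sector0, MBR_SIGNATURE_OFFSET)[0]


def find_boot_disc(firmware_sector0: bytes, kernel_view: dict) -> str:
    """Match what the boot loader read via firmware against the kernel's discs.

    kernel_view maps each kernel device name to the sector 0 contents that
    the kernel read through its own device drivers.
    """
    wanted = mbr_signature(firmware_sector0)
    matches = [
        name for name, sector0 in kernel_view.items()
        if mbr_signature(sector0) == wanted
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} discs carry signature {wanted:#010x}")
    return matches[0]
```

Unlike the first-match search described above, this sketch inspects every disc and refuses to choose when the signature is absent or duplicated. That is a deliberate design choice for the illustration: it makes the failure visible rather than silently picking the wrong disc.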

It is a problematic mechanism for several reasons:

The most obvious implementation of this mechanism is for the boot loader to read the Partition GUID from the system volume's partition table entry, and then for the operating system kernel to look for a volume with that Partition GUID. Partition GUIDs are intended to uniquely identify a partition.

However, this requires that the disc containing the boot/system volume be one that is partitioned using the GUID Partition Table scheme. The MBR Partition Table scheme does not have unique identifiers for partitions. Thus boot/system volumes cannot reside on discs partitioned with the MBR Partition Table scheme, or on non-partitioned discs such as floppy discs.

Furthermore, this implementation requires that the kernel itself understand both the GUID and the MBR partitioning schemes, since it has to read every partition table entry on every disc unit in the system until it finds one with a matching signature. Whilst many operating system kernels implement partition table handling in kernel space, not all do; and for those that do not a mechanism that requires that the kernel understand partition table layouts is not satisfactory.

The second choice for implementing this mechanism, which is the one used by Linux kernels built with the CONFIG_EDD option, is to identify the disc unit containing the boot/system volume using this approach, but for the kernel to identify the actual partition on the disc for the volume using the start and length information supplied by the machine firmware (and passed along by the boot loader).

However, only discs partitioned with the GUID Partition Table scheme have a Disc GUID that uniquely identifies the disc. Discs partitioned with the MBR Partition Table scheme may have a 32-bit disc signature, but only a few disc management tools (such as Microsoft's Disk Administrator) actually attempt to assign signatures to discs at all. Most such tools do not. Furthermore, non-partitioned discs have no signature fields at all. Thus boot/system volumes can only reside on discs partitioned with the MBR Partition Table scheme if those discs have been given signatures by Microsoft's tools, and cannot reside on non-partitioned discs such as floppy discs at all.

The third choice for implementing this mechanism is to read disc sectors 0 and 1 and compute a signature from their contents using an algorithm such as MD5.

However, this mechanism breaks because the signature of the disc unit changes if either sector changes in any way. Writing new MBR code will change the disc unit's signature, for example. So too will creating and deleting primary partitions.

Furthermore, choosing a more limited set of data from which to calculate the signature, in order to prevent signatures from changing too readily, in its turn risks duplicate signatures. If just the contents of sector 0 are used to calculate the signature, for example, then many GUID Partition Table discs of the same size will have identical disc signatures, because they all contain nothing but a dummy MBR Partition Table and dummy boot loader code in sector 0.

Whatever disc signature is used, this mechanism requires that the kernel at the very least read from every single disc unit in the system at system bootstrap.
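
The two failure modes of the computed-signature approach, signatures that change too readily and signatures that collide, can be demonstrated with a small sketch. The sector builders below are heavily simplified stand-ins written for this illustration: they fill in only a single type 0xEE partition table entry in sector 0 and only the "EFI PART" marker and Disc GUID in sector 1. All names are invented.

```python
import hashlib
import struct

SECTOR = 512


def protective_mbr(total_sectors: int) -> bytes:
    """Simplified sector 0 of a GPT disc: one type 0xEE entry covering the disc."""
    sector = bytearray(SECTOR)
    size = min(total_sectors - 1, 0xFFFFFFFF)
    # Entry layout: status, CHS start, type, CHS end, LBA start, sector count.
    struct.pack_into("<B3sB3sII", sector, 446,
                     0x00, b"\x00\x02\x00", 0xEE, b"\xff\xff\xff", 1, size)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


def gpt_header(disc_guid: bytes) -> bytes:
    """Simplified sector 1: only the marker and the 16-byte Disc GUID are set."""
    sector = bytearray(SECTOR)
    sector[0:8] = b"EFI PART"
    sector[56:72] = disc_guid
    return bytes(sector)


def signature(*sectors: bytes) -> str:
    """MD5 over the concatenated contents of the given sectors."""
    return hashlib.md5(b"".join(sectors)).hexdigest()
```

Two discs of the same size built this way give equal values for signature(sector 0) but different values for signature(sector 0, sector 1), whilst altering so much as one byte of boot code in sector 0 changes both.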

© Copyright 2006,2011 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.