This is the documentation for a filesystem format known as "HPFS2". This filesystem format is intended to be public and free.
There is currently no filesystem format that both contains the extensive repertoire of features required by modern operating systems and is free of patent and trade secret encumbrances. FAT is encumbered by patents held by IBM and by Microsoft. Microsoft does not document NTFS at all. Neither IBM nor Microsoft documents HPFS, and it suffers from a few unfortunate misfeatures, such as holding metadata in directory entries that should properly be held in f-nodes. EXT2/EXT3 still retain some of the more clunky warts of Unix filesystem formats, such as block maps and fixed-position fixed-length inode tables. And LEAN fails to learn from the design mistakes of the past (see the note on LEAN mistakes).
The basic data structure in HPFS2 is the FileNode. FileNodes have attributes (such as a last modification date) and indexes.
Indexes map keys to 64-bit numbers. There are two defined types of index in HPFS2.
The first type of index is a file-sector index. The key in a file-sector index is a simple integer, conceptually a file-relative sector number. The value that it maps to is a volume-relative sector number. File-sector indexes are thus used by FileNodes that correspond to ordinary data files. The index maps from the sector offset within the file to the actual sector in the volume that holds the file data.
The second type of index is a name index. The key in a name index is a Unicode string, conceptually a name of a directory entry. The value that it maps to is the number of an entry in the FileNode table. Name indexes are thus used by FileNodes that correspond to directories. The index maps from the name of the directory entry to the FileNode for the directory.
Everything in HPFS2 is organized using a FileNode. The free space file is organized as a FileNode that has a file-sector index, with the data sectors that it points to comprising the free space. The bad sector list is another FileNode with a file-sector index, with the data sectors that it points to being the bad sectors on the volume. Even the FileNode table is organized as a FileNode, with the data sectors pointed to by its file-sector index being the FileNodes themselves.
HPFS2 allocates space in sectors. No specific sector size is required, but HPFS2 is discouraged on devices where the sector size is less than 512 bytes. Furthermore: HPFS2 should not be used on devices where the sector size is less than the sizes of the headers of any of the whole-sector data structures.
There are three main whole-sector data structures in HPFS2: the FileNode, the FSIndexNode, and the NameIndexNode. Each comprises an integral number of disc sectors, and each begins with a 64-bit magic number. The purpose of this is to allow volume repair utilities in the worst case, when no other form of recovery is possible, to scan the entire volume, sector by sector, and to rebuild the volume from all of the nodes that it finds.
HPFS2 does not have the concept of clusters. Clusters are used for two purposes: reducing the number of bits used in address pointers and grouping I/O transactions into units larger than 1 sector (see also the note on clusters). Neither of these applies to HPFS2. HPFS2 employs an extent-based allocation scheme, and encourages the use of an allocation policy that minimizes the number of extents required by a file. An individual address pointer is not required for each individual sector in a file, so the width of address pointers is not a pressing concern. Filesystem drivers that wish to cluster individual sector reads and writes can read and write entire extents (or fractions thereof) if they so choose, so I/O transactions can be grouped as actually appropriate, instead of only into a fixed cluster size.
The one fixed-position data structure in HPFS2 is the volume boot record, located in sector 0, the first sector of the volume. The VBR contains a BIOS parameter block, and optionally also contains boot code. The operation and the location of the boot code is operating-system specific and architecture-specific, and beyond the scope of this specification.
Volume format utilities are responsible for initializing the BPBs in HPFS2 VBRs. They may employ version 4.0 BPBs, version 7.0 BPBs, or version 7.0bis BPBs. (Version 7.0bis BPBs are recommended, with version 7.0 BPBs being a second preference and version 4.0 BPBs being a third.) They may not, however, employ Windows NT BPBs, or plain version 3.4 BPBs, because such BPBs contain no filesystem type information and it is thus impossible to reliably distinguish HPFS2 format volumes from NTFS format volumes and FAT format volumes.
Volume boot code and filesystem drivers must detect the type of BPB that is in use by inspecting the BPB signature and filesystem type fields. The filesystem type for HPFS2 is "HPFS2 ".
HPFS2 volume boot code may employ only the following BPB fields:
Field | Notes |
---|---|
sector size | only to be used if the machine firmware does not supply the actual sector size |
sectors per track | only to be used if the machine firmware does not provide a logical block DASD I/O API |
number of heads | only to be used if the machine firmware does not provide a logical block DASD I/O API |
number of "hidden" sectors | to translate volume-relative sector numbers into disc-relative sector numbers |
signature bytes and filesystem type fields | |
root directory start pointer (version 7.0 BPBs only) | This contains the volume-relative sector number of the Master FileNode. |
superblock pointer (version 7.0bis BPBs only) | This contains the volume-relative sector number of the Master FileNode. |
In particular, volume boot code may not use the value in the BPB drive number field, but must instead use the drive number that the machine firmware supplies to it.
Filesystem drivers and filesystem maintenance utilities may employ only the following BPB fields:
Field | Notes |
---|---|
signature bytes and filesystem type fields | |
root directory start pointer (version 7.0 BPBs only) | This contains the volume-relative sector number of the Master FileNode. |
superblock pointer (version 7.0bis BPBs only) | This contains the volume-relative sector number of the Master FileNode. |
"chkdsk" flags | |
volume label | |
volume serial number | |
In particular, filesystem drivers may not use the value in the BPB sector size field, or the BPB fields used for LBA→CHS translation and for volume-relative→disc-relative translation. Operating systems are required to provide a logical block I/O API to filesystem drivers, to provide filesystem drivers with the actual block size of the device, and to translate volume-relative sector numbers into disc-relative sector numbers. (Filesystem maintenance utilities that run directly on top of the machine firmware, and that do not employ an operating system, must likewise use equivalent facilities provided by the machine firmware.)
The following BPB fields must be ignored by HPFS2 boot code and filesystem drivers, but are required to contain specific values for the benefit of poorly written tools that believe that only the FAT filesystem format exists:
Field | Notes |
---|---|
number of sectors per allocation unit | must be 1 |
number of "reserved" sectors | must be at least 1 |
number of FATs | must be 0 |
sectors per FAT | both the 16-bit and 32-bit fields must be 0 |
number of records in the root directory | must be 0 |
All other BPB fields are ignored and should be initialized to 0.
If neither a version 7.0bis nor a version 7.0 BPB is employed, volume boot code and filesystem drivers must assume a fixed-position Master FileNode located at sector number 64 of the volume. It is strongly recommended that filesystem format utilities use version 7.0 or version 7.0bis BPBs, however.
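The rule for finding the Master FileNode can be sketched as follows. This is only an illustration: the `bpb_version` labels and the two pointer parameters are hypothetical names for values that the caller has already decoded from the BPB, and nothing here parses a BPB itself.

```python
from typing import Optional

# Fixed fallback position stated in the text, for BPBs that carry no pointer.
DEFAULT_MASTER_FILENODE_SECTOR = 64


def master_filenode_sector(bpb_version: str,
                           root_dir_start: Optional[int] = None,
                           superblock_pointer: Optional[int] = None) -> int:
    """Return the volume-relative sector number of the Master FileNode.

    bpb_version is a hypothetical label: "4.0", "7.0" or "7.0bis".
    """
    if bpb_version == "7.0bis":
        if superblock_pointer is None:
            raise ValueError("version 7.0bis BPB without a superblock pointer")
        return superblock_pointer
    if bpb_version == "7.0":
        if root_dir_start is None:
            raise ValueError("version 7.0 BPB without a root directory start pointer")
        return root_dir_start
    if bpb_version == "4.0":
        # No pointer field is available, so the fixed position applies.
        return DEFAULT_MASTER_FILENODE_SECTOR
    # Windows NT and plain version 3.4 BPBs are not permitted on HPFS2 volumes.
    raise ValueError("BPB type not permitted for HPFS2: " + bpb_version)
```

For example, `master_filenode_sector("4.0")` falls back to the fixed position, whereas a version 7.0bis BPB yields whatever its superblock pointer holds.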
Neither volume boot code nor filesystem drivers may use any of the three sectors-in-volume fields. The free space file is the sole determiner of what usable sectors exist in a volume. Filesystem repair utilities should employ external means, outside of the on-volume data structures, for determining the number of sectors in the volume.
FileNodes comprise a fixed header and a variable metadata area. All FileNodes in a volume are the same size, which is an integral multiple (usually 1) of the volume sector size. The sizes of FileNodes are determined at format time. Format utilities are required to ensure that FileNodes are at least 512 bytes long.
Offset (octets) | Description |
---|---|
0x0000 | 64-bit magic number |
0x0008 | Flags word |
0x000A | 16-bit offset to first metadata record. The offset is from the start of the FileNode, and is essentially the length of the FileNode header. This is intended to allow the FileNode header structure to be expanded in future revisions of the file format whilst preserving backwards compatibility. This must be a multiple of 4 octets. |
0x000C | 16-bit offset to first free metadata record. The offset is from the start of the FileNode. All space in the metadata area from this point up until the end of the FileNode is available for metadata records. All space from the first metadata record up to but not including this point is used by metadata records. Filesystem repair utilities must ensure that this does not exceed the FileNode length. |
0x000E | 2 octets reserved (present for the purpose of aligning fields with natural word boundaries) |
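As an illustration of the header layout above, here is a minimal Python sketch that decodes it. Two things are assumptions rather than statements of this specification: the byte order (little-endian is assumed) and the names `FileNodeHeader` and `parse_filenode_header`, which are hypothetical. The magic number is passed through without being checked, because its value is not given in this section.

```python
import struct
from typing import NamedTuple


class FileNodeHeader(NamedTuple):
    magic: int
    flags: int
    first_record_offset: int
    first_free_offset: int


# Assumed little-endian: u64 magic, u16 flags, u16 offset to first record,
# u16 offset to first free record, then 2 reserved octets (16 octets in all).
_HEADER = struct.Struct("<QHHH2x")


def parse_filenode_header(node: bytes) -> FileNodeHeader:
    """Decode and sanity-check the fixed header at the start of a FileNode."""
    if len(node) < _HEADER.size:
        raise ValueError("buffer is shorter than a FileNode header")
    magic, flags, first, free = _HEADER.unpack_from(node, 0)
    if first % 4 != 0:
        raise ValueError("offset to first metadata record is not a multiple of 4")
    if first < _HEADER.size:
        raise ValueError("first metadata record overlaps the header")
    if not first <= free <= len(node):
        raise ValueError("first free offset lies outside the metadata area")
    return FileNodeHeader(magic, flags, first, free)
```

A repair utility would presumably correct an out-of-range first free offset rather than reject the node; the sketch raises an error only to keep the checks visible.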
The actual metadata in a FileNode are held in metadata records. Metadata records are variant records that begin with a type and a length field. A FileNode may contain at most one metadata record of any given type. If a FileNode contains more than one metadata record of any given type, filesystem drivers should use the first record, and filesystem repair utilities should discard the second and subsequent records.
Not all FileNodes will have metadata records of all types. The FileNode for the free space file need not have unnamed or named attributes, for example.
Metadata records of different types are not mutually exclusive. A FileNode is permitted to have all three of a name index, a file-sector index, and a symbolic link metadata record, for example.
Filesystem drivers must preserve unaltered any metadata records of unknown type.
Offset (octets) | Description |
---|---|
0x0000 | 8-bit metadata record type |
0x0001 | Reserved (present for the purpose of aligning fields with natural word boundaries) |
0x0002 | 16-bit offset to the next metadata record. This is the offset from the start of this record, and so is the length of this record. When reading a record, its length may be shorter than the defined number of fields for a metadata record of the given type. (This is the case where the creator of the record uses a version of this filesystem format specification that is older than the one used by the reader of the record.) In this case, filesystem drivers and maintenance utilities are required to act as if all of the missing octets contain the value 0. When reading a record, its length may be greater than the defined number of fields for a metadata record of the given type. (This is the case where the creator of the record uses a version of this filesystem format specification that is newer than the one used by the reader of the record.) In this case, filesystem drivers and maintenance utilities are required to treat all additional octets, beyond the fields that they know about, as reserved (see the note on reserved fields). |
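The record header and the rules around it (first record of a type wins, short records read as if padded with zeros) can be sketched as below. This is an illustration under assumptions: little-endian byte order is assumed, the function names are hypothetical, and the record type numbers used in any example are made up, since the table of type codes is not reproduced in this section.

```python
import struct
from typing import Dict


def metadata_records(node: bytes, first_offset: int,
                     first_free_offset: int) -> Dict[int, bytes]:
    """Map each record type to the raw octets of its first record in the node."""
    records: Dict[int, bytes] = {}
    offset = first_offset
    while offset < first_free_offset:
        if offset + 4 > first_free_offset:
            raise ValueError("truncated metadata record header")
        # u8 type, 1 reserved octet, u16 length of this record (assumed little-endian).
        record_type, length = struct.unpack_from("<BxH", node, offset)
        if length < 4 or offset + length > first_free_offset:
            raise ValueError("corrupt metadata record length")
        # Drivers use the first record of a type; later duplicates are ignored here.
        records.setdefault(record_type, node[offset:offset + length])
        offset += length
    return records


def padded(record: bytes, expected_length: int) -> bytes:
    """Read a short (older-format) record as if the missing octets were 0."""
    return record.ljust(expected_length, b"\x00")
```

A record that is longer than expected needs no special handling when reading: the reader simply ignores the trailing octets, and preserves them when writing the record back.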
The unnamed attributes comprise the conventional, fixed length, unnamed file/directory attributes required by operating systems such as OS/2 and Unix. If a FileNode lacks this metadata record, filesystem drivers must report to operating systems as if the record were present and all fields contained the value zero. (Several of the system FileNodes normally lack this metadata record, for example. But, on the gripping hand, they are not normally accessible directly by operating system APIs.)
Several operating systems include other values in the set of unnamed attributes that they themselves ascribe to files/directories. Filesystem drivers do not derive these from the unnamed attributes record, but from other metadata records. The allocated space value is derived from a FileNode's file-sector index. The EA length value is derived from a FileNode's named attributes record.
Offset (octets) | Description |
---|---|
0x0004 | 16-bit DOS type and flags. This is identical to the OS/2 and MS/PC-DOS attribute word, except for its omission of the 'D' and 'V' attributes. The DOS/Windows/OS/2 file flags presented by filesystem drivers to operating systems are a combination of this field and (read-only) values calculated from other metadata. Specifically: The Windows "Temporary", "Sparse", "Compressed", and "Encrypted" flags have no on-disc equivalents. Sparse file support is always enabled for all files in HPFS2. Whether a file is temporary or not has no relevance to the on-disc data structures. Compression and encryption will be controlled by other metadata records in future versions of this specification. Filesystem drivers for operating systems that do not have the concept of file flags should initialize this field to 0x0000 when creating new unnamed attributes records, and preserve the current value when modifying existing records. |
0x0006 | 16-bit POSIX permissions. Filesystem drivers for operating systems that do not have the concept of POSIX permissions should initialize this field to either 0700 or 0500 when creating new unnamed attributes records, and preserve the current value when modifying existing records. (If the operating system supports OS/2/Windows/DOS file flags, 0500 should be used if the 'R' bit in the creation file flags is set to 1, and 0700 if it is set to 0.) |
0x0008 | 8-bit POSIX type. The POSIX type presented by filesystem drivers to operating systems is a combination of this field and other metadata records. A file-sector index record implies a regular file. A name index implies a directory. And a symbolic link metadata record implies a symbolic link. Those records take precedence over this field. Only if a FileNode does not possess any of those metadata records should the value in this field be consulted. Type 0x00 may thus seem redundant. It is not. Its purpose is to force a FileNode to be reported as a regular file, even if that FileNode lacks all of the aforementioned metadata records. Filesystem drivers for operating systems that do not have the concept of POSIX file types should initialize this field to 0x00 when creating new unnamed attributes records, and preserve the current value when modifying existing records. |
0x000A | 6 octets reserved (present for the purpose of aligning fields with natural word boundaries) |
0x0010 | 64-bit creation timestamp, expressed as 100ns intervals since the Windows NT Epoch |
0x0018 | 64-bit last modification timestamp, expressed as 100ns intervals since the Windows NT Epoch |
0x0020 | 64-bit last access timestamp, expressed as 100ns intervals since the Windows NT Epoch |
0x0028 | 64-bit last metadata change timestamp, expressed as 100ns intervals since the Windows NT Epoch |
0x0030 | 64-bit end of file position |
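The timestamp fields count 100ns intervals from the Windows NT Epoch, which is midnight on 1 January 1601. A minimal conversion sketch in Python follows; it assumes that the values are in UTC, and the function names are hypothetical. Python's `datetime` has microsecond resolution, so the final decimal digit of a tick count is dropped when converting from ticks.

```python
from datetime import datetime, timedelta, timezone

# The Windows NT Epoch: 1601-01-01 00:00:00, taken here to be UTC.
NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def nt_ticks_to_datetime(ticks: int) -> datetime:
    """Convert a count of 100ns intervals since the NT Epoch to a datetime."""
    return NT_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_nt_ticks(moment: datetime) -> int:
    """Convert a timezone-aware datetime to 100ns intervals since the NT Epoch."""
    delta = moment - NT_EPOCH
    # Integer arithmetic throughout, to avoid floating-point rounding.
    return (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
```

A driver for a Unix-like system would use such a conversion in both directions, since POSIX timestamps count from 1970 rather than 1601.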
The named attributes comprise up to 64KiB of arbitrary name and value pairs, where both name and value are strings of uninterpreted octets, and where no more than one pair may have any given name. Filesystem drivers for operating systems (such as OS/2 and Windows NT) that employ the concept of extended attributes store those attributes in this metadata record.
Offset (octets) | Description |
---|---|
0x0004 |
File-sector index metadata records are pointers to B+-trees that map file-relative sector numbers to volume-relative sector numbers. On operating systems that can only treat filesystem objects as files, directories, or symbolic links, and not as a combination of two or more, a file-sector index takes precedence over a name index and a symbolic link metadata record. Any FileNode with a file-sector index is reported by filesystem drivers as a file to such operating systems.
Offset (octets) | Description |
---|---|
0x0004 | 4 octets reserved (present for the purpose of aligning fields with natural word boundaries) |
0x0008 | 64-bit sector number of the FSIndexNode that comprises the root of the B+-tree |
Name index metadata records are pointers to B-trees that map names to 64-bit numbers. On operating systems that can only treat filesystem objects as files, directories, or symbolic links, and not as a combination of two or more, a name index takes precedence over a symbolic link metadata record but is subordinate to a file-sector index. Any FileNode with no file-sector index but with a name index is reported by filesystem drivers as a directory to such operating systems.
Offset (octets) | Description |
---|---|
0x0004 | 4 octets reserved (present for the purpose of aligning fields with natural word boundaries) |
0x0008 | 64-bit sector number of the NameIndexNode that comprises the root of the B-tree |
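The precedence rules for operating systems that cannot treat an object as a combination of types can be summarized in a short Python sketch. The function name and the returned labels are hypothetical. Only POSIX type 0x00 (regular file) is interpreted, because the other POSIX type values are not listed in this section.

```python
def reported_type(has_file_sector_index: bool,
                  has_name_index: bool,
                  has_symbolic_link: bool,
                  posix_type: int = 0x00) -> str:
    """Decide how a FileNode is reported to a single-type operating system."""
    if has_file_sector_index:
        return "file"              # takes precedence over everything else
    if has_name_index:
        return "directory"         # subordinate only to a file-sector index
    if has_symbolic_link:
        return "symbolic link"
    # Only now is the POSIX type field in the unnamed attributes consulted.
    if posix_type == 0x00:
        return "file"
    return "posix type 0x%02X" % posix_type
```

So a FileNode that holds both a name index and a file-sector index is reported as a file, and a FileNode with none of the three records and a POSIX type of 0x00 is also reported as a file.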
Ownership metadata records comprise ownership information that is reported to Windows NT, Unix, and other similar operating systems.
Offset (octets) | Description |
---|---|
0x0004 | 12 octets reserved (present for the purpose of aligning fields with natural word boundaries) |
0x0010 | 128-bit security identifier representing the object owner's primary user ID |
0x0020 | 128-bit security identifier representing the object owner's primary group ID |
Access control metadata records comprise the access control lists that are reported to Windows NT, Unix, and other similar operating systems.
Offset (octets) | Description |
---|---|
0x0004 | 4 octets reserved (present for the purpose of aligning fields with natural word boundaries) |
0x0008 | 64-bit sector number of the first ACLNode in the access control list |
Symbolic link metadata records comprise symbolic link data that are reported to Unix and other similar operating systems as symbolic link data, and to Windows NT and other similar operating systems as reparse point data.
Offset (octets) | Description |
---|---|
0x0004 | Name data for the link target. Name data are stored as UTF-16 strings and are not NUL-terminated. (The length is derived from the metadata record length.) There are no restrictions imposed by the on-disc data structures as to what characters are legal in symbolic link target names. Such restrictions are imposed by individual operating systems. |
The Master FileNode metadata record has no meaning for any other FileNode apart from the Master FileNode.
Offset (octets) | Description |
---|---|
0x0004 | 16-bit number of sectors in a FileNode |
0x0008 | 16-bit number of sectors in a NameIndexNode |
0x000C | 16-bit number of sectors in a FSIndexNode |
Special requirements are placed upon the location of the Master FileNode metadata record within the Master FileNode itself. Because volume boot code, filesystem drivers, and filesystem maintenance utilities do not know the size of FileNodes until they have read this metadata record from the Master FileNode, the Master FileNode metadata record must be the first metadata record in the node, ensuring that it occurs within the first sector of the FileNode. Thus code can read the first sector of the Master FileNode, determine the length of FileNodes from the metadata record, and then proceed to read the rest of the Master FileNode.
Filesystem format utilities must choose sectors-per-node values that ensure that FileNodes, FSIndexNodes, and NameIndexNodes are at least 512 octets long. The recommended sizes are 1 sector for FileNodes and FSIndexNodes, and 4 sectors for NameIndexNodes.
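The bootstrap sequence described above (read one sector, find the first metadata record, learn the node sizes) might look like the following Python sketch. It assumes little-endian byte order and uses hypothetical names throughout. It does not check the record type, because the type code of the Master FileNode metadata record is not given in this section; real code would check it.

```python
import struct
from typing import NamedTuple


class NodeSizes(NamedTuple):
    filenode_sectors: int
    name_index_node_sectors: int
    fs_index_node_sectors: int


def node_sizes_from_master(first_sector: bytes, sector_size: int) -> NodeSizes:
    """Extract the sectors-per-node values from sector 0 of the Master FileNode."""
    # The FileNode header holds the offset to the first metadata record at 0x000A.
    (first_record,) = struct.unpack_from("<H", first_sector, 0x000A)
    # The three size fields sit at 0x0004, 0x0008 and 0x000C within the record.
    if first_record + 0x000E > len(first_sector):
        raise ValueError("Master FileNode record is not within the first sector")
    sizes = NodeSizes(*struct.unpack_from("<H2xH2xH", first_sector,
                                          first_record + 0x0004))
    for count in sizes:
        if count * sector_size < 512:
            raise ValueError("node size is below the 512 octet minimum")
    return sizes
```

Having obtained `filenode_sectors`, the caller can then read the remaining `filenode_sectors - 1` sectors of the Master FileNode, if there are any.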
Name indexes are B-tree structures. Each node in the tree is stored on disc as a NameIndexNode structure, which comprises an integral number of sectors. The root node of a tree is pointed to by a name index metadata record in a FileNode.
A NameIndexNode structure comprises a fixed-length header followed by zero or more name index entries.
Offset (octets) | Description |
---|---|
0x0000 | 16-bit length. The length is constrained by the end of the containing NameIndexNode. |
0x0002 | 16-bit flags word |
0x0004 | 64-bit FileNode table index |
0x000C | 64-bit downlink to a child name index node |
0x0014 | Name data. Name data are stored as UTF-16 strings and are not NUL-terminated. There are no restrictions imposed by the on-disc data structures as to what characters are legal in directory entry names. Such restrictions are imposed by individual operating systems. Comparison for B-tree ordering is case-sensitive, and the original case of an entry is preserved. (How operating systems that do not support case-sensitive filename matching implement case-insensitive lookup semantics depends upon each individual operating system. One approach is, after checking for an exact match, to generate both all-uppercase and all-lowercase versions of a name and to scan all directory entries that are lexically between the two.) No provision is made for separate 8.3 format ("short") names for directory entries. Where an operating system employs the concept of "short" names (which in practice only applies to OS/2's VDM support and to Windows NT) a filesystem driver can adopt one of two strategies: Filesystem drivers for operating systems that do not have a "long"/"short" name dichotomy, such as Unix and TAU, need take no special measures. The zero-length string, which sorts lexically before all other entries, is a special entry that denotes what filesystem drivers should present as "." and ".." entries to operating systems. The contents of these entries are not stored in the on-disc data structures, since they are redundant. The FileNode table index for such an entry is the FileNode of the parent directory. |
All FileNodes in a volume are organized into a table, the FileNode table. The FileNode pointers in directory entries (i.e. the 64-bit values in name indexes) are indices into this table.
The FileNode table is itself organized via a FileNode: the Master FileNode. The FileNode table is treated as a regular file. To locate a FileNode in the FileNode table from its table index, one multiplies the index value by the FileNode size in sectors to obtain a file-relative sector number, and then looks up the sector(s) comprising the target FileNode using the file-sector index in the Master FileNode.
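The lookup just described can be sketched in Python. The real file-sector index is a B+-tree of FSIndexNodes, whose layout is not covered here, so this sketch substitutes a sorted list of extents `(file_start, volume_start, length)` for it. That representation and the function names are assumptions made purely for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical stand-in for a file-sector index: sorted, non-overlapping extents.
Extent = Tuple[int, int, int]  # (file_start, volume_start, length)


def lookup(extents: List[Extent], file_sector: int) -> Optional[int]:
    """Map a file-relative sector to a volume-relative one, or None for a hole."""
    starts = [extent[0] for extent in extents]
    position = bisect_right(starts, file_sector) - 1
    if position < 0:
        return None
    file_start, volume_start, length = extents[position]
    if file_sector >= file_start + length:
        return None
    return volume_start + (file_sector - file_start)


def filenode_sectors(table_index: int, sectors_per_filenode: int,
                     master_extents: List[Extent]) -> List[Optional[int]]:
    """Return the volume-relative sectors holding FileNode number table_index."""
    first = table_index * sectors_per_filenode
    return [lookup(master_extents, sector)
            for sector in range(first, first + sectors_per_filenode)]
```

With one sector per FileNode and a Master FileNode whose index maps file sectors 0 to 7 onto volume sectors 1000 to 1007, FileNode 4 (the root directory) is found at volume sector 1004.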
When a volume is formatted, the format utility allocates space in the FileNode table for a handful of default FileNodes. Other FileNodes are added to the FileNode table in the same way that sectors are added to ordinary files. The FileNode table is permitted to be sparsely allocated. If FileNodes are deleted, filesystem drivers are permitted to create "holes" in the FileNode table where they once were.
FileNodes are allocated table indices no lower than 32. Table indices from 0 to 31 are reserved for predefined system FileNodes.
Index | Description |
---|---|
0 | The Master FileNode |
1 | The FileNode demarcating the volume's boot area |
2 | The FileNode comprising all bad sectors in the volume |
3 | The FileNode comprising the free space file |
4 | The root directory FileNode |
5 – 31 | Reserved for future use |
The Master FileNode is only required to contain two metadata records: a file-sector index and a special metadata record for the Master FileNode. Other metadata record types do not apply to the Master FileNode. Filesystem drivers are not required to update them (such as updating timestamps in an unnamed attributes record) if they exist.
The root directory is the only system FileNode that filesystem drivers are required to make accessible to applications softwares. Its name is whatever name is defined by the operating system for referring to the volume's root directory.
The root directory is the only directory where the entry for the zero-length string in its name index (i.e. the "."/".." entry) points to the directory's own FileNode. Exactly as for all other such entries, this reference is included in the FileNode's link count. Therefore the link count of the root directory FileNode is always at least 1.
On HPFS2, all space in a volume is allocated to a FileNode. Even the boot area belongs to a FileNode. Free space, which doesn't belong to anything else, is the property of the free space file.
When a volume is formatted, the format utility creates a free space file that owns every sector in the volume, apart from sectors allocated to the system FileNodes created at format time. To extend a volume after formatting, it is simply necessary to extend the free space file to own the new sectors.
The free space file is required to be sparsely allocated. Moreover, it is required that its file-relative sector numbers map 1:1 to volume-relative sector numbers. Filesystem drivers and utilities must preserve this 1:1 mapping.
There are no doubled-up data structures on HPFS2. There's no index to a bitmap. The free space file's file-sector index is the free space list. If the free-space file has a file-relative sector N (i.e. that sector isn't in a "hole" caused by sparse allocation), then volume-relative sector N of the volume is free space. Checking the allocation state of a particular sector is a simple matter of looking up that sector number in the free-space file's file-sector index. Looking for the next available free sector after a given sector M involves looking for the non-sparse extent at or following file-relative sector M in the free-space file's file sector index. Allocating space involves making the free-space file sparser. Freeing space is the reverse.
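To make the free-space operations concrete, here is a small Python sketch. Because the free space file maps file-relative sectors 1:1 onto volume-relative sectors, its index can be modelled as a sorted list of free runs `(start, length)`. That list, and the function names, are hypothetical simplifications of the on-disc B+-tree.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (first free sector, number of free sectors)


def is_free(runs: List[Run], sector: int) -> bool:
    """A sector is free if the free space file is non-sparse at that position."""
    return any(start <= sector < start + length for start, length in runs)


def next_free(runs: List[Run], m: int) -> Optional[int]:
    """Return the first free sector at or following sector m, if there is one."""
    for start, length in runs:  # runs are sorted by start
        if m < start + length:
            return max(start, m)
    return None


def allocate(runs: List[Run], sector: int, count: int = 1) -> List[Run]:
    """Allocate sectors by punching a hole in the free space file."""
    end = sector + count
    result: List[Run] = []
    found = False
    for start, length in runs:
        run_end = start + length
        if start <= sector and end <= run_end:
            found = True
            if start < sector:
                result.append((start, sector - start))  # free space before the hole
            if end < run_end:
                result.append((end, run_end - end))     # free space after the hole
        else:
            result.append((start, length))
    if not found:
        raise ValueError("requested sectors are not all free")
    return result
```

Freeing sectors is the reverse operation: a run is added back, and merged with its neighbours where they are adjacent.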
The free space file FileNode is only required to contain one metadata record: a file-sector index. Other metadata record types do not apply to the free space file FileNode. Filesystem drivers are not required to update them (such as updating timestamps in an unnamed attributes record) if they exist.
The bad sector map for a volume is organized via a system FileNode, the bad sector map FileNode. The bad sector map is treated as a regular file. It comprises a concatenation of all of the bad sectors in the volume. In other words: The file-sector index for the bad sector map maps file-relative sector numbers in the bad sector map file to the bad sectors on the disc. Just as the free-space file owns all free sectors, the bad sector map effectively owns all bad sectors.
When a volume is formatted, the format utility creates a bad sector map comprising all of the bad sectors that it is informed about, or that it finds itself. When a filesystem driver or a filesystem repair utility finds an additional bad sector, it simply appends it to the bad sector map FileNode as if it were appending that sector to an ordinary file. The bad sector map file is permitted to be sparsely allocated, but (on the presumption that bad sectors don't turn into good ones) there is no reason for it to be.
The bad sector map FileNode is only required to contain one metadata record: a file-sector index. Other metadata record types do not apply to the bad sector map FileNode. Filesystem drivers are not required to update them (such as updating timestamps in an unnamed attributes record) if they exist.
Filesystem maintenance utilities must not attempt to "defragment" the bad sector map file.
The boot area for a volume is demarcated and accessed via a system FileNode, the boot area map FileNode. The boot area map is treated as a regular file. It comprises a concatenation of all of the sectors in the volume's boot area. In other words: The file-sector index for the boot area map maps file-relative sector numbers to the sectors in the boot area.
The purpose of the boot area map file is twofold. It ensures that the sectors comprising the volume's boot area are considered to be "in use" by a FileNode, thus eliminating the need for filesystem drivers and repair utilities to treat them as special cases. It also provides a simple means for applications softwares to read and write the volume's boot area, treating it as an ordinary file, without the need to grant users the access rights to read and write the raw sectors of the volume as a whole.
When a volume is formatted, the format utility creates a boot area map comprising the whole of the volume's boot area, including the volume boot record. The boot area map file has a fixed size thereafter, and is not expected to be expanded or shrunk in normal operation.
The boot area map file is permitted to be sparsely allocated. If the boot area is discontiguous, the boot area map file must be sparsely allocated. (The boot area map file is not a view of the entire volume. Sectors outside of the volume's boot area must appear as "holes" in the boot area map file.) The format utility should create the file-sector index for the boot area map FileNode such that there is a 1:1 mapping between file-relative sector numbers in the boot area map file and volume-relative sector numbers in the volume.
Filesystem drivers are not required to make the boot area map file accessible as a regular file to applications softwares, although doing so makes the writing of operating system installation and upgrade tools easier. If a filesystem driver does make the file accessible as a regular file to applications softwares, then normal writes to the file that fill in the "holes" will very probably not preserve the 1:1 mapping between file-relative sector numbers and volume-relative sector numbers. (Filesystem drivers may, but are not required to, employ a special sector allocation policy specifically for this FileNode.) If filesystem repair utilities discover this invariant to be broken, they should restore it.
The boot area map FileNode is only required to contain one metadata record: a file-sector index. It may also, however, contain metadata records for ownership, access control, unnamed attributes, and named attributes. Filesystem drivers are required to respect and to update these just as they would for an ordinary file. (Thus the boot area can have an owner, access controls that prevent unauthorized access to or modification of the volume boot code, and even extended attributes such as a description EA or a modification history EA.) Other metadata record types do not apply to the boot area map FileNode.
Filesystem maintenance utilities must not attempt to "defragment" the boot area map file.
This specification imposes no sector allocation policy upon filesystem drivers. Filesystem drivers are even free to employ the (extremely poor) allocation policy of simply allocating the lowest numbered available sector from the free-space file, should they so choose. However, the following sector allocation policy is recommended:
When adding a data sector to a file, either at the end of the file or in a "hole" in the middle of the file, a filesystem driver should first attempt to grow the extent comprising the immediately preceding (file-relative) sector in place. In other words: A filesystem driver should locate the immediately preceding extent in the file-sector index, and determine whether the volume-relative sector that follows its last sector is free, and allocate that sector if it is.
Thus the data of a file are stored in contiguous runs of disc sectors wherever possible.
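The grow-in-place step can be sketched in Python as follows. The extent list `(file_start, volume_start, length)` and the set of free sector numbers are hypothetical in-memory stand-ins for the file's file-sector index and for the free space file, and `try_grow_in_place` is an illustrative name.

```python
from typing import List, Optional, Set, Tuple

Extent = Tuple[int, int, int]  # (file_start, volume_start, length)


def try_grow_in_place(extents: List[Extent], free: Set[int],
                      file_sector: int) -> Optional[int]:
    """Try to place file_sector directly after the extent that precedes it.

    Returns the volume-relative sector that was allocated, or None if the
    preferred sector is unavailable and another policy must be used.
    """
    for position, (file_start, volume_start, length) in enumerate(extents):
        if file_start + length == file_sector:
            candidate = volume_start + length  # sector following the extent's last
            if candidate not in free:
                return None
            free.remove(candidate)             # the free space file gets sparser
            extents[position] = (file_start, volume_start, length + 1)
            return candidate
    return None  # no immediately preceding extent exists
```

When the new sector fills a hole and ends up adjacent to the following extent as well, the two extents could be merged into one; that step is left out of this sketch.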
When allocating a new name index node or file-sector index node for a FileNode, because its index needs more space, or allocating a new sector for storage of named attribute data, a filesystem driver should first attempt to allocate the next volume-relative sector, following the FileNode sector itself, that is marked free.
Thus the index nodes and attribute data sectors connected with a FileNode are physically close to the FileNode itself.
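A sketch of that search, in Python, assuming the free sectors are available as a plain collection of volume-relative sector numbers (a hypothetical simplification of the free-space file). The name next_free_after is illustrative only.

```python
from typing import Iterable, Optional

def next_free_after(filenode_sector: int,
                    free_sectors: Iterable[int]) -> Optional[int]:
    """Return the lowest free sector that follows the FileNode sector."""
    later = [s for s in free_sectors if s > filenode_sector]
    return min(later) if later else None
```

A real driver would walk the free-space index rather than scan a flat collection. The scan is used here only to keep the example short.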
When allocating a sector to add to the FileNode table file, for storing a new FileNode, a filesystem driver should:
For the case of a file: attempt to allocate a sector close to (but not immediately next to) the FileNode for the parent directory containing the new file.
For the case of a subdirectory: attempt to allocate a sector in a part of the volume with a large amount of contiguous free space.
Thus all of the FileNodes for files in a directory are close to the FileNode for the directory itself (and the data sectors for its name index), and new subdirectories are placed such that they have enough free space near to them for the FileNodes of the files that they in turn contain.
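The two placement cases can be sketched as follows. This is a Python illustration under stated assumptions: the free sectors are a set of volume-relative sector numbers, "close to but not immediately next to" is read as "nearest free sector at a distance greater than one", and "a large amount of contiguous free space" is read as "the longest run of consecutive free sectors". Both readings, and the function names, are choices made for this example and are not mandated by this specification.

```python
from typing import Optional, Set, Tuple

def place_file_node(parent_sector: int, free: Set[int]) -> Optional[int]:
    """Nearest free sector to the parent's FileNode that is not adjacent to it."""
    candidates = [s for s in free if abs(s - parent_sector) > 1]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (abs(s - parent_sector), s))

def longest_free_run(free: Set[int]) -> Tuple[Optional[int], int]:
    """Return (first sector, length) of the longest run of consecutive free sectors."""
    best_start, best_len = None, 0
    run_start, run_len, prev = None, 0, None
    for s in sorted(free):
        if prev is not None and s == prev + 1:
            run_len += 1                 # still inside the current run
        else:
            run_start, run_len = s, 1    # a new run begins here
        if run_len > best_len:
            best_start, best_len = run_start, run_len
        prev = s
    return best_start, best_len
```

A subdirectory's FileNode would then be placed somewhere inside the run that longest_free_run reports.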
When allocating sectors for the system FileNodes, and for their (initial) index nodes, a volume format utility should attempt to allocate sectors close to the middle of the volume. (This does not apply to the data sectors for the bad sector map, the boot area map, or the free-space files.)
The system FileNodes and their indexes are frequently referenced, and so average head seek times to access them should be as short as possible.
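For a format utility, "close to the middle of the volume" might be approximated as below. This is a Python sketch that assumes the free sectors are given as a set of volume-relative sector numbers. The name nearest_to_middle is hypothetical.

```python
from typing import Optional, Set

def nearest_to_middle(total_sectors: int, free: Set[int]) -> Optional[int]:
    """Return the free sector nearest the midpoint of the volume, if any."""
    middle = total_sectors // 2
    # Ties are broken in favour of the lower-numbered sector.
    return min(free, key=lambda s: (abs(s - middle), s), default=None)
```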
LEAN mistakes: This is not the place to go into the design mistakes of LEAN at length. But for starters: Version 1 of the specification had linked lists of clusters, i.e. FATs in all but name, and version 2 of the specification has cylinder groups.
Reserved fields: Reserved fields in structures are set to 0 when creating a record, preserved with their current values when modifying the record, and ignored when reading the record.
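The three rules can be illustrated with a small Python sketch. The record layout used here (a 32-bit value, a 16-bit flags field, and a 16-bit reserved field, little-endian) is purely hypothetical and does not correspond to any structure defined in this specification.

```python
import struct

# Hypothetical layout: value (u32), flags (u16), reserved (u16).
LAYOUT = struct.Struct("<IHH")

def create_record(value: int, flags: int) -> bytes:
    # Creating: the reserved field is set to 0.
    return LAYOUT.pack(value, flags, 0)

def modify_record(raw: bytes, new_value: int) -> bytes:
    # Modifying: the reserved field is written back with its current value.
    _, flags, reserved = LAYOUT.unpack(raw)
    return LAYOUT.pack(new_value, flags, reserved)

def read_record(raw: bytes):
    # Reading: the reserved field is unpacked but otherwise ignored.
    value, flags, _reserved = LAYOUT.unpack(raw)
    return value, flags
```

Preserving the field on modification means that a record written by a later revision of the format, which may have given the field a meaning, survives being updated by an older driver.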
Clusters: A third use for clusters is sometimes suggested: portability across devices with different physical sector sizes. Microsoft gives this as the justification for NTFS being designed to use clusters, for example. This portability is illusory, for two simple reasons:
In the case of such filesystem formats, the cluster size is not fixed, but is variable. If the cluster size were fixed to always be the same value, then the portability gain would be a concrete one. But with variable-size clusters, the same difficulties arise with differing cluster sizes as arise with differing sector sizes. Clustering does not actually eliminate the portability problem. It merely disguises it.
Indeed, it adds to it. Clustering adds the additional problem that the physical sector sizes on different physical discs must evenly divide the selected cluster size, because if they do not, directly transferring a volume between such systems as a simple image copy is impossible. (Consider, for example, the case where the source disc has a 512 byte sector size, the target disc has a 1024 byte sector size, and the volume on the source disc has a 1536 byte, i.e. 3 sector, cluster size. It is not possible to express the cluster size as an integral multiple of the physical sector size of the target disc.)
In the case of such filesystem formats, the cluster size is expressed as a single integral multiple of the sector size. This requires that all sectors be the same size, and restricts volumes that span multiple disc slices across multiple physical discs to only using physical discs that all have the same physical sector size. If a filesystem format were truly portable and independent of physical sector size, it would not require that all of the physical discs on which a volume is stored have the same physical sector size.
One could solve this problem by modifying the filesystem format to include an individual sectors-per-cluster number for each disc slice in the volume. But this would require that the volume data structures incorporate details of the underlying disc slicing, which would make it difficult to mirror and to relocate volumes because it would require that mirrored and relocated volumes be sliced in exactly the same places and on exactly the same types of physical disc as the originals. (Some mainframe filesystem formats do exactly this, and incorporate details of the disc slicing into the volume data structures. They have very complex storage management requirements as a consequence.)
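The worked example under the first reason (512 byte source sectors, 1024 byte target sectors, a 1536 byte cluster) reduces to a divisibility check, sketched here in Python. The function name cluster_fits is illustrative.

```python
def cluster_fits(cluster_bytes: int, sector_bytes: int) -> bool:
    """True if the cluster size is an integral multiple of the sector size."""
    return cluster_bytes % sector_bytes == 0

# The source disc: 1536 = 3 * 512, so the cluster size is expressible.
on_source = cluster_fits(1536, 512)
# The target disc: 1536 = 1 * 1024 + 512, so it is not.
on_target = cluster_fits(1536, 1024)
```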
The best way to solve this is in fact to hide the differences in physical disc sector sizes entirely from the volume-level data structures (and thus from filesystem drivers). Where volumes span multiple disc slices on multiple physical discs, operating systems should present these using a logical sector size, the same across the entire volume, that is a common multiple of all of the physical sector sizes involved. Thus the portability problem, caused by multiple physical discs with differing physical sector sizes, and its solution are both encapsulated in the layer below that of the filesystem format.
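One way for the layer below to choose such a logical sector size is to take the least common multiple of the physical sector sizes. The text above requires only a common multiple, so the choice of the least one is an assumption made for this Python sketch, as is the function name logical_sector_size.

```python
from functools import reduce
from math import gcd
from typing import Iterable

def logical_sector_size(physical_sizes: Iterable[int]) -> int:
    """Least common multiple of the physical sector sizes of all the discs."""
    return reduce(lambda a, b: a * b // gcd(a, b), physical_sizes)
```

Where every physical sector size is a power of two, the result is simply the largest of them. A volume spanning discs with 512, 1024 and 4096 byte sectors would be presented with 4096 byte logical sectors.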