You've come to this page because you've talked about the "mbox" mailbox format. This is the Frequently Given Answer to such definite articles.
"mbox" is actually a family of several mutually incompatible mailbox formats. Different tools support different formats (often without clearly specifying which format they support), and great care must be taken when using different tools that support different formats on a single mailbox file.
With the advent and now widespread adoption of the superior Maildir format over the past several years, the entire "mbox" family of mailbox formats is gradually becoming irrelevant, and of only historical interest.
All of the "mbox" formats store all of the messages in the mailbox in a single file. Delivery appends new messages to the end of the file.
Each message is preceded by a From_ line and followed by a blank line. A From_ line is a line that begins with the five characters 'F', 'r', 'o', 'm', and ' '.
Notes:
The remainder of a From_ line has no bearing upon whether it is a From_ line.
A mailbox that contains zero messages contains no lines.
The From_ lines and the blank lines bracket messages, fore and aft. They do not comprise a message divider. The first message in the mailbox is not preceded by a blank line. The last message in the mailbox is not followed by a From_ line.
Messages may contain blank lines. Searching for a blank line is not the way to locate the end of a message.
The handling of a mailbox file whose first line is not a From_ line is undefined. The behaviour of mail handling tools in the face of such a mailbox file varies. Corruption (further corruption) of the mailbox will probably occur.
The delivery of messages that contain partial last lines (i.e. whose final characters are not a newline sequence) is undefined. Some tools follow best practice and will deliver the message as it stands. Other tools will first append a newline sequence to the message to complete the last line.
By convention, the 'From ' in a From_ line is immediately followed by:
The envelope sender of the message, as a mailbox name. If there was no envelope sender, by convention the mailbox name used is MAILER-DAEMON. Whitespace characters in the envelope sender mailbox name are by convention replaced by hyphens. Historically, the envelope sender was a UUCP "bang path".
The delivery date of the message, separated from the sender by whitespace and written in asctime() format (i.e. in English, with the redundant weekday, and without timezone information). For best results, and for the same reasons that filesystem formats store file timestamps in UTC, the delivery date should be treated as the UTC time of delivery. Some specifications attempt to change existing practice by fiat, redefining the date to be in the form of a date-time token from RFC 2822.
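As an illustration of the convention above, here is a minimal Python sketch that splits a From_ line into its sender and delivery date. The function name parse_from_line is hypothetical, and the sketch assumes the conventional asctime() layout, a C/English locale for the weekday and month names, and a date that is to be read as UTC. A From_ line that follows some other convention (an RFC 2822 date, say) will make it raise ValueError.

```python
import calendar
import time


def parse_from_line(line):
    """Split a From_ line into (sender, seconds since the epoch, UTC).

    Hypothetical helper. Returns None if the line is not a From_ line.
    Raises ValueError if the date is not in asctime() format.
    """
    if not line.startswith("From "):
        return None
    rest = line[len("From "):].rstrip("\r\n")
    # By convention the sender contains no whitespace (it was replaced
    # by hyphens), so the first space ends the sender.
    sender, _, date_text = rest.partition(" ")
    # asctime() format, for example 'Sat Jun 24 10:15:00 1995'.
    parsed = time.strptime(date_text.strip(), "%a %b %d %H:%M:%S %Y")
    # Treat the delivery date as UTC rather than local time.
    return sender, calendar.timegm(parsed)
```

For example, parse_from_line("From MAILER-DAEMON Sat Jun 24 10:15:00 1995") yields the sender "MAILER-DAEMON" and a timestamp that time.gmtime() turns back into 1995-06-24 10:15:00.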
The "mboxo" mailbox format is the "original" System V mailbox format.
The "mboxo" mailbox format uses irreversible "From quoting" that corrupts messages. Before a message is appended to an "mboxo" mailbox file, it is transformed. Any line of the message, in either the header or the body, that begins with the five characters 'F', 'r', 'o', 'm', and ' ' has a single '>' character prepended to it. This transformation is irreversible because it is impossible to distinguish, when reading a message, a line that began '>From ' in the original message from a line that began 'From ' in the original message and that was subsequently transformed.
The substitution command for this transformation is 1,$s/^From />&/ .
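The same transformation can be sketched in Python. The function name mboxo_quote is hypothetical, and the sketch assumes the message is held as a single string with LF or CRLF line endings.

```python
import re

# Equivalent of 1,$s/^From />&/ : with re.MULTILINE, '^' matches at the
# start of every line, not only at the start of the string.
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def mboxo_quote(message):
    """Prepend '>' to every line of the message that begins 'From '."""
    return _FROM_LINE.sub(">From ", message)
```

The irreversibility is easy to see: mboxo_quote("From here\n") and mboxo_quote(">From here\n") both return ">From here\n", so a reader cannot tell which of the two the sender wrote.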
To locate the start of the next message in an "mboxo" format mailbox, one scans forward for the next From_ line. There is no next message if the end of the file is reached.
When reading each message from an "mboxo" format mailbox, one strips off the trailing blank line.
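The scanning procedure just described might look like the following Python sketch. The names split_mboxo and _strip_trailing_blank are hypothetical. Because the handling of a mailbox whose first line is not a From_ line is undefined, raising ValueError in that case is this sketch's own choice, not a rule of the format. The messages are returned still quoted.

```python
def _strip_trailing_blank(lines):
    """Join the lines of one message, dropping the trailing blank line."""
    if lines and lines[-1] in (b"\n", b"\r\n"):
        lines = lines[:-1]
    return b"".join(lines)


def split_mboxo(data):
    """Yield (from_line, message) pairs from the bytes of a mailbox file."""
    from_line = None
    body = []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            # A From_ line: the previous message, if any, ends here.
            if from_line is not None:
                yield from_line, _strip_trailing_blank(body)
            from_line, body = line, []
        elif from_line is None:
            raise ValueError("mailbox does not begin with a From_ line")
        else:
            body.append(line)
    # Reaching the end of the file ends the last message.
    if from_line is not None:
        yield from_line, _strip_trailing_blank(body)
```

An empty mailbox yields no messages at all, and a blank line in the middle of a message does not end it, since only a From_ line or the end of the file does.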
The "mboxrd" mailbox format is named after Rahul Dhesi, who was one of a number of people who invented the same idea roughly simultaneously. (Tim Goodwin said on 1996-08-09 that he first implemented the same idea on 1995-04-04, for example.) The earliest recorded version of Rahul's proposal dates from 1995-06-24.
The "mboxrd" mailbox format was designed with reversible "From quoting", to solve the message corruption problems inherent in the "mboxo" format. Rahul Dhesi said that mail software could be incrementally revised to employ "mboxrd" instead of "mboxo". And indeed it has been adopted to a certain extent. qmail switched from "mboxo" format to "mboxrd" format on 1996-03-02, for example. However, and somewhat amazingly, close to a decade later this adoption has not been as universal as was predicted. Postfix, as of its 2004-08-29 version, still uses the "mboxo" format. (This is ironic when one considers that Postfix is in fact Rahul Dhesi's MTS software of choice.)
Before a message is appended to an "mboxrd" mailbox file, it is transformed. Any line of the message, in either the header or the body, that begins with zero or more '>' characters followed by the five characters 'F', 'r', 'o', 'm', and ' ', has a single '>' character prepended to it. The substitution command for this transformation is 1,$s/^>*From />&/ .
When a message is read from an "mboxrd" mailbox file, it is transformed back. Any line of the message, in either the header or the body, that begins with one or more '>' characters followed by the five characters 'F', 'r', 'o', 'm', and ' ', has the single leading '>' character removed from it. The substitution command for this transformation is 1,$s/^>(>*From )/\1/ .
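Both "mboxrd" transformations can be sketched as a pair of Python functions. The names mboxrd_quote and mboxrd_unquote are hypothetical, and the sketch assumes the message is held as a single string.

```python
import re

# Writing: 1,$s/^>*From />&/
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# Reading: 1,$s/^>(>*From )/\1/
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_quote(message):
    """Add one '>' to every line that begins with zero or more '>' then 'From '."""
    return _QUOTE.sub(r">\1", message)


def mboxrd_unquote(message):
    """Remove one '>' from every line that begins with one or more '>' then 'From '."""
    return _UNQUOTE.sub(r"\1", message)
```

Because already-quoted lines gain a further '>' on writing, reading restores the original exactly: mboxrd_unquote(mboxrd_quote(text)) returns text for any message, including one that contains '>From ' or '>>From ' lines of its own.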
To locate the start of the next message in an "mboxrd" format mailbox, one scans forward for the next From_ line. There is no next message if the end of the file is reached.
When reading each message from an "mboxrd" format mailbox, one strips off the trailing blank line.
The "mboxcl" mailbox format is one of the "new" System V mailbox formats. The mutt MUA attempts to convert "mboxo" and "mboxrd" mailboxes to "mboxcl" format.
The "mboxcl" mailbox format uses irreversible "From quoting" that corrupts messages. Before a message is appended to an "mboxcl" mailbox file, it is transformed. Any line of the message, in either the header or the body, that begins with the five characters 'F', 'r', 'o', 'm', and ' ' has a single '>' character prepended to it. This transformation is irreversible because it is impossible to distinguish, when reading a message, a line that began '>From ' in the original message from a line that began 'From ' in the original message and that was subsequently transformed.
The substitution command for this transformation is 1,$s/^From />&/ .
The "mboxcl" mailbox format does not use From_ line scanning. Instead, each message contains a Content-Length: header that denotes the length of the message body, after transformation, in octets. This header is added to the message when it is added to the mailbox, and used to locate the start of each next message.
Notes:
The treatment of messages that already have Content-Length: headers is undefined. Best practice is to remove any existing Content-Length: headers from messages before delivery.
A message that has no Content-Length: header causes problems, since it is impossible to reliably locate the next message. Some, but not all, "mboxcl" tools fall back to the "mboxo"/"mboxrd" method of scanning for From_ lines in such cases.
A message that has multiple differing Content-Length: headers causes problems, since it is impossible to reliably locate the next message. There is no generally accepted best practice for such cases.
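Locating messages by Content-Length: rather than by scanning might be sketched as follows. The function name read_mboxcl is hypothetical. The sketch assumes LF line endings, and since there is no accepted practice for a missing header or for differing headers, it simply raises ValueError in both cases; that is this sketch's choice, not a rule of the format. Messages are returned as stored, still quoted.

```python
def read_mboxcl(data):
    """Yield each stored message (header, blank line, body) as bytes."""
    pos = 0
    while pos < len(data):
        if not data.startswith(b"From ", pos):
            raise ValueError("expected a From_ line at offset %d" % pos)
        header_start = data.index(b"\n", pos) + 1
        # The header ends at the first blank line after the From_ line.
        header_end = data.find(b"\n\n", header_start)
        if header_end < 0:
            raise ValueError("no blank line after the header")
        lengths = set()
        for line in data[header_start:header_end].split(b"\n"):
            name, colon, value = line.partition(b":")
            if colon and name.rstrip().lower() == b"content-length":
                lengths.add(int(value.strip()))
        if len(lengths) != 1:
            raise ValueError("missing or conflicting Content-Length")
        body_start = header_end + 2
        body_end = body_start + lengths.pop()
        if body_end > len(data):
            raise ValueError("Content-Length runs past the end of the file")
        yield data[header_start:body_end]
        # Skip the blank line that follows the message, if present.
        pos = body_end + 1 if data.startswith(b"\n", body_end) else body_end
```

Note that the body is taken purely by octet count, so a wrong Content-Length: value misplaces the start of every later message in the file.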
When reading each message from an "mboxcl" format mailbox, one strips off the trailing blank line.
The "mboxcl2" mailbox format is one of the "new" System V mailbox formats.
The "mboxcl2" mailbox format uses no "From quoting" at all. A message is delivered to a "mboxcl2" format mailbox file as it stands, without transformation. Messages are not transformed when they are read, either.
The "mboxcl2" mailbox format does not use From_ line scanning. Instead, each message contains a Content-Length: header that denotes the length of the message body in octets. This header is added to the message when it is added to the mailbox, and used to locate the start of each next message.
Notes:
The treatment of messages that already have Content-Length: headers is undefined. Best practice is to remove any existing Content-Length: headers from messages before delivery.
A message that has no Content-Length: header causes problems, since it is impossible to locate the next message. Falling back to the "mboxo"/"mboxrd" method of scanning for From_ lines is not possible, since no "From quoting" is employed.
A message that has multiple differing Content-Length: headers causes problems, since it is impossible to reliably locate the next message. There is no generally accepted best practice for such cases.
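Delivery to an "mboxcl2" mailbox, following the best practice in the notes above, might be sketched like this. The function name deliver_mboxcl2 and its parameters are hypothetical. The sketch assumes LF line endings, a message with at least one header field, and Content-Length: headers that are not folded across lines. It builds the new mailbox contents in memory; a real delivery agent would also have to lock and append to the file, which is outside this sketch.

```python
import time


def deliver_mboxcl2(mailbox, sender, message, when=None):
    """Return the mailbox bytes with the message appended, unquoted.

    'when' is seconds since the epoch; None means the current time.
    """
    # From_ line conventions: hyphens for whitespace, MAILER-DAEMON for
    # an empty envelope sender, and an asctime() date treated as UTC.
    sender = b"-".join(sender.split()) or b"MAILER-DAEMON"
    stamp = time.asctime(time.gmtime(when)).encode("ascii")
    header, _, body = message.partition(b"\n\n")
    # Remove any existing Content-Length: headers before delivery.
    kept = [line for line in header.rstrip(b"\n").split(b"\n")
            if not line.lower().startswith(b"content-length:")]
    # The length counts the octets of the body exactly as it stands.
    kept.append(b"Content-Length: %d" % len(body))
    return (mailbox + b"From " + sender + b" " + stamp + b"\n"
            + b"\n".join(kept) + b"\n\n" + body + b"\n")
```

A body line that begins 'From ' is stored untouched, which is exactly why a reader that scans for From_ lines will split such a message in two.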
When reading each message from an "mboxcl2" format mailbox, one strips off the trailing blank line.
These are some of the consequences of the incompatibilities between the various formats in the "mbox" family.
Messages cannot be reliably read from "mboxo" and "mboxrd" format mailboxes by "mboxcl" and "mboxcl2" readers. There is no guarantee that any Content-Length: headers in the original message, which will be preserved exactly as they stand by "mboxo" and "mboxrd", are correct and appropriate. Nor is there any guarantee that messages have Content-Length: headers at all.
Messages cannot be reliably read from "mboxcl2" format mailboxes by "mboxo" or "mboxrd" readers.
Delivering messages to "mboxcl2" format mailboxes with "mboxo" or "mboxrd" tools will corrupt the mailbox, rendering all subsequently delivered messages irretrievable.
Because "From " at the start of a line is more probable than ">From " in real-world messages, an "mboxrd" reader will restore a greater number of messages written to a mailbox by an "mboxo" tool to their original forms than an "mboxo" reader, but will not and cannot restore all messages.
Conversely, when an "mboxo" reader is used, less message corruption will be observed in the final results if the messages were written by an "mboxo" tool than if they were written by an "mboxrd" tool.
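These two claims can be checked with a small Python experiment. All of the function names are hypothetical, and the sketch assumes that an "mboxo" reader returns lines exactly as stored, since "mboxo" quoting cannot be undone. The sample message has two lines that begin 'From ' and one that begins '>From ', reflecting the first form being the more common.

```python
import re

M = re.MULTILINE


def mboxo_write(text):
    return re.sub(r"^From ", ">From ", text, flags=M)


def mboxrd_write(text):
    return re.sub(r"^(>*From )", r">\1", text, flags=M)


def mboxo_read(text):
    # Assumption: an "mboxo" reader performs no unquoting at all.
    return text


def mboxrd_read(text):
    return re.sub(r"^>(>*From )", r"\1", text, flags=M)


def corrupted_lines(original, result):
    """Count the lines that differ between the original and what was read."""
    pairs = zip(original.splitlines(), result.splitlines())
    return sum(1 for before, after in pairs if before != after)


original = "From the start\nFrom the middle\n>From a reply\n"
```

With this sample, an "mboxrd" reader of "mboxo"-written text corrupts one line where an "mboxo" reader corrupts two, and an "mboxo" reader corrupts all three lines when the text was written by an "mboxrd" tool. Only the "mboxrd" writer paired with the "mboxrd" reader corrupts none.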