The text of [basic.start.main] in the C++ standard was lifted wholesale from the base document, remaining largely unaltered. Unfortunately, the text of the base document, which was not a formal language standard, contained several flaws. These have been transferred to the C++ standard uncorrected.
I've been trying to have these flaws fixed since 1995. (Getting them corrected was one of the reasons that I joined the BSI C++ committee, in fact.) You can read discussions of these flaws in your favourite Usenet archive, as I first published the details of them in the comp.std.c++ Usenet newsgroup on 1995-06-06.
Here is my proposed rewording of [basic.start.main] and the rationale behind it.
A program shall contain a global function named main, which is the designated start of the program.
This function is not predefined by the implementation. It shall be defined using one definition taken from an implementation-defined set of definitions, which shall include both
int main () { /* ... */ }
and
int main (int, char *[]) { /* ... */ }
at minimum. main shall not be overloaded.
[ Note: It is recommended that additional function types allowed by implementations give their first two parameters the same types and semantics (see below) as those of the second definition above. ] [ Example: Implementations may allow additional definitions of main that do not return an integral type. ]
The function main shall not be called from within a program. The linkage (basic.link) of main is implementation defined. The address of main shall not be taken. The function main shall not be declared except at its definition, and shall not be declared inline or static. The name main is not otherwise reserved. [ Example: member functions, classes, and enumerations can be called main, as can entities in other namespaces. ]
Where main is defined as
int main (int argc, char *argv[]) { /* ... */ }
(i.e. the second of the definitions given above), the parameters shall be as follows:
argc shall be the number of arguments passed to the program from the environment in which the program is run. The value of argc shall be non-negative.
Where argc is greater than 0, argv[0] through argv[argc-1] shall be pointers to the initial characters of NTMBSs representing the arguments. The value of argv[argc] shall be 0.
[ Note: The common convention on many platforms is that the argument represented by the NTMBS pointed to by argv[0] is either the name used to invoke the program or "", and that argc is always greater than 0, so that dereferencing argv[0] always has defined behaviour. However, this International Standard imposes no such requirement. ]
Calling the function void exit(int) ; declared in <cstdlib> (lib.support.start.term) terminates the program without leaving the current block, and hence without destroying any objects with automatic storage duration (class.dtor). The argument value is returned to the program's environment as the value of the program.
A return statement in function main has the effect of leaving the main function (destroying any objects with automatic storage duration) and then calling exit.
Where main is defined in either of the forms given above, the argument to exit shall be the value returned from main by that statement.
Where main is defined with an implementation-defined form not listed above, the argument to exit shall be determined, in an implementation-defined manner, from the value returned from main by that statement.
Where main is defined in either of the forms given above and control reaches the end of main without encountering a return statement, the effect is that of executing
return 0 ;
Where main is defined with an implementation-defined form not listed above, and control reaches the end of main without encountering a return statement, the effect is implementation defined. [ Note: Implementations are expected to define behaviour that is equivalent to that of the return 0 case outlined above, where reasonable. ]
The standard does not prohibit declaring main outside of its definition.
The C++ standard does not say that a program may not declare main outside of its definition. It should. Such a prohibition allows C++ compilers to use a useful technique. (At least one C++ compiler uses this technique already, and fails to translate programs that declare main outside of its definition.)
C++ compilers may wish to employ internal compiler magic of some sort for the main function. However, it is possible that an explicit declaration of main disables this magic. This is true for at least one existing C++ compiler, where the program
int main ( int, char ** ) ;
int main ( int, char ** ) { return 0 ; }
is not translated successfully, failing at the link stage, because the first declaration of main turns off the internal compiler magic for main that causes its symbol name to be processed specially when defining it, and causes it instead to be treated as if it were a normal function.
A prohibition on declaring main outside of its definition has the additional welcome benefit of preventing main from being declared outside of the translation unit in which it is defined, thus denying the programmer one way in which he or she could take the address of main or call it, both of which a program is already intended not to do.
The proposed revised wording adds an explicit prohibition on declaring main outside of its definition.
The standard is inconsistent about how many types main has.
The C++ standard says that the type of main is implementation defined. It says "and its type is ...", implying that main has only one type, and then immediately proceeds to give two possible types for main that all implementations must recognise. (According to [dcl.fct], the type of a function includes the parameter list as well as the return type.) This is inconsistent.
It is better to say that an implementation accepts a set of types for main, only one of which may be used in any one program, and which must include at minimum the types int main() and int main(int, char **). The proposed revised wording does so.
The implicit return 0 ; should not apply in all cases.
Seemingly motivated by the somewhat foolish notion that the C programs in chapter 1 of The C Programming Language should be valid C++ programs, the C++ standard contains a provision such that an implicit return 0 ; is inserted into main in the absence of an explicit return statement. (The irony here is that none of those programs is in fact a conforming program under the current C standard, since they all rely on implicit int, which C has not provided since C99.)
Unfortunately, this provision is too general if one is going to give implementations the latitude to accept additional function types for main (beyond the two types that implementations are required to accept). Where an implementation allows main to be declared as returning a struct type, for example, the implicit return 0 ; means that
struct T main() { }
becomes
struct T main() { return 0 ; }
which is not necessarily well-formed. If the implicit return 0 ; did not extend to such cases, however, implementations would be free to define some other appropriate implicit return value.
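A minimal sketch of why the implicit return 0 ; cannot simply carry over to a struct return type. Standard C++ does not let main return a struct, so ordinary functions stand in for a hypothetical implementation-defined main here; the names Plain, FromInt, and make_status are invented for illustration. For a class return type, return 0 ; copy-initializes the result from the int 0, which compiles only if the class has an implicit conversion from int.

```cpp
#include <type_traits>

// Hypothetical status type with no constructor taking int (an aggregate).
struct Plain {
    int status;
};

// Hypothetical status type with a non-explicit converting constructor.
struct FromInt {
    FromInt(int s) : status(s) {}
    int status;
};

// "return 0 ;" in a function returning a class type R copy-initializes R
// from the int 0; that is well-formed only if int converts implicitly to R.
static_assert(!std::is_convertible<int, Plain>::value,
              "Plain f() { return 0 ; } would be ill-formed");
static_assert(std::is_convertible<int, FromInt>::value,
              "FromInt f() { return 0 ; } is well-formed");

// Compiles, because FromInt(int) is used to build the return value.
FromInt make_status() { return 0 ; }

// Would not compile if uncommented: there is no conversion from int to Plain.
// Plain make_plain() { return 0 ; }
```

An implementation accepting a return type like Plain would therefore need to define some other implicit result, which is the latitude that leaving this case implementation-defined provides.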
The proposed revised wording splits the discussion of falling off the end of main without a return statement into two distinct cases: one where the definition is in one of the two forms that the standard requires be allowed, providing the semantics that were there before; and one where the definition is in an additional implementation-defined form, stating that the semantics are implementation-defined.
For equivalent reasons, the same split is made in the discussion of the value passed to the implicit call to void exit(int).
The standard is contradictory about what argv[0] is.
The C++ standard states in one place that argv[0] is a pointer to an NTMBS that is an argument passed to the program from the environment in which the program is run, but in another place states that it is a pointer to an NTMBS that is the name used to invoke the program or "". This is either a contradiction or an unwarranted intrusion by the C++ standard into operating-system-specific standards, mandating that the first argument passed to every program must be the name used to invoke the program or "". On Unix, it would be a contradiction. The first argument in the vector is whatever string the calling process chose to pass to the execve() system call. There is no way (nor should there be any) of enforcing that what a Unix program passes in the first argument be the name used to invoke the program or "", or even that a non-zero number of arguments be passed.
I know of no operating system that employs the somewhat perverse notion that the name used to invoke the program is itself an argument. In all of the operating systems that I know, the name used to invoke the program is distinct from the arguments passed to the program.
The proposed revised wording eliminates this perverse notion.
It also, as a side effect of relegating the discussion of the program name to a note, explicitly mentions the fact that a common idiom (assuming that argv[0] is not a null pointer) is not, and never has been, guaranteed free of undefined behaviour by the C++ standard.
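A defensive alternative to that idiom, sketched under the assumption that a program merely wants some name to print in its diagnostics. The helper program_name_or is a hypothetical name, not a library function. It never dereferences argv[0] when argc is 0 (in which case argv[0] is argv[argc], a null pointer), and it treats an empty argv[0] as absent too. Its parameter is declared const char* const argv[] so that the char *argv[] of main can be passed to it directly.

```cpp
#include <cassert>

// Hypothetical helper: returns argv[0] if it names something, else fallback.
// char** (main's argv) converts implicitly to const char* const*.
const char* program_name_or(int argc, const char* const argv[],
                            const char* fallback)
{
    assert(argc >= 0);  // the standard requires argc to be non-negative

    // With argc == 0, argv[0] is argv[argc], which is a null pointer, so it
    // must not be dereferenced. The null check is extra defence.
    if (argc == 0 || argv[0] == nullptr)
        return fallback;

    // argv[0] may legitimately be "" (or any string the caller chose).
    if (argv[0][0] == '\0')
        return fallback;

    return argv[0];
}
```

In a program whose main is defined as int main (int argc, char *argv[]), a call such as program_name_or(argc, argv, "mytool") yields a usable name whatever the calling environment passed.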
The standard uses a misleading name for main.
Throughout [basic.start.main] in the C++ standard, main is referred to as main(). The rest of the document refers to functions by their signatures. The implication of this is that wherever [basic.start.main] mentions main(), it is talking about int main() rather than int main(int, char **).
The proposed revised wording makes a stylistic exception in the case of main, since it can have more than one type.