A proposal for eliminating the underscore madness that library writers have to suffer

This is a proposal for eliminating the necessity for the providers of user-supplied libraries and standard C/C++ libraries to use double underscores on practically every name that is internal to the implementation, even though the C and C++ scoping rules mean that those names are not in scope in client code. It does this by a small modification to the rules for the pre-processing phases of translation, making the pre-processor do less than it did before. In other words: This proposal defines rules for when and where specific preprocessor functionality, the expansion of macros, is turned off.

Rationale and background

Scope of this proposal

This proposal addresses a very specific but long-standing problem: library implementors have to employ double underscores in their headers in order to avoid their code being potentially broken by macros defined in code that uses the library. For example, here's some code from a real-world application library header (taken from the COCONUT API) that ensures that macros named s, S, b, and def don't break the library header:

template <class _S>
inline void control_data::assign(
	const std::string & __s, 
	const matrix<_S>* & __b, 
	const matrix<_S>* __def
) const
{
	retrieve(__s, __b, __def);
}

The problem with macros and libraries is threefold:

The code can get illegible very quickly. C and C++ heavily overuse punctuation anyway, and the addition of underscores to practically every single not-for-external-use identifier in a library header only exacerbates the problem. The example above is, if anything, a mild one.
Every library has to do this. This isn't just a problem for the C and C++ standard libraries that accompany the compiler. It's a problem for every application library, too, be that a first-party or a third-party application library. This is because it is a general problem caused by the interaction between any user-defined macros and any library headers.
This technique is, strictly speaking, not allowed for user-supplied libraries. The C and C++ standard libraries can use such names beginning with double underscores because those names are reserved to the implementation "for any use" (i.e. including for use as a macro). In other words, the C and C++ standards provide this solution, but only for implementation-supplied libraries, not for first-party and third-party user-supplied libraries.

This proposal is intended to enable application library implementors (and indeed standard library implementors) to write code such as the following — which is what they would like to write — in their headers and have it be unaffectable by preprocessor macros defined by the library clients:

template <class S>
inline void control_data::assign(
	const std::string & s, 
	const matrix<S>* & b, 
	const matrix<S>* def
) const
{
	retrieve(s, b, def);
}

Learning from prior failures

Previous failed attempts to solve this problem include Bjarne Stroustrup's #scope proposal, which was proposed in 2004 but never got off the ground. Its problems were twofold:

It was far too academic a solution. The proposal introduced a general-purpose mechanism that provided a concept of scoping for preprocessor macros, and that solved the difficulties encountered by real-world programmers as just one of the things that it could do, rather than a mechanism engineered specifically for the problem at hand and no more. People got bogged down in what else the mechanism could be made to do, and in its various ramifications that were not central to the issue at hand.
It introduced several new pre-processor directives. In addition to #scope and #endscope it included #import and #export, which then became #imports and #exports. People became bogged down in what even to call the new preprocessor directives.

Therefore, this proposal adheres to several principles:

No new preprocessing directives are introduced. There's no potential for the Golgafrinchans to fail to invent the wheel because they cannot decide what colour it should be. A side benefit of this is that library implementors do not need to worry about how to properly employ new directives in their headers to activate a new mechanism.
No general-purpose mechanism is proposed. This mechanism very specifically addresses the problem at hand and the problem at hand only. This proposal is, after all, for the benefit of people who don't want the preprocessor mucking things up, and who indeed would rather it were not there at all for their purposes. Expanding the feature set of the preprocessor by providing more knobs to turn is not the goal. Making the preprocessor less intrusive is.
No whiteboard-only ideas are included. I have actually implemented this proposal, constructing and using a preprocessor that employs its mechanism. My implementation experience is related in a later section.

Practices that are unaffected

Some additional goals of this proposal are that various implementation practices should continue to work with this mechanism in place, even though they rely upon interaction between library headers and library clients. (This was a significant problem with the "scopes" proposal. Scoping necessitates ways of becoming visible outwith a scope, and hence yet more mechanisms bolted on to provide those ways.)

Interface-specification macros should not be affected. Many libraries have mechanisms whereby client code sets macros before including the library headers, to specify the API level that the library is expected to provide. Such macros are especially common with platform libraries and include things like:
1. the WIN32_LEAN_AND_MEAN and NOxxx macros in the Microsoft Windows platform API
2. the _POSIX_C_SOURCE and _XOPEN_SOURCE macros in the POSIX platform API
This proposal was designed not to affect the function of such mechanisms.
Application libraries should still be able to use both < > and " " forms of inclusion. Unfortunately, some application library writers think that they are supplying headers, and so use < > form inclusion all over the place, whilst others think that they are supplying source files and so use " " form inclusion all over the place. The form of inclusion chosen must not affect the operation of the mechanism.
Feature-test and parameterization macros should not be affected. The most obvious examples of this are things like FLT_MAX from the standard library itself. This should not be broken. So — for example — headers should not be automatically assigned namespaces, or scopes, from which macro definitions don't escape by default. Such library-defined macros should continue to be usable in library client code without alteration to that code.

The proposal

The proposal in brief

Every macro has an associated priority. Every source file and header has an associated priority. A macro with a lower priority is not expanded when it occurs as an identifier in the non-preprocessing parts of a source file or header of a higher priority. Priority levels are assigned by the implementation, but are expected at minimum to distinguish amongst the user program, first-party and third-party user-supplied libraries, and the implementation's standard C++ library.
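The rule in brief comes down to a single comparison. The following C++ fragment is only an illustrative sketch, not proposed text and not the code of any real preprocessor: the `Priority` enumeration, its three levels, and the `is_expandable` function are hypothetical names, and the proposal itself leaves the number and representation of priorities to the implementation.

```cpp
// Hypothetical three-level ordering, lowest priority first. The proposal
// only requires that priorities be ordered, not that they be numeric.
enum class Priority {
    user_program,      // ordinary application source files
    user_library,      // first-party and third-party user-supplied libraries
    standard_library   // implementation-supplied C and C++ library headers
};

// Is a macro of priority `macro` considered for expansion when its name
// occurs as an identifier, outside of any preprocessing directive, in a
// source file or header of priority `file`?  It is, unless the file's
// priority is higher than the macro's.
constexpr bool is_expandable(Priority macro, Priority file) {
    return file <= macro;
}

// A user-program macro named "s" cannot invade a library header...
static_assert(!is_expandable(Priority::user_program, Priority::user_library),
              "lower-priority macros are not expanded in library headers");
// ...whereas a library macro such as FLT_MAX still works in user code.
static_assert(is_expandable(Priority::standard_library, Priority::user_program),
              "higher-priority macros are expanded in user code");
// Macros also expand in files of their own priority.
static_assert(is_expandable(Priority::user_program, Priority::user_program),
              "equal priority permits expansion");
```

Preprocessing directives are deliberately not modelled here, because priorities do not apply to them.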

Given that the macro priority mechanism gives a stronger guarantee about macro invasion to a wider range of library writers (i.e. more than just standard C++ library writers), we can even do away with some of the constraints that were intended to avert these problems but that apply only to the standard C++ library. We do still need to guarantee separately to standard library writers that preprocessor directives that employ standard macros (e.g. #if INT_MAX < 32768…) won't be broken.

Proposed standards text

(Square bracketed ellipses, as per the normal editorial convention, here denote parts of the text that do not change and have been omitted for brevity.)

Append an extra condition to §16.3 ¶9 and ¶10:

A preprocessing directive of the form
# define identifier replacement-list new-line
defines an object-like macro that causes each subsequent instance, that satisfies the priority rules (§16.3.6), of the macro name¹⁵¹ to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive¹⁵². […]
A preprocessing directive of the form […] defines a function-like macro with parameters, […] Each subsequent instance, that satisfies the priority rules (§16.3.6), of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition (an invocation of the macro). […]

and to §16.3.4 ¶1:

After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. Then the resulting preprocessing token sequence is rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace, subject to the priority rules (§16.3.6).

Add a new section, §16.3.6:

§16.3.6 Macro expansion priority
Every header (§17.6.1.2) and source file — included (§16.2) or otherwise — in the translation unit is considered to have an associated macro priority. These priorities are determined in an implementation defined manner ^{footnote 1} from the place where the source file or header exists. [Note: The method of inclusion — < > versus " " — does not affect priority. Only the place where the implementation's search actually finds the source file or header does. ^{footnote 2} — end note] The priorities of C++ and C standard library headers must be greater than the priorities of any program source file or non-library header. ^{footnote 3}

^{footnote 1} It is recommended that implementations at minimum employ three priorities, for (in lowest to highest priority order) user program source files, first-party and third-party user-supplied libraries, and implementation-supplied libraries. This standard does not specify that priorities be numeric, merely that they are ordered.

^{footnote 2} The notion of "place" is implementation defined, of course. It isn't a requirement that the same single source file or header found by < > form inclusion via one search path and by " " form inclusion via a different search path be considered to be in the same "place" and thus be assigned the same macro priority. An implementation might define, for example, that inclusion of "dir/a" by header <foo> is a logically different "place" to inclusion of the same source file by some other pathname directly or from some other header with a different priority. Priority does not depend on inclusion form but may well depend on inclusion route.

^{footnote 3} Implementations are granted the leeway to not have all of the C++ and C standard library headers at the same priority, and to have other source files and headers at the same or higher priority than the standard library headers. Extra implementation-defined headers for freestanding implementations (§17.6.1.3) are but one example of where an implementation might require the same or a higher priority than the standard headers. This constraint is simply to rule out perverse implementations that might otherwise decide to make the standard headers equal to or even lower priority than user program code.

Every macro is also considered to have an associated priority. The priority of a macro is the priority of the header or source file in which it was defined (§16.3). If a macro is redefined in a header or source file of higher priority, its priority is raised accordingly. Otherwise a redefinition has no effect on priority. [Note: Priorities are not, in other words, lowered by redefinitions. — end note] The priorities of predefined macros and other macros not defined by headers or source files are implementation defined. ^{footnote 4}

^{footnote 4} It is recommended that implementations give the standard predefined macros the same priority as macros defined in the standard library headers, and any user-defined macros (not defined by headers or source files) the same priority as macros defined in user source files.

The priorities of macros, headers, and source files govern whether macros are considered for expansion when their names occur as identifiers outside of pre-processing directives. [Note: Priorities do not affect pre-processing directives at all. — end note] A macro of priority N is only considered for expansion (in such contexts) if the priority of the header or source file in which its name occurs is lower than or the same as N. The macro is not expanded if the header's or source file's priority is higher.

The replacement text of a macro, after it has been expanded, is considered to be at the same priority as the source file or header in which the macro expansion process began.
[Example: A translation unit comprises one header and two source files:
// third-party user-supplied library source file "lib.h"
#define M ... /* Identifier becomes the ... token. */
int f (int a); // ok: no expansion of any lower-priority macros named "f" or "a"
int g (int a, M); // ok: "M" is expanded as intended

// user program source file
#define f ... /* Identifier becomes the ... token. */
#define a ... /* Identifier becomes the ... token. */
#define basic_string ... /* Identifier becomes the ... token. */

#include <string> // ok: macro "basic_string" has lower priority than the header
#include <lib.h>

#define M ... /* replaces M, but does not lower its priority */

int h() 
{ 
	f();	// error: "f" is expanded because
		// the macro has the same priority as this source file
	M();	// error: "M" is expanded because
		// the macro has a higher priority than this source file
}
— end example]

We can also curtail the previously sweeping prohibition of §17.6.4.3.1 ¶1:

A translation unit that includes a standard library header shall not #define or #undef macro names defined in any standard library header.

What happens in practice

I have implemented a preprocessor that does this, and experimented with what effects it has. There is a limited amount of prior art in other implementations, too.

Prior art in other implementations

Some implementations already classify headers and included source files by where they are found.

GCC, for example, has a two-level classification of user source files and system headers, dependent on which directory a file is found in. It takes special actions for things defined in "system" headers. This dichotomy isn't enough to make application library headers safe from application-defined macros, since both fall on the "user" side of the dividing line. Nor does it extend to taking special action in "system" headers for things defined in "user" files. But it is a fair start.

My first implementation even employed such a user/system two-level priority system. It wasn't enough in practice. Ironically, it was pointing the implementation at GNU libc and pretending to be GCC (by defining the same predefined macros as it does) that revealed this.

Choosing priorities

Experience led to the multiple-priority-level scheme as specified earlier. I eventually settled upon five priorities. I used this mechanism to preprocess some programs using both the OpenWatcom libraries on a Win32 system and the GNU libc and GCC libraries on a Linux system. The priorities were (in lowest to highest order):

application

Source files for ordinary application code. In other words: everything found in or relative to the current directory, or in or relative to the same directory as an application source file including it.

application libraries

Headers for first-party and third-party application libraries. The Xerces C++ library is an example of a third-party application library.

wrapper library

Headers for special wrapper libraries. This was an extra layer required for GNU libc and GCC, more on which in a moment.

platform API library

Headers for the platform API library, such as <unistd.h>, <os2.h>, and <windows.h>.

C++ language standard library

Headers for the C++ standard library, such as <string>, <sstream>, and <stdio.h>.

The "places" in which header files are found map to priorities as follows:

Places and priorities for OpenWatcom (with target NT)
priority	places
application	everything found relative to the current directory, either directly or indirectly
application libraries	everything found by prefixing a directory given by a `-I` option or listed in the `%INCLUDE%` or `%NT_INCLUDE%` environment variables, except for places explicitly given in other rows of this table
wrapper library	(none)
platform API library	everything found by prefixing `%WATCOM%/h/nt`
C++ language standard library	everything found by prefixing `%WATCOM%/h`

Places and priorities for Microsoft Visual C++
priority	places
application	everything found relative to the current directory, either directly or indirectly
application libraries	everything found by prefixing a directory given by a `-I` option or listed in the `%INCLUDE%` environment variable, except for places explicitly given in other rows of this table
wrapper library	(none)
platform API library	everything found by prefixing `$WindowsSdkDir/include`
C++ language standard library	everything found by prefixing `$VCInstallDir/include`

Places and priorities for GCC
priority	places (GCC on Linux with GNU libc)	places (Cygwin GCC)
application	everything found relative to the current directory, either directly or indirectly	(same)
application libraries	everything found by prefixing a directory given by a `-I` option (from which GCC automatically eliminates the places given in other rows in this table)	(same)
wrapper library	some things found by prefixing `$prefix/$target/$version/include`	(same)
platform API library	(none)	everything found by prefixing `/usr/include/w32api`
C++ language standard library	everything found by prefixing `/usr/include/c++/$version`, `/usr/include/c++/$version/$target`, or `/usr/include`; some things found by prefixing `$prefix/$target/$version/include`	everything found by prefixing `$prefix/$target/$version/include/c++` or `/usr/include`; some things found by prefixing `$prefix/$target/$version/include`

Things that implementation experience teaches

A four-level scheme, for application, application library, target platform library, and standard C++ library, would seem to be enough. GCC and GNU libc proved this wrong. GCC has two sets of "special" headers, that it unfortunately mashes together in a single directory:

GCC headers that GNU libc depends on. Examples of this include <stdarg.h>, which provides things such as __gnuc_va_list to GNU libc <libio.h> (and thence GNU libc <stdio.h>). These headers need to have language library macro priority, because the macros that they define have to be expandable in language library headers.
GCC headers that wrap GNU libc. Examples of this include <limits.h>, which wraps around the GNU libc <limits.h>. These headers need to have wrapper library macro priority, because the macros that they define must not expand in language library headers. (Indeed, the wrapper <limits.h> provided by GCC goes to some effort to ensure that it overrides the GNU libc macros and not the other way around.) Giving them a priority lower than that of the language library also makes it simpler to define what #include_next in those headers does. (It acts just as #include does, except that it ignores all places that have a priority lower than or equal to that of the including file.)
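The #include_next behaviour just described can be sketched as a filter over the search path. This is a minimal illustration under stated assumptions, not the code of the implementation discussed here: the `Place` record, the `search_path` table with its made-up directory names, and `first_place_for_include_next` are all hypothetical, and the sketch only decides which places are eligible, without checking whether the wanted file actually exists in them.

```cpp
#include <cstddef>

// The five priorities described earlier, lowest first.
enum class Priority {
    application, application_library, wrapper_library,
    platform_library, language_library
};

// One entry on the inclusion search path (hypothetical representation).
struct Place {
    const char* prefix;   // directory prefix; these names are invented
    Priority priority;    // macro priority assigned to files found here
};

// A made-up search path, in search order.
constexpr Place search_path[] = {
    { "/opt/applib/include",   Priority::application_library },
    { "/opt/wrappers/include", Priority::wrapper_library },
    { "/usr/include",          Priority::language_library },
};
constexpr std::size_t place_count = sizeof search_path / sizeof search_path[0];

// Index of the first place that #include_next may search when it appears
// in a file of priority `including_file`: places whose priority is lower
// than or equal to that of the including file are ignored.  Returns
// place_count when no place is eligible.
constexpr std::size_t first_place_for_include_next(Priority including_file) {
    for (std::size_t i = 0; i < place_count; ++i)
        if (search_path[i].priority > including_file)
            return i;
    return place_count;
}

// A wrapper-priority <limits.h> skips its own place and reaches the
// language library's one.
static_assert(first_place_for_include_next(Priority::wrapper_library) == 2,
              "wrapper headers reach only higher-priority places");
```

A real search would carry on through the remaining eligible places until the file is found.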

The second footnote to §16.3.6 may seem somewhat obscure. This is because it's condensing a fairly complex subject into a handful of sentences. It has three forces driving it:

It is not the intent to require implementations to figure out whether filesystem aliasing mechanisms such as hard and symbolic links cause different filenames to actually end up at the same physical file in the filesystem.
Implementations that give " " form inclusion the semantics of "also search the directory containing the including file" can simply define that any source file found that way will have the same priority as the file including it, without having to worry about whether the relative pathname actually ends up in a directory that maps to a different priority. Implementations are given the leeway to consider each directory subtree on a search path to be entirely disjoint from any other subtree on the search path. Implementations need only look at the path prefix taken from the search path, calculating priority from that, and not take into account any relative pathname in the included file specification.
As mentioned earlier, some library writers expect to use < > form inclusion all over the place, whilst others use " " form inclusion all over the place. So the inclusion form cannot be used to select between "same priority" and "higher priority", as an early version of my implementation did.

Such leeway for self-relative inclusion is desirable in the case of library incest. To construct a hypothetical example: Posit a target platform priority library that has a header <windows.h> that includes "../stddef.h" in order to incorporate various standard C++ library internals within itself, because it "knows" that the directory for the language's standard library is the parent of the directory for the target platform library. Even though that is therefore the same actual file as <stddef.h>, which has the higher standard C++ library priority, every macro from the "../stddef.h" inclusion will have the target platform priority, because the inclusion route by which it was reached doesn't involve the standard C++ library headers.

Put another way: The stddef.h file effectively becomes a new header belonging to the target platform library, distinct from the standard C++ header of the same spelling, because the platform library uses self-relative inclusion to include it. Implementations are under no more obligation than to note that <windows.h> and "../stddef.h" were both found by prefixing /usr/include/w32api, whereas <stddef.h> is found by prefixing /usr/include. Priority can be determined entirely by what part of the inclusion search path is used, without reference to the additional "../" relative directory prefix within the inclusion specification.
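To make the stddef.h example concrete, here is a small sketch of priority being computed from the search path prefix alone. It assumes C++17 for `std::string_view`. The `Place` record, the two-entry `search_path` table, the `priority_of` function, and the fallback to application priority for an unknown prefix are all assumptions made for illustration, not a description of any actual implementation.

```cpp
#include <string_view>

// The five priorities described earlier, lowest first.
enum class Priority {
    application, application_library, wrapper_library,
    platform_library, language_library
};

// One entry on the inclusion search path (hypothetical representation).
struct Place {
    std::string_view prefix;
    Priority priority;
};

// The two prefixes from the example in the text.
constexpr Place search_path[] = {
    { "/usr/include/w32api", Priority::platform_library },
    { "/usr/include",        Priority::language_library },
};

// Priority of a file found by prefixing `prefix_used` to the name given in
// the inclusion directive.  The name itself, "../" and all, is deliberately
// ignored: only the part taken from the search path counts.
constexpr Priority priority_of(std::string_view prefix_used,
                               std::string_view /* name_in_directive */) {
    for (const Place& place : search_path)
        if (place.prefix == prefix_used)
            return place.priority;
    return Priority::application;   // assumed fallback for unknown prefixes
}

// The same physical file gets two priorities via two inclusion routes.
static_assert(priority_of("/usr/include/w32api", "../stddef.h")
                  == Priority::platform_library,
              "self-relative inclusion keeps the platform priority");
static_assert(priority_of("/usr/include", "stddef.h")
                  == Priority::language_library,
              "the standard header keeps the language library priority");
```

Because the lookup never canonicalizes the resulting pathname, it also never needs to resolve hard or symbolic links.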

The rule about never lowering priority originates in the realization that macro priority limits the effective grasp of a macro, and that redefinition by a lower-priority header should not narrow the effective grasp of a macro that was already defined at a higher priority with a larger grasp. Higher priority headers would be expecting that macro to remain effective once defined by them.
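The never-lowered rule amounts to taking the higher of two priorities on each redefinition. As a minimal sketch, assuming a hypothetical `Macro` record and `redefine` function (neither is taken from the proposed text or from the implementation described here):

```cpp
// Hypothetical three-level ordering, lowest priority first.
enum class Priority { user_program, user_library, standard_library };

// A macro as the preprocessor might remember it (illustrative only).
struct Macro {
    const char* replacement;   // replacement list, kept as text for brevity
    Priority priority;         // priority of the file that defined it
};

// The result of redefining `existing` in a file of priority `defining_file`.
// The replacement list is always updated; the priority is raised when the
// redefining file has a higher one, and is otherwise left alone.
constexpr Macro redefine(Macro existing, const char* new_replacement,
                         Priority defining_file) {
    return Macro{ new_replacement,
                  defining_file > existing.priority ? defining_file
                                                    : existing.priority };
}

// Mirrors the macro M in the example: defined by the library header, then
// redefined by the user program, it keeps the library's priority.
static_assert(redefine(Macro{ "...", Priority::user_library }, "...",
                       Priority::user_program).priority
                  == Priority::user_library,
              "redefinition at a lower priority does not lower priority");
```

Whether the redefinition itself is well-formed is a separate question that this sketch does not address.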

The drastic curtailment of §17.6.4.3.1 ¶1 comes from the experience that the following program is no longer, with a macro priority system in place, capable of breaking the standard C++ library. Indeed, the pre-processed source compiles cleanly with GCC and works.

#define std ... /* Widely-used identifier becomes the ... token. */
#include <cstdio>
#undef std
int main ()
{
	std::puts("Hello there!");
	return 0;
}

Permission is hereby granted to copy and to distribute this WWW page in its original, unmodified form as long as its last modification datestamp information is preserved. Further permission is hereby granted to copy and to distribute this WWW page for the purposes of C++ standardization work by the ISO Working Group and the various committees and panels of the national standardization bodies. Finally, permission is hereby granted to include any and all of the proposed standards text given herein under the usual ISO rules for committee work products.