A proposal for eliminating the underscore madness that library writers have to suffer

This is a proposal for eliminating the necessity for the providers of user-supplied libraries and standard C/C++ libraries to use double underscores on practically every name that is internal to the implementation, even though the C and C++ scoping rules mean that those names are not in scope in client code. It does this by a small modification to the rules for the preprocessing phases of translation, making the preprocessor do less than it did before. In other words: this proposal defines rules for when and where one specific piece of preprocessor functionality, the expansion of macros, is turned off.

Rationale and background

Scope of this proposal

This proposal addresses a very specific, but long-standing problem: library implementors have to employ double underscores in their headers in order to avoid their code being potentially broken by macros defined in code that uses the library. For example, here's some code from a real-world application library header (taken from the COCONUT API) that ensures that macros named s, S, b, and def don't break the library header:

template <class _S>
inline void control_data::assign(
	const std::string & __s, 
	const matrix<_S>* & __b, 
	const matrix<_S>* __def
) const
{
	retrieve(__s, __b, __def);
} 
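
The effect that the double underscores guard against can be observed directly. The following is a minimal sketch, not taken from any real library: the macro s stands in for whatever a client might define, and SHOW, SHOW_EXPANDED, same_text, and the two *_as_seen constants are names invented here. Stringification is used only as a way of looking at an identifier after today's preprocessor has finished with it.

```cpp
// Two-step stringification: the argument is macro-expanded first and only
// then turned into a string literal, so the literal records what the
// compiler proper would see.
#define SHOW_EXPANDED(x) #x
#define SHOW(x) SHOW_EXPANDED(x)

// Compile-time comparison of two C strings (C++11 constexpr recursion).
constexpr bool same_text(const char* a, const char* b)
{
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

// Stand-in for a macro that client code defines before including a library.
#define s client_buffer

// A plain parameter name in a header is rewritten by the client's macro...
constexpr const char* plain_name_as_seen = SHOW(s);
// ...whereas the double-underscore spelling passes through untouched, since
// no conforming client may define a macro with such a name.
constexpr const char* reserved_name_as_seen = SHOW(__s);

#undef s // keep the stand-in macro from leaking into anything that follows

static_assert(same_text(plain_name_as_seen, "client_buffer"), "s was replaced");
static_assert(same_text(reserved_name_as_seen, "__s"), "__s was left alone");
```

The sketch compiles as a translation unit on its own under C++11 or later. The two static_assert lines are the whole demonstration: under the current rules the header author's only defence is the spelling of the name.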

The problem with macros and libraries is threefold:

This proposal is intended to enable application library implementors (and indeed standard library implementors) to write code such as the following — which is what they would like to write — in their headers and have it be unaffectable by preprocessor macros defined by the library clients:

template <class S>
inline void control_data::assign(
	const std::string & s, 
	const matrix<S>* & b, 
	const matrix<S>* def
) const
{
	retrieve(s, b, def);
} 

Learning from prior failures

Previous failed attempts to solve this problem include Bjarne Stroustrup's #scope proposal, which was put forward in 2004 but never got off the ground. Its problems were twofold:

Therefore, this proposal adheres to several principles:

Practices that are unaffected

Some additional goals of this proposal are that various implementation practices should continue to work with this mechanism in place, even though they rely upon interaction between library headers and library clients. (This was a significant problem with the "scopes" proposal. Scoping necessitates ways of becoming visible outwith a scope, and hence yet more mechanisms bolted on to provide those ways.)

The proposal

The proposal in brief

Every macro has an associated priority. Every source file and header has an associated priority. A macro with a lower priority is not expanded when it occurs as an identifier in the non-preprocessing parts of a source file or header of a higher priority. Priority levels are assigned by the implementation, but are expected at minimum to distinguish amongst the user program, first-party and third-party user-supplied libraries, and implementation standard C++ library.
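
The rule in brief reduces to a single ordering comparison. The following minimal sketch assumes a three-level ordering purely for illustration; the enumerator names and the function is_expanded are hypothetical, and the proposal itself leaves the actual set of levels to the implementation.

```cpp
// Hypothetical priority levels, lowest first. The proposal only requires
// that priorities be ordered, not that there be exactly three of them.
enum class Priority { user_program, user_library, standard_library };

// A macro is left unexpanded when the file in which its name occurs has a
// higher priority than the macro itself; otherwise it is expanded as usual.
constexpr bool is_expanded(Priority macro, Priority file)
{
    return macro >= file;
}

// A macro from the user program cannot invade a library header...
static_assert(!is_expanded(Priority::user_program, Priority::user_library),
              "lower-priority macro is ignored in a higher-priority header");
// ...but it still works in the user program's own source files...
static_assert(is_expanded(Priority::user_program, Priority::user_program),
              "same-priority macro is expanded");
// ...and library macros remain visible to the code that uses the library.
static_assert(is_expanded(Priority::standard_library, Priority::user_program),
              "higher-priority macro is expanded");
```

Relational operators on a scoped enumeration compare the underlying values, so the declaration order of the enumerators is what encodes the priority order here.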

Given that the macro priority mechanism gives a stronger guarantee about macro invasion to a wider range of library writers (i.e. more than just standard C++ library writers), we can even do away with some of the constraints that were intended to avert these problems but that only apply to the standard C++ library. Note, however, that we still need to separately guarantee to standard library writers that preprocessor directives that employ standard macros (e.g. #if INT_MAX < 32768…) won't be broken.

Proposed standards text

(Square bracketed ellipses, as per the normal editorial convention, here denote parts of the text that do not change and have been omitted for brevity.)

Append an extra condition to §16.3 ¶9 and ¶10:

  1. A preprocessing directive of the form

    # define identifier replacement-list new-line

    defines an object-like macro that causes each subsequent instance, that satisfies the priority rules (§16.3.6), of the macro name¹⁵¹ to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive¹⁵². […]

  2. A preprocessing directive of the form […] defines a function-like macro with parameters, […] Each subsequent instance, that satisfies the priority rules (§16.3.6), of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition (an invocation of the macro). […]

and to §16.3.4 ¶1:

  1. After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. Then the resulting preprocessing token sequence is rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace, subject to the priority rules (§16.3.6).

Add a new section, §16.3.6:

§16.3.6 Macro expansion priority

  1. Every header (§17.6.1.2) and source file — included (§16.2) or otherwise — in the translation unit is considered to have an associated macro priority. These priorities are determined in an implementation defined manner footnote 1 from the place where the source file or header exists. [Note: The method of inclusion — < > versus " " — does not affect priority. Only the place where the implementation's search actually finds the source file or header does. footnote 2 — end note] The priorities of C++ and C standard library headers must be greater than the priorities of any program source file or non-library header. footnote 3

    footnote 1 It is recommended that implementations at minimum employ three priorities, for (in lowest to highest priority order) user program source files, first-party and third-party user-supplied libraries, and implementation-supplied libraries. This standard does not specify that priorities be numeric, merely that they are ordered.

    footnote 2 The notion of "place" is implementation defined, of course. It isn't a requirement that the same single source file or header found by < > form inclusion via one search path and by " " form inclusion via a different search path be considered to be in the same "place" and thus be assigned the same macro priority. An implementation might define, for example, that inclusion of "dir/a" by header <foo> is a logically different "place" to inclusion of the same source file by some other pathname directly or from some other header with a different priority. Priority is not dependent on inclusion form but may well depend on inclusion route.

    footnote 3 Implementations are granted the leeway to not have all of the C++ and C standard library headers at the same priority, and to have other source files and headers at the same or higher priority than the standard library headers. Extra implementation-defined headers for freestanding implementations (§17.6.1.3) are but one example of where an implementation might require the same or a higher priority than the standard headers. This constraint is simply to rule out perverse implementations that might otherwise decide to make the standard headers equal to or even lower priority than user program code.

  2. Every macro is also considered to have an associated priority. The priority of a macro is the priority of the header or source file in which it was defined (§16.3). If a macro is redefined in a header or source file of higher priority, its priority is raised accordingly. Otherwise a redefinition has no effect on priority. [Note: Priorities are not, in other words, lowered by redefinitions. — end note] The priorities of predefined macros and other macros not defined by headers or source files are implementation defined. footnote 4

    footnote 4 It is recommended that implementations give the standard predefined macros the same priority as macros defined in the standard library headers, and any user-defined macros (not defined by headers or source files) the same priority as macros defined in user source files.

  3. The priorities of macros, headers, and source files govern whether macros are considered for expansion when their names occur as identifiers outside of pre-processing directives. [Note: Priorities do not affect pre-processing directives at all. — end note] A macro of priority N is only considered for expansion (in such contexts) if the priority of the header or source file in which its name occurs is lower than or the same as N. The macro is not expanded if the header's or source file's priority is higher.

  4. The replacement text of a macro, after it has been expanded, is considered to be at the same priority as the source file or header in which the macro expansion process began.

  5. [Example: A translation unit comprises one header and two source files:

    // third-party user-supplied library source file "lib.h"
    #define M ... /* Identifier becomes the ... token. */
    int f (int a); // ok: no expansion of any lower-priority macros named "f" or "a"
    int g (int a, M); // ok: "M" is expanded as intended
    
    // user program source file
    #define f ... /* Identifier becomes the ... token. */
    #define a ... /* Identifier becomes the ... token. */
    #define basic_string ... /* Identifier becomes the ... token. */
    
    #include <string> // ok: macro "basic_string" has lower priority than the header
    #include <lib.h>
    
    #define M ... /* replaces M, but does not lower its priority */
    
    int h() 
    { 
    	f();	// error: "f" is expanded because
    		// the macro has the same priority as this source file
    	M();	// error: "M" is expanded because
    		// the macro has a higher priority than this source file
    }
    — end example]
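
The bookkeeping that the proposed text asks of an implementation can be sketched as a small model. This is an illustration only, under the assumption of a three-level ordering: MacroTable, define, expands_in, replay_example, and check_example are names invented here, and the model tracks nothing but each macro's priority. It replays the definitions of the example in the order in which the translation unit encounters them.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical three-level ordering; real implementations choose their own.
enum class Priority { user_program, user_library, standard_library };

// A toy macro table that records only the priority of each macro.
class MacroTable {
public:
    // A definition takes the priority of the defining file. A redefinition
    // may raise the priority that a macro already has, but never lowers it.
    void define(const std::string& name, Priority file)
    {
        auto it = priorities_.find(name);
        if (it == priorities_.end())
            priorities_.emplace(name, file);
        else if (file > it->second)
            it->second = file;
    }

    // A defined macro is considered for expansion only where the enclosing
    // file's priority is lower than or the same as the macro's own.
    bool expands_in(const std::string& name, Priority file) const
    {
        auto it = priorities_.find(name);
        return it != priorities_.end() && file <= it->second;
    }

private:
    std::map<std::string, Priority> priorities_;
};

// Replays the definitions from the example, in translation order.
inline MacroTable replay_example()
{
    MacroTable macros;
    macros.define("f", Priority::user_program);
    macros.define("a", Priority::user_program);
    macros.define("basic_string", Priority::user_program);
    macros.define("M", Priority::user_library);  // defined by lib.h
    macros.define("M", Priority::user_program);  // redefined by the program
    return macros;
}

// Checks the outcomes that the comments in the example describe.
inline void check_example()
{
    const MacroTable macros = replay_example();
    assert(!macros.expands_in("f", Priority::user_library));  // f in lib.h
    assert(macros.expands_in("M", Priority::user_library));   // M in lib.h
    assert(!macros.expands_in("basic_string", Priority::standard_library));
    assert(macros.expands_in("f", Priority::user_program));   // f() in h()
    assert(macros.expands_in("M", Priority::user_program));   // M() in h()
}
```

Calling check_example() from a test or from main exercises all five outcomes. The last assertion is the one that depends on the never-lowered rule: without it, the program's own redefinition of M would have dropped the macro to user program priority.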

We can also curtail the previously sweeping prohibition of §17.6.4.3.1 ¶1:

  1. A translation unit that includes a standard library header shall not #define or #undef macro names defined in any standard library header.

What happens in practice

I have implemented a preprocessor that does this, and experimented with what effects it has. There is a limited amount of prior art in other implementations, too.

Prior art in other implementations

Some implementations already classify headers and included source files by where they are found.

GCC, for example, has a two-level classification of user source files and system headers, depending on which directory a file is found in. It takes special actions for things defined in "system" headers. This dichotomy isn't enough to make application library headers safe from application-defined macros, since both fall on the "user" side of the dividing line. Nor does it extend to taking special action in "system" headers for things defined in "user" files. But it is a fair start.

My first implementation even employed such a user/system two-level priority system. It wasn't enough in practice. Ironically, it was pointing the implementation at GNU libc and pretending to be GCC (by defining the same predefined macros as it does) that revealed this.

Choosing priorities

Experience led to the multiple-priority-level scheme as specified earlier. I eventually settled upon five priorities. I used this mechanism to preprocess some programs using both the OpenWatcom libraries on a Win32 system and the GNU libc and GCC libraries on a Linux system. The priorities were (in lowest to highest order):

  • application: Source files for ordinary application code. In other words: everything found in or relative to the current directory, or in or relative to the same directory as an application source file including it.
  • application libraries: Headers for first-party and third-party application libraries. The Xerces C++ library is an example of a third-party application library.
  • wrapper library: Headers for special wrapper libraries. This was an extra layer required for GNU libc and GCC, more on which in a moment.
  • platform API library: Headers for the platform API library, such as <unistd.h>, <os2.h>, and <windows.h>.
  • C++ language standard library: Headers for the C++ standard library, such as <string>, <sstream>, and <stdio.h>.

The "places" in which header files are found map to priorities as follows:

Places and priorities for OpenWatcom (with target NT)

priority | place
application | everything found relative to the current directory, either directly or indirectly
application libraries | everything found by prefixing a directory given by a -I option or listed in the %INCLUDE% or %NT_INCLUDE% environment variables, except for places explicitly given in other rows of this table
wrapper library | (none)
platform API library | everything found by prefixing %WATCOM%/h/nt
C++ language standard library | everything found by prefixing %WATCOM%/h

Places and priorities for Microsoft Visual C++

priority | place
application | everything found relative to the current directory, either directly or indirectly
application libraries | everything found by prefixing a directory given by a -I option or listed in the %INCLUDE% environment variable, except for places explicitly given in other rows of this table
wrapper library | (none)
platform API library | everything found by prefixing $WindowsSdkDir/include
C++ language standard library | everything found by prefixing $VCInstallDir/include

Places and priorities for GCC

priority | places: GCC on Linux with GNU libc | places: Cygwin GCC
application | everything found relative to the current directory, either directly or indirectly
application libraries | everything found by prefixing a directory given by a -I option (from which GCC automatically eliminates the places given in other rows in this table)
wrapper library | some things found by prefixing $prefix/$target/$version/include
platform API library | (none) | everything found by prefixing /usr/include/w32api
C++ language standard library | everything found by prefixing /usr/include/c++/$version, /usr/include/c++/$version/$target, or /usr/include; some things found by prefixing $prefix/$target/$version/include | everything found by prefixing $prefix/$target/$version/include/c++ or /usr/include; some things found by prefixing $prefix/$target/$version/include
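
The tables above amount to a lookup from the search-path directory that located a file to a priority, with explicitly listed directories taking precedence over the catch-all application libraries row. A minimal sketch of that lookup for the OpenWatcom table follows. Place, priority_of, openwatcom_places, and check_openwatcom are hypothetical names, the expanded directory paths are invented for illustration (the table gives only %WATCOM%-relative places), and an empty prefix is used as a stand-in for "found relative to the current directory".

```cpp
#include <cassert>
#include <string>
#include <vector>

// The five levels described in this section, lowest first.
enum class Priority {
    application,
    application_library,
    wrapper_library,
    platform_api_library,
    standard_library
};

// One explicitly listed row of a places table.
struct Place {
    std::string prefix;  // search-path directory that located the file
    Priority priority;
};

// Maps the search-path directory that found a file to a priority.
// Explicitly listed directories win; any other directory (from -I or an
// include environment variable) falls into the application libraries row.
inline Priority priority_of(const std::string& found_by_prefix,
                            const std::vector<Place>& explicit_places)
{
    if (found_by_prefix.empty())  // assumption: "" means the current directory
        return Priority::application;
    for (const Place& place : explicit_places)
        if (place.prefix == found_by_prefix)
            return place.priority;
    return Priority::application_library;
}

// Rows modelled on the OpenWatcom table, with %WATCOM% expanded to an
// invented installation directory.
inline std::vector<Place> openwatcom_places()
{
    return {
        {"C:/WATCOM/h/nt", Priority::platform_api_library},
        {"C:/WATCOM/h", Priority::standard_library},
    };
}

// Exercises one lookup per row; the -I directory is likewise invented.
inline void check_openwatcom()
{
    const std::vector<Place> places = openwatcom_places();
    assert(priority_of("", places) == Priority::application);
    assert(priority_of("C:/libs/xerces/include", places)
           == Priority::application_library);
    assert(priority_of("C:/WATCOM/h/nt", places)
           == Priority::platform_api_library);
    assert(priority_of("C:/WATCOM/h", places) == Priority::standard_library);
}
```

The lookup keys on the search-path entry rather than on the resolved file name, which is what lets priority follow the inclusion route instead of the file itself.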

Things that implementation experience teaches

A four-level scheme, for application, application library, target platform library, and standard C++ library, would seem to be enough. GCC and GNU libc proved this wrong. GCC has two sets of "special" headers that it unfortunately mashes together in a single directory:

The second footnote to §16.3.6 may seem somewhat obscure. This is because it's condensing a fairly complex subject into a handful of sentences. It has three forces driving it:

Such leeway for self-relative inclusion is desirable in the case of library incest. To construct a hypothetical example: posit a library at the target platform priority that has a header <windows.h> that includes "../stddef.h" in order to incorporate various standard C++ library internals within itself, because it "knows" that the directory for the language's standard library is the parent of the directory for the target platform library. Even though that is therefore the same actual file as <stddef.h>, which has the higher standard C++ library priority, every macro from the "../stddef.h" inclusion will have the target platform priority, because the inclusion route by which it was reached doesn't involve the standard C++ library headers.

Put another way: The stddef.h file effectively becomes a new header belonging to the target platform library, distinct from the standard C++ header of the same spelling, because the platform library uses self-relative inclusion to include it. Implementations are under no more obligation than to note that <windows.h> and "../stddef.h" were both found by prefixing /usr/include/w32api, whereas <stddef.h> is found by prefixing /usr/include. Priority can be determined entirely by what part of the inclusion search path is used, without reference to the additional "../" relative directory prefix within the inclusion specification.

The rule about never lowering priority originates in the realization that macro priority limits the effective grasp of a macro, and that redefinition by a lower-priority header should not narrow the effective grasp of a macro that was already defined at a higher priority with a larger grasp. Higher priority headers would be expecting that macro to remain effective once defined by them.

The drastic curtailment of §17.6.4.3.1 ¶1 comes from the experience that the following program is no longer, with a macro priority system in place, capable of breaking the standard C++ library. Indeed, the pre-processed source compiles cleanly with GCC and works.

#define std ... /* Widely-used identifier becomes the ... token. */
#include <cstdio>
#undef std
int main ()
{
	std::puts("Hello there!");
	return 0;
}

© Copyright 2012 Jonathan de Boyne Pollard. "Moral" rights asserted.

Permission is hereby granted to copy and to distribute this WWW page in its original, unmodified form as long as its last modification datestamp information is preserved. Further permission is hereby granted to copy and to distribute this WWW page for the purposes of C++ standardization work by the ISO Working Group and the various committees and panels of the national standardization bodies. Finally, permission is hereby granted to include any and all of the proposed standards text given herein under the usual ISO rules for committee work products.