Just a little curiosity at work, here. While working on something dangerous, I got to thinking about the implementations of various compilers and their associated standard libraries. Here's the progression of my thoughts:

  1. Some classes of identifiers are reserved for implementation use in C++ and C.

  2. A compiler must perform the stages of compilation (preprocessing, compilation, linking) as though they were performed in sequence.

  3. The C preprocessor is not aware of the reserved status of identifiers.

  4. Therefore, a program may use reserved identifiers if and only if:

    1. The reserved identifiers used are all preprocessor symbols.

    2. The preprocessing result does not include reserved identifiers.

    3. The identifiers do not conflict with symbols predefined by the compiler (__GNUC__ et al.)

Is this valid? I'm uncertain on points 3 and 4.3. Moreover, is there a way to test it?

A: 

If you're asking if you can #define if while and make your code unreadable, then yes. This was a common practice in the obfuscated C competition. This would actually go against your 4.2, though.

Things like __GNUC__ are predefined, but you can usually redefine or undef them. It is not really a good idea to do this, but you can. More interesting would be redefining or undefining __LINE__, __FILE__, and similar preprocessor symbols (because they change automatically).

nategoose
I'm not asking that, actually, though I got it from a linked question anyway.
Jon Purdy
+1  A: 

The story is more complicated than that, I think, at least for the if and only if. What I recall from C99:

For example, point 3 is false: the defined token is reserved even in the preprocessing phase, and predefined names such as __LINE__ and __func__ may not be redefined either.

Then, the reservation of identifiers depends on the scope.

  • Some identifiers are explicitly reserved for external symbols, e.g. setjmp.
  • Identifiers starting with an underscore followed by another underscore or a capital letter are reserved everywhere in C. You should never touch them, even with the preprocessor.
  • Identifiers starting with underscore and then a lowercase letter are forbidden in file scope since they may refer to external symbols. They can be used freely in function scope.
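
To make the scope rules above concrete, here is a minimal sketch. The names scale_factor, scaled and _tmp are hypothetical, chosen only for illustration; the comments restate the rules from the list rather than quoting the standard.

```c
/* Reserved everywhere (never define these yourself, not even as macros):
 *   names starting with two underscores, or underscore + capital letter.
 * Reserved at file scope only:
 *   names starting with underscore + lowercase letter.
 */

/* Fine: an ordinary file-scope name with no leading underscore. */
static int scale_factor = 3;

/* Hypothetical helper, named only for this illustration. */
int scaled(int value) {
    /* Fine under the rules above: underscore + lowercase letter,
     * but declared at block scope, not file scope. */
    int _tmp = value * scale_factor;
    return _tmp;
}
```

Moving the declaration of _tmp out to file scope would put it in reserved territory, even though the compiler is unlikely to complain.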

4.2 is not completely correct either. First it is only undefined behavior (aka very evil) to have a macro defined that has a keyword as its name under the following condition:

A standard header is included while a macro is defined with the same name as a keyword (7.1.2).

Then, a macro that contains its own name in its expansion is "safe", since the expansion is guaranteed not to be recursive. Something like the following would be valid, though not recommended:

#define if(...)                                         \
for(int _i = 0; _i < 1; ++_i)                           \
  for(int _cond = (__VA_ARGS__);                        \
      _i < 1;                                           \
      printf("line %d val %d\n", __LINE__, _cond),      \
        ++_i)                                           \
    if(_cond)

(BTW, don't anyone use that macro, it compiles and does about what it looks like, but has corner cases that let it explode.)
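
A tamer illustration of the same non-recursion guarantee, as a sketch only: a function-like macro that wraps a function of the same name. The names square, call_count and sum_of_squares are hypothetical. The function is defined before the macro so that its definition is not itself macro-expanded.

```c
/* Counts how often the wrapped function is called. */
static int call_count = 0;

/* Defined before the macro, so this definition is left alone. */
static int square(int x) { return x * x; }

/* The inner `square` is not expanded again during rescanning,
 * so it refers to the real function above. */
#define square(x) (++call_count, square(x))

int sum_of_squares(int a, int b) {
    /* Separate statements keep the two increments sequenced. */
    int first = square(a);
    int second = square(b);
    return first + second;
}
```

Calling sum_of_squares(3, 4) goes through the macro twice, so call_count ends up at 2 while the result is still computed by the real function.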

Jens Gustedt
In C++ the restriction is more stringent: you cannot define a macro with a name lexically identical to a keyword in any translation unit that includes a standard library header, even if the macro is not defined when the header is included.
James McNellis
+1  A: 

The C preprocessor is not aware of the reserved status of identifiers.

I'm not sure what you mean by "aware", but I don't think you can necessarily assume this - 7.1.3 says

All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use

The preprocessor (or compiler) implementation can use these reserved identifiers for whatever purposes suit it - it doesn't need to warn you if you're misusing these identifiers.

I'd suggest that "a program may use reserved identifiers if and only if" the standard (for example the set of pre-defined macros) or the implementation says so in its documentation.

Of course, I think you'll get away with using identifiers that are reserved in quite a few cases - implementations don't go out of their way to cause you problems. An awful lot of code uses names that are reserved, and I'd guess that implementations would rather not break that code without good enough reason. However, it would be best if you avoided that namespace altogether if you're not implementing a compiler toolchain.

Michael Burr
+4  A: 

(The comments on the question explain that we're talking about reserved identifiers in the sense of C99 section 7.1.3, i.e., identifiers matching /^_[A-Z_]/ anywhere, /^_/ in file scope, /^str[a-z]/ with external linkage, etc. So here's my guess at at least a part of what you're asking...)

They're not reserved in the sense that (any particular phase of) the compiler is expected to diagnose their misuse. Rather, they're reserved in that if you're foolish enough to (mis)use them yourself, you don't get to complain if your program stops working or stops compiling at a later date.

We've all seen what happens when people with only a dangerous amount of knowledge look inside system headers and then write their own header guards:

#ifndef _MYHEADER_H
#define _MYHEADER_H
// ...
#endif

They're invoking undefined behaviour, but nothing diagnoses this as "error: reserved identifier used by end-user code". Instead mostly they're lucky and all is well; but occasionally they collide with an identifier of interest to the implementation, and confusing things happen.
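
For comparison, a guard that stays out of the leading-underscore namespace might look like the sketch below. The guard name MYHEADER_H and the function myheader_version are hypothetical; the only point is that the guard no longer begins with an underscore followed by a capital letter.

```c
/* myheader.h (hypothetical file) */
#ifndef MYHEADER_H
#define MYHEADER_H

/* Placeholder content so the header declares something usable. */
static inline int myheader_version(void) { return 1; }

#endif /* MYHEADER_H */
```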

Similarly, I often have an externally-visible function named strip() or so:

char *strip(char *s) {
  // remove leading whitespace
  while (*s == ' ' || *s == '\t' || *s == '\n')
    ++s;
  return s;
}

By my reading of C99's 7.1.3, 7.26, and 7.26.11, this invokes undefined behaviour. However I have decided not to care about this. The identifier is reserved not because anything bad is expected to happen today, but because the Standard reserves to itself the right to invent a new standard str-ip() routine in a future revision. And I reckon string-ip, whatever that might be, is an unlikely name for a string operation to be added in the future, so in the unlikely event that happens, I'll cross that bridge when I get to it. Technically I'm invoking undefined behaviour, but I don't expect to get bitten.

Finally, a counter-example to your point 4:

#include <string.h>
#define memcpy(d,s,n)  (my_crazy_function((n), (s)))
void foo(char *a, char *b) {
  memcpy(a, b, 5);  // intends to invoke my_crazy_function
  memmove(a, b, 5); // standard behaviour expected
}

This complies with your 4.1, 4.2, 4.3 (if I understand your intention on that last one). However, if memmove is additionally implemented as a macro (via 7.1.4/1) that is written in terms of memcpy, then you're going to be in trouble.
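
The failure mode can be simulated without touching the real library. In this sketch, lib_copy and lib_move are hypothetical stand-ins for memcpy and memmove, with lib_move written as a macro in terms of lib_copy, which is an assumption about how an implementation might choose to do it. The call counters exist only to make the effect observable.

```c
/* --- Stand-in for an implementation header (hypothetical names) --- */
int lib_copy_calls = 0;

void lib_copy(char *d, const char *s, int n) {
    ++lib_copy_calls;
    for (int i = 0; i < n; ++i)
        d[i] = s[i];
}

/* Plays the role of a memmove macro written in terms of memcpy. */
#define lib_move(d, s, n) lib_copy((d), (s), (n))

/* --- User code --- */
int crazy_calls = 0;

void my_crazy_function(int n, const char *s) {
    (void)n;
    (void)s;
    ++crazy_calls;
}

/* The user's macro hides the library function from here on. */
#define lib_copy(d, s, n) (my_crazy_function((n), (s)))

void foo(char *a, char *b) {
    lib_copy(a, b, 5); /* intended: my_crazy_function */
    lib_move(a, b, 5); /* expands to lib_copy(...), which is now the user's macro */
}
```

After foo runs, my_crazy_function has been called twice and the real copy routine not at all, because the expansion of lib_move is rescanned and picks up the user's lib_copy macro.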

John Marshall
Alright, thanks for your answer.
Jon Purdy
The other practical upshot of `strip` being a reserved identifier is that an implementation *today* is free to implement such a function, and be conforming. So e.g. `strdup()` can be defined even in strictly conforming mode.
caf
+1  A: 

Identifiers like _UNDERSCORE_CAP and double__underscore are reserved for use by the implementation as it sees fit. It's not a problem if the implementation uses them, such as having, say, a _File identifier or macro in <stdio.h>; that's what the reservation is for. It is a potential problem if the user uses one.

Therefore, in order to diagnose this, the compiler would have to keep track of where identifiers came from. It wouldn't be sufficient to just check code not in <angle_bracket_files.h>, since those can define macros that might be used and are likely to expand to something using implementation-reserved words. For example, isupper might be defined in <ctype.h> as

#define isupper(x) (_UPPER_BIT & _CHAR_TRAITS[x])

or some such. (It's been a long time since I saw the definition I based the above on.)

Therefore, to keep track of this, the preprocessor would have to maintain records of which header each macro came from, among other things. Tracking that would complicate the preprocessor considerably, for what compiler writers appear to consider no corresponding gain.
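
A rough sketch of why origin tracking would be needed: after preprocessing, the user's own function body contains the implementation's internal names, even though the user never typed them. All names here (IMPL_UPPER_BIT, impl_char_traits, impl_init_traits, demo_isupper, starts_with_upper) are hypothetical; a real implementation would use reserved spellings like the ones in the isupper example.

```c
/* --- Stand-in for <ctype.h> (hypothetical names) --- */
#define IMPL_UPPER_BIT 0x01
unsigned char impl_char_traits[256];

/* Marks the uppercase letters in the traits table. */
void impl_init_traits(void) {
    for (const char *p = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p; ++p)
        impl_char_traits[(unsigned char)*p] |= IMPL_UPPER_BIT;
}

#define demo_isupper(x) (IMPL_UPPER_BIT & impl_char_traits[(unsigned char)(x)])

/* --- User code --- */
int starts_with_upper(const char *s) {
    /* After expansion this line mentions IMPL_UPPER_BIT and
     * impl_char_traits, although the user wrote neither. */
    return demo_isupper(s[0]) != 0;
}
```

A diagnostic that only looked at the preprocessed text of starts_with_upper could not tell these names apart from ones the user had written directly.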

David Thornley