ansaurus

Question

Why is this C or C++ macro not expanded by the preprocessor?

Answer 1

+9 A:

When you write 1e-X all together like that, the X isn't a separate symbol for the preprocessor to replace - there needs to be whitespace (or certain other symbols) on either side. Think about it a little and you'll realize why.. :)

Edit: "12-X" is valid because it gets parsed as "12", "-", "X" which are three separate tokens. "1e-X" can't be split like that because "1e-" doesn't form a valid token by itself, as Jonathan mentioned in his answer.
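
You can check this yourself by stringifying both expressions after macro expansion. This is only a sketch: it assumes the question defined X as 10, and the STR/STR_ helpers and function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define X 10            /* assumed to match the macro in the question */
#define STR_(x) #x      /* turns its argument into a string literal */
#define STR(x) STR_(x)  /* extra level so the argument is macro-expanded first */

/* 12-X is three preprocessing tokens (12, -, X), so X gets replaced. */
const char *three_tokens(void) { return STR(12-X); }

/* 1e-X is a single preprocessing number, so there is no X token to replace. */
const char *one_token(void) { return STR(1e-X); }

void demo(void) {
    assert(strstr(three_tokens(), "10") != NULL); /* X became 10 */
    assert(strchr(three_tokens(), 'X') == NULL);  /* no X left over */
    assert(strcmp(one_token(), "1e-X") == 0);     /* X left untouched */
}
```

Calling demo() from main should pass silently if the compiler tokenizes the two expressions as described.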

As for the solution to your problem, you can use token-concatenation:

#define E(X) 1e-##X
int main()
{
  double a = E(10); // expands to 1e-10
  return 0;
}
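
One caveat, in case the exponent is itself a macro: the operands of ## are not macro-expanded before pasting, so passing a macro name straight to E would paste the name rather than its value. The usual workaround is one more level of indirection. A minimal sketch, where PASTE_EXP, E_EXPANDED and N are hypothetical names, not part of the question:

```c
#include <assert.h>

#define PASTE_EXP(x) 1e-##x         /* same pasting as E(X) above */
#define E_EXPANDED(x) PASTE_EXP(x)  /* x is fully expanded before PASTE_EXP sees it */
#define N 10                        /* hypothetical exponent macro */

/* N is expanded to 10 first, then 1e- and 10 are pasted together. */
double tiny(void) { return E_EXPANDED(N); }

void demo(void) { assert(tiny() == 1e-10); }
```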

tzaman 2010-06-09 06:35:18

This doesn't really explain why `-` isn't considered one of the "certain other symbols" in this case, especially when `int y=12-X;` would work fine.

jamesdlin 2010-06-09 09:23:18

The point raised by jamesdlin is valid. Does anyone know why y=12-X works but 1e-X does not?

Atul 2010-06-09 11:22:25

@Atul - see my edit; also added a way to do what you want.

tzaman 2010-06-09 11:35:48

@tzaman. Thanks for explaining. Selected your answer as the accepted answer :).

Atul 2010-06-09 20:28:13

@tzaman: Things get really fun with quantities like 0x1E-FOO, where 0x1E would form a valid token (unlike the bare 1e above) and yet compilers still choke.

supercat 2010-10-26 15:46:44

Answer 2

+13 A:

The preprocessor is not a text processor; it works at the level of tokens. In your code, after the define, every occurrence of the token X would be replaced by the token 10. However, there is no token X in the rest of your code.

1e-X is syntactically invalid and cannot be turned into a token, which is basically what the error is telling you (it says that to make it a valid token -- in this case a floating point literal -- you have to provide a valid exponent).

avakar 2010-06-09 06:36:06

A clear solution which I was looking for. Thanks

Atul 2010-06-09 07:44:00

Answer 3

+5 A:

GCC 4.5.0 doesn't change the X either.

The answer is going to lie in how the preprocessor interprets preprocessing tokens - and in the 'maximal munch' rule. The 'maximal munch' rule is what dictates that 'x+++++y' is treated as 'x ++ ++ + y' and hence is erroneous, rather than as 'x ++ + ++ y' which is legitimate.
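
A small sketch of maximal munch in ordinary code, for comparison. The function names are mine, and the failing form is left in a comment because it does not compile:

```c
#include <assert.h>

/* Lexed as x ++ + y: the longest possible token, ++, is taken first. */
int post_inc_then_add(int x, int y) { return x+++y; }

/* Whitespace forces the split that maximal munch would never pick by itself. */
int inc_both(int x, int y) { return x++ + ++y; }

/* int broken(int x, int y) { return x+++++y; }   lexed as x ++ ++ + y: rejected */

void demo(void) {
    assert(post_inc_then_add(1, 2) == 3); /* 1 + 2, x is incremented afterwards */
    assert(inc_both(1, 2) == 4);          /* 1 + 3, y is incremented first */
}
```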

The issue is why does the preprocessor interpret '1e-X' as a single preprocessing token. Clearly, it will treat '1e-10' as a single token. There is no valid interpretation for '1e-' unless it is followed by a digit once it passes the preprocessor. ~~So, I have to guess that the preprocessor sees '1e-X' as a single token (actually erroneous). But I have not dissected the correct clauses in the standard to see where it is required~~. But the definition of a 'preprocessing number' or 'pp-number' in the standard (see below) is somewhat different from the definition of a valid integer or floating point constant and allows many 'pp-numbers' that are not valid as an integer or floating point constant.

If it helps, the output of the Sun C Compiler for 'cc -E -v soq.c' is:

# 1 "soq.c"
# 2
int main()
{
"soq.c", line 4: invalid input token: 1e-X
  double a =  1e-X ;
  return 0;
}
#ident "acomp: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25"
cc: acomp failed for soq.c

So, at least one C compiler rejects the code in the preprocessor - it might be that the GCC preprocessor is a little slack (I tried to provoke it into complaining with gcc -Wall -pedantic -std=c89 -Wextra -E soq.c but it did not utter a squeak). And using 3 X's in both the macro and the '1e-XXX' notation showed that all three X's were consumed by both GCC and Sun C Compiler.

C Standard Definition of Preprocessing Number

From the C Standard - ISO/IEC 9899:1999 §6.4.8 Preprocessing Numbers:

pp-number:
    digit
    . digit
    pp-number digit
    pp-number identifier-nondigit
    pp-number e sign
    pp-number E sign
    pp-number p sign
    pp-number P sign
    pp-number .

Given this, '1e-X' is a valid 'pp-number', and therefore the X is not a separate token (nor is the 'XXX' in '1e-XXX' a separate token). Therefore, the preprocessor cannot expand the X; it isn't a separate token subject to expansion.
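
To make the grammar concrete, here is a small hand-written scanner that returns the length of the longest pp-number at the start of a string. It is my own sketch of the rules quoted above, not code from any compiler; the name pp_number_length is made up, and universal character names (which identifier-nondigit also allows) are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest pp-number prefix of s, or 0 if s does not start one. */
size_t pp_number_length(const char *s) {
    size_t i;
    if (isdigit((unsigned char)s[0]))
        i = 1;                                   /* digit */
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;                                   /* . digit */
    else
        return 0;
    for (;;) {
        unsigned char c = (unsigned char)s[i];
        char prev = s[i - 1];
        if ((c == '+' || c == '-') &&
            (prev == 'e' || prev == 'E' || prev == 'p' || prev == 'P'))
            i++;                                 /* e sign, E sign, p sign, P sign */
        else if (isalnum(c) || c == '_' || c == '.')
            i++;                                 /* digit, identifier-nondigit, or . */
        else
            break;
    }
    return i;
}

void demo(void) {
    assert(pp_number_length("1e-X;") == 4);    /* X is swallowed */
    assert(pp_number_length("1e-XXX") == 6);   /* so are all three X's */
    assert(pp_number_length("12-X") == 2);     /* stops before the minus */
    assert(pp_number_length("0x1E-FOO") == 8); /* hex digit E also takes the sign */
}
```

The last case is the 0x1E-FOO situation from the comments on the accepted answer: the E before the sign is enough for the whole sequence to be one pp-number.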

Jonathan Leffler 2010-06-09 06:39:22

It doesn't see `1e-X` as a single token. It in fact stops the compilation at the moment it reaches `X` and realizes that it can't form a valid token.

avakar 2010-06-09 06:41:14

Empirically (since the GCC preprocessor does not translate the X as a macro), it treats the X as part of '1e-X' without reporting an error; this might be an error in the GCC preprocessor. The Sun preprocessor determines that 1e-X is an invalid token and still doesn't translate the X - but does report an error. If the preprocessor did not treat the X as part of '1e-X', then it is erroneously failing to translate X into 10; which is where the question started. Yes, it is malformed code - I think that is undeniable. But it seems to me that the preprocessor must be treating '1e-X' as a token.

Jonathan Leffler 2010-06-09 07:27:07

@avakar @Jonathan: During preprocessing, `1e-X` is a valid _preprocessing token_. It just can't be converted into a _token_. I've posted an answer with an explanation.

James McNellis 2010-06-12 17:18:33

Answer 4

+4 A:

Several people have said that 1e-X is lexed as a single token, which is partially correct. To explain:

There are two classes of tokens during translation: preprocessing tokens and tokens. A source file is initially decomposed into preprocessing tokens; these tokens are then used in all of the preprocessing tasks, including macro replacement. After preprocessing, each preprocessing token is converted into a token; these resulting tokens are used during actual compilation.

There are fewer types of preprocessing tokens than there are types of tokens. For example, keywords (e.g. for, while, if) are not significant during the preprocessing phases, so there is no keyword preprocessing token. Keywords are simply lexed as identifiers. When the conversion from preprocessing tokens to tokens takes place, each identifier preprocessing token is inspected; if it matches a keyword, it is converted into a keyword token; otherwise it is converted into an identifier token.
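
One way to observe this is to build a keyword out of two pieces with token pasting. In this sketch (CAT and half are hypothetical names), the preprocessor only ever sees identifiers; the result is recognized as a keyword when preprocessing tokens are converted to tokens.

```c
#include <assert.h>

#define CAT(a, b) a##b   /* hypothetical helper: paste two preprocessing tokens */

/* dou and ble are pasted into the identifier preprocessing token double;
   it is treated as the keyword only after preprocessing has finished. */
CAT(dou, ble) half(void) { return 0.5; }

void demo(void) { assert(half() == 0.5); }
```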

There is only one type of numeric token during preprocessing: preprocessing number. This type of preprocessing token corresponds to two different types of tokens: integer literal and floating literal.

The preprocessing number preprocessing token is defined very broadly. Effectively it matches any sequence of characters that begins with a digit, or with a decimal point followed by a digit, and continues with any number of digits, nondigits (e.g. letters), periods, and signs that directly follow an exponent letter (e.g. e+ and e-). So, all of the following are valid preprocessing number preprocessing tokens:

1.0e-10
.78
42
1e-X
1helloworld

The first two can be converted into floating literals; the third can be converted into an integer literal. The last two are not valid integer literals or floating literals; those preprocessing tokens cannot be converted into tokens. This is why you can preprocess the source without error but cannot compile it: the error occurs in the conversion from preprocessing tokens to tokens.
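
As a rough illustration, an invalid preprocessing number is harmless as long as it never reaches the conversion step. In this sketch (DISCARD, TEXT and as_text are made-up names), one is thrown away by a macro and another is turned into a string literal, and the file still compiles:

```c
#include <assert.h>
#include <string.h>

#define DISCARD(x)   /* expands to nothing */
#define TEXT(x) #x   /* stringifies its argument without expanding it */

DISCARD(1e-X)        /* the pp-number disappears before token conversion */

/* 1helloworld becomes part of a string literal, which is a valid token. */
const char *as_text(void) { return TEXT(1helloworld); }

void demo(void) { assert(strcmp(as_text(), "1helloworld") == 0); }
```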

James McNellis 2010-06-12 17:17:23

Yes, in fact it is because of `1e-` being a *pp-number* (though not a *floating-literal*) that `#define E(X) 1e-##X` in the accepted answer works.

avakar 2010-06-12 17:33:47
