views:

327

answers:

3

I found this regarding how the C preprocessor should handle string literal concatenation (phase 6). However, I cannot find anything regarding how this is handled in C++ (does C++ use the C preprocessor?).

The reason I ask is that I have the following:

const char * Foo::encoding = "\0" "1234567890\0abcdefg";

where encoding is a static member of class Foo. Without concatenation I wouldn't be able to write that sequence of characters like that.

const char * Foo::encoding = "\01234567890\0abcdefg";

is something entirely different, because the leading \012 is interpreted as a single octal escape sequence.

I don't have access to multiple platforms, and I'm curious how confident I should be that the above is always handled correctly, i.e. that I will always get { 0, '1', '2', '3', ... }
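To make the difference concrete, here is a minimal sketch that lets the compiler check the bytes itself. It assumes a C++11 or later compiler (for constexpr and static_assert), and the array names merged, split and padded are made up for illustration; they are not part of the code above.

```cpp
// Illustrative names only; none of these appear in the original code.
// constexpr arrays (C++11) let the compiler inspect the contents itself.
constexpr char merged[] = "\01234567890\0abcdefg";     // begins with the octal escape \012
constexpr char split[]  = "\0" "1234567890\0abcdefg";  // begins with \0, then '1', '2', ...
constexpr char padded[] = "\0001234567890\0abcdefg";   // \000 already uses all three octal digits

// An octal escape consumes at most three octal digits, so \012 is ONE character.
static_assert(merged[0] == '\012' && merged[1] == '3',
              "single literal: first char is octal 012, next is '3'");

// With two adjacent literals, \0 is complete before the digits are seen.
static_assert(split[0] == '\0' && split[1] == '1' && split[2] == '2',
              "adjacent literals: NUL, then '1', '2'");

// A full three-digit escape gives the same bytes without concatenation.
static_assert(padded[0] == '\0' && padded[1] == '1',
              "three-digit escape: NUL, then '1'");

// The single-literal form is two characters shorter ('\0','1','2' became '\012').
static_assert(sizeof(split) == sizeof(merged) + 2, "split form is two chars longer");
static_assert(sizeof(padded) == sizeof(split), "padded and split forms have equal size");
```

If any of these assertions failed, compilation would stop, so a translation unit like this can double as a portability check on a new compiler.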

+1  A: 

Yes, C++ uses the C preprocessor.

Christopher Barber
More precisely, the C++ standard and the C standard agree on certain translation phases and on preprocessor directives, and every C++ implementation I know of uses a C preprocessor. I like to keep a clear distinction between what the Standards say and what implementations do.
David Thornley
+9  A: 

The language (C as well as C++) has no "preprocessor". A "preprocessor", as a separate functional unit, is an implementation detail. The way the source file(s) are handled is defined by the so-called phases of translation. One of the phases in C, as well as in C++, involves concatenating string literals.

In the C++ language standard (C++03) the phases are described in 2.1, and concatenation is phase 6:

6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.

AndreyT
Right, I was looking for a document detailing this for C++. I was not able to find one - though I found C details readily.
ezpz
Ahh, I missed your update. That is what I was after. Thanks.
ezpz
AndreyT - you forgot to mention that `"\0"` is converted to the target character set _before_ string literals are merged. This is the key to the question at hand.
D.Shawley
@D.Shawley: I don't immediately understand the importance of that. You mean without that the `\0` part could still merge with the `12` part and form an octal char literal `\012`? Hm... I'd say that the important part here is actually phase 3, not 5, when each string literal is converted into an independent *preprocessing token*. This alone already takes care of the potential issue with `\012`, doesn't it?
AndreyT
@ezpz: See also this: http://stackoverflow.com/questions/1476892/
sbi
+5  A: 

Yes, it will be handled as you describe, because it is in translation phase 5 that:

Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set (C99 §5.1.1.2/1)

The language in C++03 is effectively the same:

Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (C++03 §2.1/5)

So escape sequences (like \0) are converted into members of the execution character set in phase 5, before string literals are concatenated in phase 6.
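As a rough sketch of what that ordering means for the code in the question, the following checks the bytes of Foo::encoding at run time. The FOO_ENCODING macro, the encoding_size member and the two helper functions are hypothetical additions for this example, not part of the original class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro so the same literal can be used for the pointer and for sizeof.
#define FOO_ENCODING "\0" "1234567890\0abcdefg"

class Foo {
public:
    static const char* encoding;             // as in the question
    static const std::size_t encoding_size;  // hypothetical: length without the final NUL
};

// Phase 5 converts the escapes in each literal; phase 6 then joins the two literals.
const char* Foo::encoding = FOO_ENCODING;
// sizeof sees the already-concatenated literal; subtract the terminating NUL.
const std::size_t Foo::encoding_size = sizeof(FOO_ENCODING) - 1;

// Returns true if the bytes are { 0, '1', ..., '9', '0', 0, 'a', ..., 'g' }.
bool encoding_matches_expectation() {
    static const char expected[] = {
        0, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0',
        0, 'a', 'b', 'c', 'd', 'e', 'f', 'g'
    };
    return Foo::encoding_size == sizeof(expected)
        && std::memcmp(Foo::encoding, expected, sizeof(expected)) == 0;
}

// Call once at startup (for example from main) to fail fast if the bytes differ.
void check_foo_encoding() {
    assert(encoding_matches_expectation());
}
```

Because the data begins with a NUL, std::strlen(Foo::encoding) reports 0, which is why this sketch carries the length separately and compares with std::memcmp instead of the string functions.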

James McNellis
Right - I get that much. My question is whether this is transparent across C/C++. And, if so, where I can reference that documentation.
ezpz
@ezpz: Sorry; I missed that you were interested in the compatibility between the two. Yes, the results are the same for both C and C++; I've added the language from the C++ standard, which effectively says the same thing. You can find where to get the relevant standards documents from this question: http://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents
James McNellis
+1 for actually mentioning the different stages of translation since this is why `"\0" "12"` is not the same as `"\012"`.
D.Shawley