I have seen and used C++ code like the following:

int myFourcc = 'ABCD';

It works in recent versions of GCC, though I am not sure how recent. Is this feature in the standard? What is it called?

I have had trouble searching the web for it...

EDIT:

I found this info as well, for future observers:

From the GCC documentation:

The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.

For example, 'ab' for a target with an 8-bit char would be interpreted as (int) ((unsigned char) 'a' * 256 + (unsigned char) 'b'), and '\234a' as (int) ((unsigned char) '\234' * 256 + (unsigned char) 'a').
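
The rule quoted above can be sketched as ordinary code. This is only an illustration, assuming an 8-bit char, a 32-bit unsigned int, and ASCII; `gcc_style_value` is a hypothetical helper name, not something GCC provides.

```cpp
#include <cstddef>

// Hypothetical helper: folds characters the way the GCC documentation
// describes, assuming 8-bit char and 32-bit unsigned int.
// Each step shifts the accumulated value left by 8 bits and ORs in the next
// character as an unsigned char. Bits shifted past the top are lost, so
// excess leading characters are dropped.
constexpr unsigned int gcc_style_value(const char* s, std::size_t n,
                                       unsigned int acc = 0) {
    return n == 0
        ? acc
        : gcc_style_value(s + 1, n - 1,
                          (acc << 8) | static_cast<unsigned char>(*s));
}

// 'a' is 0x61 and 'b' is 0x62 in ASCII, so "ab" folds to 0x6162.
static_assert(gcc_style_value("ab", 2) == 0x6162u, "matches the documented example");
static_assert(gcc_style_value("ABCD", 4) == 0x41424344u, "four characters fill 32 bits");
```

The sketch works on an unsigned accumulator to keep the shifts well-defined; the documentation says the final bit-pattern is then given type int.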

+1  A: 

Not an answer to your question, but the ancient FORTRAN compiler I used in my first job didn't have character variables, so we'd stuff them into INTEGER arrays, 4 characters per integer.

INTEGER STR(4) /'THIS',' IS ', 'A ST', 'RING'/

It frightens me that I still remember stuff like this.

Paul Tomblin
+1 for giving me flashbacks
+6  A: 

C++ standard draft says:

A character literal is one or more characters enclosed in single quotes, as in 'x'

and

An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

abababa22
+6  A: 

"Note that according to the C standard there is no limit on the length of a character constant, but the value of a character constant that contains more than one character is implementation-defined. Recent versions of GCC provide support multi-byte character constants, and instead of an error the warnings multiple-character character constant or warning: character constant too long for its type are generated in this case."

chaos
Thanks! "multi-byte character" is the magic google phrase (:
jw
Can you link the source? Unless you're quoting yourself...
Michael Haren
http://www.network-theory.co.uk/docs/gccintro/gccintro_94.html
chaos
+1  A: 

Yes, it is standard, but its value is implementation-defined.

In practice, it is the 32-bit integer you get by concatenating the bytes 'A', 'B', 'C' and 'D'.

Juliano
I guess the standard leaves it undefined due to endianness issues and the like.
jw
It's not undefined, it is implementation-defined, which is a subtle difference.
1800 INFORMATION
It is implementation-defined, not undefined. In practice, all compilers I have used accept at least up to 4 bytes, and the result is the concatenation of those bytes.
Juliano
+5  A: 

See section 6.4.4.4, paragraph 10 of the C99 standard:

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

Recall that implementation-defined means that the implementation (in this case, the C compiler) chooses the behavior, but it must document that choice.

Most compilers will convert it to an integral constant corresponding to the concatenation of the octets corresponding to the individual characters, but the endianness could be either little- or big-endian, depending on the endianness of the target architecture.

Therefore, portable code should not use multi-character constants and should instead use plain integral constants. Instead of 'abcd', which could be of either endianness, use either 0x61626364 or 0x64636261, which have known endiannesses (big and little respectively).
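
If the readable spelling is still wanted, one option is to build the constant from its bytes explicitly, so the result no longer depends on how a compiler evaluates 'abcd'. A minimal sketch, assuming ASCII and the big-endian-style layout (first character in the most significant byte); `make_fourcc` is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: packs four characters into a 32-bit value with the
// first character in the most significant byte. The result is the same on
// every compiler and target, unlike a multi-character constant.
constexpr std::uint32_t make_fourcc(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8)  |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// With ASCII, this is the 0x61626364 value mentioned above.
static_assert(make_fourcc('a', 'b', 'c', 'd') == 0x61626364u, "assumes ASCII");
```

Swapping the shift amounts would give the 0x64636261 layout instead; the point is only that the order is written down in the code rather than left to the implementation.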

Adam Rosenfield
A: 

If anyone is interested, the specific example given is the ID of a data storage format.
It's very useful to be able to get a human-readable value of a constant, e.g. 'XVID' rather than just 1234. It's worth thinking about when you are making up arbitrary integer keys.
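
To illustrate the readability point, here is a small sketch that turns such a key back into text, for example when logging. It assumes the code is stored with the first character in the most significant byte; `fourcc_char` and `fourcc_to_string` are hypothetical names, not part of any library.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns character i (0 = first) of a four-character
// code stored with the first character in the most significant byte.
constexpr char fourcc_char(std::uint32_t code, int i) {
    return static_cast<char>((code >> (8 * (3 - i))) & 0xFFu);
}

// Hypothetical helper: rebuilds the readable four-character text.
inline std::string fourcc_to_string(std::uint32_t code) {
    return std::string{fourcc_char(code, 0), fourcc_char(code, 1),
                       fourcc_char(code, 2), fourcc_char(code, 3)};
}
```

With ASCII, 0x58564944 holds the bytes 'X', 'V', 'I', 'D' in that order, so a debugger or hex dump of the key is already nearly self-describing.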

Martin Beckett
File types on the old Macintosh (circa the 1990s) were like that. What was annoying was that the logical way of specifying 0x3F3F3F3F, '????', wouldn't work because of trigraph processing. Would there have been any problem with specifying that trigraphs should only be processed if some directive explicitly so requests?
supercat