I created a new project with the following code segment:

char* strange = "(Strange??)";
cout << strange << endl;

resulting in the following output:

(Strange]

Thus translating '??)' -> ']'

Debugging it shows that my char* string literal is actually that value and it's not a stream translation. This is obviously not a meta-character sequence I've ever seen. Some sort of Unicode or wide char sequence perhaps? I don't think so however... I've tried disabling all related project settings to no avail.

Anyone have an explanation?

  • search : 'question mark, question mark, close brace' c c++ string literal
+16  A: 

What you're seeing is called a trigraph.

In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.

GCC ignores trigraphs by default because hardly anyone uses them intentionally. Enable them with the -trigraphs option, or tell the compiler to warn you about them with the -Wtrigraphs option.

Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.

Rob Kennedy
+1, but it's actually a trigraph http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C.
Matt Ball
You commented quickly. I fixed it once I noticed there really were three characters there. Thanks.
Rob Kennedy
Is the insulting insinuation that Marius is not a grown-up really necessary? I'm withholding a downvote on the assumption it was meant in good humor, but it's really rather belittling and completely unnecessary.
Daniel Bingham
It was never meant to insinuate anything about Marius. Whether he's a grown-up or not, if he intends his writing to look like it came from one, then he doesn't need repeated punctuation marks.
Rob Kennedy
MSVC started supporting warning about trigraphs in VS2008 (warning C4837), but you have to jump through some hoops to enable it (`-Wall` or something; I'm not quite sure).
Michael Burr
@Rob Kennedy Judging his writing is not what we're here to do. And we can answer each other's questions in a respectful fashion without snide remarks on each other's maturity levels. Your answer aside from that comment is excellent and right on the mark. But that comment is not funny or necessary and is insulting to Marius whether intended that way or not. And are you honestly telling me that you don't occasionally add more punctuation than strictly necessary in the heat of frustrating debugging? Even if you don't, you have no platform for judging him doing so.
Daniel Bingham
Thanks for the quick, informative and perceptive answer. I don't really want to print strings like that... I was building an SQL parameter-substitution function and happened to use a '?' as my placeholder char. Frankly I see having trigraphs enabled by default in a compiler as a bad design decision (you would too if you used generated code; think lex/yacc). Indeed trigraphs can and should be implemented on the OS/UI level as it is on Mac and Linux.
Marius
Trigraphs were put there to help with truly ancient systems where it could be difficult to input or represent characters outside ISO 646. C had to support these systems somehow, but also needs 3 kinds of brackets. Where modern compilers have disabled trigraphs by default it's for the reason you say, that these days we can rely on systems to support at least ASCII, so no workaround is needed. I don't know why trigraphs weren't removed (or at least deprecated with a recommendation compilers should warn) in C99, but there probably was some reason.
Steve Jessop
FWIW, you could generate "??" quite easily as a "Grown Up" in your code. In DOS / Win32, "?" is a wildcard character for filesystem names so "log??.txt" would represent "log00.txt", "logAB.txt", etc...
Adisak
You'd get away with it for that exact example, since `??.` isn't a trigraph. But `log????-??-??.txt` would match log files which include dates, except that `??-` is the trigraph for `~`.
Steve Jessop
+5  A: 

It's a Trigraph!

Adam Wright
+4  A: 

??) is a trigraph.

Carl Norum
+3  A: 

It's a trigraph.

Sydius
+4  A: 

Trigraphs are the reason. The talk about C in the article also applies to C++.

jmucchiello
+4  A: 

That's trigraph support. You can prevent trigraph interpretation by escaping any of the characters:

char* strange = "(Strange?\?)";
R Samuel Klatchko
+3  A: 

Easy way to avoid the trigraph surprise: split a "??" string literal in two:

char* strange = "(Strange??)";
char* strange2 = "(Strange?" "?)";
/*                         ^^^ no punctuation */
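
As a rough check that the workarounds in this thread are interchangeable, the sketch below spells the same text with an escaped question mark and with split literals, then compares the two. The names `escaped`, `split` and `spellings_match` are made up for illustration; neither spelling contains a trigraph, so the result should not depend on whether the compiler has trigraphs enabled.

```cpp
#include <cstring>

// The same eleven characters spelled two trigraph-safe ways.
const char* const escaped = "(Strange?\?)";    // second question mark escaped
const char* const split   = "(Strange?" "?)";  // adjacent literals are joined

// Hypothetical helper: true when both spellings produce identical text.
bool spellings_match() {
    return std::strcmp(escaped, split) == 0;
}
```

Both variables should hold the text the question intended to print, with two consecutive question marks before the closing parenthesis.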

Edit
gcc has an option to warn about trigraphs: -Wtrigraphs (enabled with -Wall also)
end edit

Quotes from the Standard

    5.2.1.1 Trigraph sequences
1   Before any other processing takes place, each occurrence of one of the
    following sequences of three characters (called trigraph sequences)
    is replaced with the corresponding single character.
           ??=      #               ??)      ]               ??!      |
           ??(      [               ??'      ^               ??>      }
           ??/      \               ??<      {               ??-      ~
    No other trigraph sequences exist. Each ? that does not begin one of
    the trigraphs listed above is not changed.
    5.1.1.2 Translation phases
1   The precedence among the syntax rules of translation is specified by
    the following phases.
         1.   Physical source file multibyte characters are mapped, in an
              implementation-defined manner, to the source character set
              (introducing new-line characters for end-of-line indicators)
              if necessary. Trigraph sequences are replaced by corresponding
              single-character internal representations.
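
To make the quoted rule concrete, here is a minimal sketch of the replacement step applied to an ordinary string at run time. It is not how any particular compiler implements phase 1; `trigraph_replacement` and `replace_trigraphs` are hypothetical names, and the table is simply the nine sequences listed in the quote. The code deliberately never contains two adjacent question marks, so it behaves the same whether or not the compiler honors trigraphs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the replacement for the third character of a
// trigraph, or '\0' if that character does not complete one.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '(':  return '[';
        case '/':  return '\\';
        case ')':  return ']';
        case '\'': return '^';
        case '<':  return '{';
        case '!':  return '|';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical helper: one left-to-right pass over the text, replacing each
// trigraph with its single character and copying everything else unchanged.
std::string replace_trigraphs(const std::string& source) {
    std::string out;
    out.reserve(source.size());
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char repl = trigraph_replacement(source[i + 2]);
            if (repl != '\0') {
                out += repl;
                i += 3;
                continue;
            }
        }
        out += source[i];
        ++i;
    }
    return out;
}
```

Fed the text from the question, `replace_trigraphs` returns `(Strange]`. Fed `log????-??-??.txt`, it returns `log??~~??.txt`, because a question mark that does not begin a listed sequence is left alone.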
pmg
+2  A: 

As mentioned several times, you're being bitten by a trigraph. See this previous SO question for more information:

You can fix the problem by using the '\?' escape sequence for the '?' character:

char* strange = "(Strange\?\?)";

In fact, this is the reason for that escape sequence, which is somewhat mysterious if you're unaware of those damn trigraphs.

Michael Burr
Thanks for the response... due to the nature of this bug it's impossible to search for an answer unless one knows it's a trigraph. The problem with fixing it that way is that I'm using generated C from a lex/yacc parser generator. I've used and created my own trigraphs on my Mac and I feel it's the OS's place to handle trigraph keyboard sequences and not the compiler. Indeed only in VS 2010 they will change this default behavior.
Marius
Yes - I can imagine that searching for help on this without already knowing what a trigraph is presents a serious chicken-and-egg problem. If you can't change the lex/yacc output and have to use a compiler that won't ignore trigraphs (VS2010 or GCC), then I think you're stuck with having to run the lex/yacc output through a filter that will change trigraphs into harmless non-trigraphs.
Michael Burr
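
One possible shape for such a filter, as a sketch only: it assumes the input is the text between the quotes of a string literal, where `\?` is a legal escape, and it does not try to parse C to find those literals. The names `completes_trigraph` and `defuse_trigraphs` are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `c` is the third character of one of the nine
// trigraph sequences.
bool completes_trigraph(char c) {
    const std::string thirds = "=(/)'<!>-";
    return thirds.find(c) != std::string::npos;
}

// Hypothetical filter: assumes `literal_body` is the text between the quotes
// of a string literal, where a backslash-escaped question mark is legal.
std::string defuse_trigraphs(const std::string& literal_body) {
    std::string out;
    std::size_t i = 0;
    while (i < literal_body.size()) {
        if (i + 2 < literal_body.size() && literal_body[i] == '?' &&
            literal_body[i + 1] == '?' &&
            completes_trigraph(literal_body[i + 2])) {
            out += "?\\?";  // first mark kept, second mark escaped
            i += 2;         // the third character is copied on the next pass
            continue;
        }
        out += literal_body[i];
        ++i;
    }
    return out;
}
```

Applied to the literal body from the question, this yields `(Strange?\?)`, which compiles to the intended text with or without trigraph support. Text outside string and character literals would need different handling, since `\?` is not valid there.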
If yacc really is outputting incorrect C ("incorrect" because it fails to parse the grammar specified if that grammar contains consecutive question-mark characters), that's pretty poor. OK, so it's only incorrect because of a misfeature of C, but if you're going to write code generation tools I think you take it on yourself to deal with both features and misfeatures of your target language. But if it's only gone wrong because a trigraph appears in a yacc action, that's the user's mistake for putting it there.
Steve Jessop
+1  A: 

While trying to cross-compile on GCC it picked my sequence up as a trigraph:

So all I need to do now is figure out how to disable this in projects by default since I can only see it creating problems for me. (I'm using a US keyboard layout anyway)

The default behavior on GCC is to ignore but give a warning, which is much more sane and is indeed what Visual Studio 2010 will adopt as the standard as far as I know.

Marius