Is there any particular reason that multi-line string literals such as the following are not permitted in C++?

string script =
"
      Some
   Formatted
 String Literal
";

I know that multi-line string literals may be created by putting a backslash before each newline. I am writing a programming language (similar to C) and would like to allow the easy creation of multi-line strings (as in the above example).

Is there any technical reason for avoiding this kind of string literal? Otherwise I would have to use a Python-like string literal with a triple quote (which I don't want to do):

string script =
"""
      Some
   Formatted
 String Literal
""";

Why must C/C++ string literal declarations be single-line?

+25  A: 

The terse answer is "because the grammar prohibits multiline string literals." I don't know whether there is a good reason for this other than historical reasons.

There are, of course, ways around this. You can use line splicing:

const char* script = "\
      Some\n\
   Formatted\n\
 String Literal\n\
";

If the \ appears as the last character on the line, the newline will be removed during preprocessing.

Or, you can use string literal concatenation:

const char* script = 
"      Some\n"
"   Formatted\n"
" String Literal\n";

Adjacent string literals are concatenated by the compiler right after preprocessing (translation phase 6), so these will end up as a single string literal at compile time.

Using either technique, the string literal ends up as if it were written:

const char* script = "      Some\n   Formatted\n String Literal\n";
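
As a quick check of that claim, here is a minimal sketch that spells the string all three ways and compares them. The names (`spliced`, `concatenated`, `single_line`, `check_equivalence`) are made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Line splicing: a backslash immediately before the newline joins the lines,
// so only the explicit \n escapes put newlines into the string.
const char* const spliced = "\
      Some\n\
   Formatted\n\
 String Literal\n\
";

// Adjacent string literals are concatenated into a single literal.
const char* const concatenated =
"      Some\n"
"   Formatted\n"
" String Literal\n";

// The same contents written on one line.
const char* const single_line = "      Some\n   Formatted\n String Literal\n";

// Call this from main(); it aborts if the spellings differ.
inline void check_equivalence() {
    assert(std::strcmp(spliced, single_line) == 0);
    assert(std::strcmp(concatenated, single_line) == 0);
}
```

Note that with line splicing the leading whitespace of each continued line becomes part of the string, so the continuation lines cannot be indented freely.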
James McNellis
Always use the second form; it works around bugs in long literal handling in some versions of the Microsoft compiler.
Donal Fellows
I think the technical motivation for WHY is more what he wants to know. He doesn't give a rat's butt about the workarounds; he's writing his own language. How you work around it in C doesn't really matter.
NoMoreZealots
@NoMoreZealots: Well, the OP has made that clear now. A lot of people ask questions and are unaware that there are workarounds, so suggesting workarounds is often helpful. That said, it wouldn't be a particularly huge change to permit string literals to span multiple lines in C; off the top of my head, I can't imagine such a change breaking any existing, standards-conforming code.
James McNellis
@Donal Fellows: Or just don't use MS compilers ;).
chpwn
@chpwn: Not everyone has the option, and at least historically the MS compilers were generating better (i.e., faster) code than gcc, which is rather attractive for production builds. Since the two ways of writing long literals are about as readable as each other, there's really no reason to go with the version that causes problems.
Donal Fellows
@Donal Fellows: I'm kidding.
chpwn
+1  A: 

Actually, you can break it up thus:

string script =
"\n"
"      Some\n"
"   Formatted\n"
" String Literal\n";

Adjacent string literals are concatenated by the compiler.

Fred Larson
That has nothing to do with the question.
Rizo
@Rizo: Actually, it does. You're asking why C doesn't have a certain feature. Explaining that C does things a slightly different way is relevant. After all, if there were no way to do multiline strings, the answer would be much different.
David Thornley
@David Thornley: However, that is not what I was asking for. I want to know if there are any grounds not to use multi-line strings.
Rizo
@Rizo: If a language allows things "this way", why should it allow "that way"? The Perl philosophy "there's more than one way to do it" isn't universally accepted, and many language designers try to restrict the number of features and capabilities.
David Thornley
That is not the point. My language has to support multi-line strings because its design is oriented toward text formatting. It is a necessary feature, not syntactic sugar. This is why any C-like alternatives are irrelevant to the question. I need a clear way to define text blocks -- using triple-quote strings (as Python does) or standard single-quote strings. I just wanted to know if there are any limitations associated with the second one. By the way, the only one I have found is that a backslash has to be used inside the string to write a quote character; this does not happen with the triple-quote style.
Rizo
+1  A: 

Strings can span multiple lines, but each line has to be quoted individually:

string script =
    "                \n"
    "       Some     \n"
    "    Formatted   \n"
    " String Literal ";
philippe
+1  A: 

You can also do:

string useMultiple =  "this "
                      "is "
                      "a string in C.";

Place one literal after another without any special chars.

JonH
Note that this doesn't have the newlines the OP is looking for.
Billy ONeal
+5  A: 

Others have mentioned some excellent workarounds; I just wanted to address the reason.

The reason is simply that C was created at a time when processing power was at a premium and compilers had to be as simple and fast as possible. These days, if C were to be updated (I'm looking at you, C1X), it's quite possible to do exactly what you want. It's unlikely, however, mostly for historical reasons: such a change could require extensive rewrites of compilers, and so will likely be rejected.
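
For comparison, C++ did eventually go this way: C++11 added raw string literals, which may span lines. A minimal sketch, assuming a C++11 compiler; the names `raw_script`, `escaped_script` and `check_raw_matches_escaped` are made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Raw string literal: everything between R"( and )" is taken verbatim,
// including the newlines and the leading spaces on each line.
const char* const raw_script = R"(
      Some
   Formatted
 String Literal
)";

// The same contents spelled with escapes and literal concatenation.
const char* const escaped_script =
"\n"
"      Some\n"
"   Formatted\n"
" String Literal\n";

// Call this from main(); it aborts if the two spellings differ.
inline void check_raw_matches_escaped() {
    assert(std::strcmp(raw_script, escaped_script) == 0);
}
```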

Randolpho
Note that C++0x has already done this (see: Verbatim Literals).
Billy ONeal
@Billy ONeal: true, but keep in mind that C++1x (time to increment that 0) is still not formally *finished*. Also, the original question was about C.
Randolpho
I was asking only for a reason, not for "excellent workarounds". Thank you! P.S. I'll use multi-line strings.
Rizo
@Randolpho: I was not saying your answer was incorrect or incomplete -- it is a good answer. I was just throwing in a bit of trivia :)
Billy ONeal
@Billy ONeal: no problems.
Randolpho
It's historical, but more a question of input devices than speed. How do you do the above when you are on a teletype?
Martin Beckett
@Randolpho The canonical answer to "why do we still call it C++0x if it's already 2010 and not finalized yet?" is "suppose the `x` is in hexadecimal..."
Tyler McHenry
@Martin Beckett: the same way you do it if you're editing a normal file, only you can't move your cursor/caret around very well. I honestly don't understand the question. A newline is a newline.
Randolpho
@Tyler McHenry: Heh... pay no attention to the fact that we're dragging our asses over here... it's in *hexadecimal*!
Randolpho
If you have line-oriented editing (i.e. 'ed' or a teletype), it's much easier if each line of input is a single syntactic element for the parser.
Martin Beckett
@Martin Beckett: First, C has *never* had single-line statements; statements have always been ended by a semicolon. Second, even if C wanted to cater to the line-oriented editing style, that doesn't explain why they didn't *also* cater to the multi-line editing style.
Randolpho
@Tyler McHenry: But the second-last digit of the current year in hexadecimal isn't 0, it's D...
caf
+9  A: 

One has to consider that C was not written to be an "applications" programming language but a systems programming language. It would not be inaccurate to say it was designed expressly to rewrite Unix. With that in mind, there was no EMACS or VIM, and your user interfaces were serial terminals. Multiline string declarations would seem a bit pointless on a system that did not have a multiline text editor. Furthermore, string manipulation would not be a primary concern for someone looking to write an OS at that particular point in time. The traditional set of UNIX scripting tools such as AWK and SED (amongst MANY others) are a testament to the fact that they weren't using C to do significant string manipulation.

As an additional consideration, it was not uncommon in the early 70s (when C was written) to submit your programs on PUNCH CARDS and come back the next day to get them. Would it have eaten up extra processing time to compile a program with multiline string literals? Not really; it can actually be less work for the compiler. But you were going to come back for it the next day anyhow in most cases. And nobody who was filling out a punch card was going to put large amounts of text that wasn't needed in their programs.

In a modern environment, there is probably no reason not to include multiline string literals other than the designer's preference. Grammatically speaking, it's probably simpler, because you don't have to take linefeeds into consideration when parsing the string literal.
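
To make the parsing point concrete, here is a minimal sketch of a hand-written string-literal scanner for a hypothetical C-like language. Everything in it (`ScanResult`, `scan_string`, the `allow_newlines` flag, the tiny escape set) is an assumption for illustration, not how any real C compiler is written. The single-line rule shows up as exactly one extra check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of scanning one string literal (hypothetical type for this sketch).
struct ScanResult {
    bool ok;            // false if the literal was malformed
    std::string value;  // decoded contents; meaningful only when ok is true
    std::size_t next;   // index just past the closing quote when ok is true
};

// Scans a double-quoted literal that starts at src[pos].
// With allow_newlines == false, a newline inside the literal is an error
// (the C rule); with true, the newline is kept as part of the string.
ScanResult scan_string(const std::string& src, std::size_t pos, bool allow_newlines) {
    ScanResult r{false, "", pos};
    if (pos >= src.size() || src[pos] != '"') return r;
    std::size_t i = pos + 1;
    while (i < src.size()) {
        char c = src[i];
        if (c == '"') {  // closing quote: done
            r.ok = true;
            r.next = i + 1;
            return r;
        }
        if (c == '\n' && !allow_newlines) return r;  // the only extra rule
        if (c == '\\') {  // minimal escapes for the sketch: \n, \" and \\ only
            if (i + 1 >= src.size()) return r;
            char e = src[i + 1];
            if (e == 'n') r.value += '\n';
            else if (e == '"' || e == '\\') r.value += e;
            else return r;
            i += 2;
            continue;
        }
        r.value += c;
        ++i;
    }
    return r;  // ran off the end: unterminated literal
}
```

One trade-off to weigh when allowing newlines: a forgotten closing quote then swallows source text up to the next quote character, so the error tends to be reported far from its cause.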

NoMoreZealots
And it was the BSD OS AFAIK
mathk
+1  A: 

I am writing a programming language (similar to C) and would like to allow multi-line strings to be written easily (as in the above example).

There is no reason why you couldn't create a programming language that allows multi-line strings. Vedit Macro Language (a C-like scripting language for the VEDIT text editor) allows them, for example:

Reg_Set(1,"
      Some
   Formatted
 String Literal
")

It is up to you how you define your language syntax.

PauliL
+4  A: 

The C preprocessor works on a line-by-line basis, but with lexical tokens. That means that the preprocessor understands that "foo" is a token. If C were to allow multi-line literals, however, the preprocessor would be in trouble. Consider:

"foo
#ifdef BAR
bar
#endif
baz"

The preprocessor isn't able to mess with the inside of a token - but it's operating line-by-line. So how is it supposed to handle this case? The easy solution is to simply forbid multiline strings entirely.

bdonlan
I really don't think this would be a problem. The source must be tokenized before preprocessing directives can be evaluated. Were string literals permitted to contain newline characters, this would simply be a single string literal token. Newlines are significant during preprocessing, but only in certain contexts.
James McNellis
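
One data point in support of that comment: C++11 raw string literals may contain newlines, and lines inside them that look like directives are left alone, because they are part of a single string token. A minimal sketch, assuming a C++11 compiler; the names `text` and `check_directives_are_plain_text` are made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// BAR is deliberately not defined. If the preprocessor treated the lines
// below as directives, "bar" would be dropped from the string.
const char* const text = R"(foo
#ifdef BAR
bar
#endif
baz)";

// Call this from main(); it aborts if the directive-like lines were not
// kept verbatim as part of the string.
inline void check_directives_are_plain_text() {
    assert(std::strcmp(text, "foo\n#ifdef BAR\nbar\n#endif\nbaz") == 0);
}
```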