Every programming language I know (Perl, Javascript, PHP, Python, ASP, ActionScript, Commodore Basic) uses single and double quotes to delimit strings.

This creates the ongoing situation of having to go to great lengths to treat quotes correctly, since the quote is extremely common in the contents of strings.

Why do programming languages not use some other character to delimit strings, one that is not used in normal conversation (\, | or { } for example), so we can just get on with our lives?

Is this true, or am I overlooking something? Is there an easy way to stop using quotes for strings in a modern programming language?

print <<<END
I know about here document syntax, but for minor string manipulation it's overly complicated and it complicates formatting.
END;

[UPDATE] Many of you made a good point about the importance of using only ASCII characters. I have updated the examples to reflect that (the backslash, the pipe and braces).

+4  A: 

Languages (should) try to be as simple to understand as possible, and using something different from quotes to deal with strings introduces unnecessary complexity.

Artur Soler
I understand that programming languages try to read like written languages, but it's difficult to argue that using quotes to delimit strings reduces complexity -- we constantly have to work around them, and they introduce enormous security concerns.
Andrew Swift
Security concerns remain no matter what escape character you choose, because they go to *deliberate* malice...
dmckee
The question is whether it introduces unnecessary complexity. If there were real advantages to another quotation notation, we could learn and use it, just like we've learned all the other weird punctuation we use.
David Thornley
Using a non-quote for a quotation creates unnecessary complexity. One could argue that even using the single quote is unnecessary: quotes in English (I don't dare to describe other languages that I don't know of) are single for "apostrophes" and double for regular quotations. The only problem is putting a quote in a quoted string - and the overloading of the meaning of quotes to include other things.
David
Using non-quotes creates complexity, but if it solved problems it might be worth it. BTW, you've got a limited idea of what English is. A lot of books (primarily from the English side of the pond) use single quotes as standard quotation marks.
David Thornley
I disagree that ALL security concerns remain no matter which character you choose. If the pipe | was used, 99.9% of the time you could simply strip all pipes from the string and that would remove all danger of an unescaped quote destroying your string. But you can't do this with quotes, because they're very often part of the content of the string.
Andrew Swift
Re: security. Making the sanitizer easier to write is a win of some sort, but you still need it. Worse, the gain only lasts while you use ASlang to compile programs that parse input languages that *still* use '"' or "'" as the string delimiter. If the new character catches on, you're screwed all over again. To whatever extent the current situation is uniform, everyone knows which character might pose a danger.
dmckee
+2  A: 

Because no one has created a language using some other character that has gotten popular.

I think that is largely because the demand for changing the character is just not there. Most programmers are used to the standard quote and see no compelling reason to change the status quo.

Compare the following.

print "This is a simple string."
print "This \"is not\" a simple string."

print ¤This is a simple string.¤
print ¤This "is not" a simple string.¤

I for one don't really feel like the second is any easier or more readable.

chills42
The question is not so much about readability, but about the hassle of constantly dealing with having to escape quotes, to remember if they're escaped, to deal with possible attacks, verifying user input, etc. If you're going to use a character to contain strings, isn't it obvious that it should be a character that's never IN the string? In other comments, several people have said that they appreciate the capability of qq in Perl and """ in Python for these reasons.
Andrew Swift
+5  A: 

Python does have an alternative string delimiter with the triple-double quote """Some String""".

Single and double quotes are used in the majority of languages because they are the standard delimiters in most written languages.
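
A minimal sketch of the triple-quote form, assuming Python 3. The product name is only an illustrative value. It shows that both quote characters can sit inside the string without backslashes:

```python
# Triple-quoted strings may contain both quote characters unescaped.
product = """Converse "Chuck Taylor" shoes, size 9, owner's pair"""

# The same text written with ordinary double quotes needs backslashes.
escaped = "Converse \"Chuck Taylor\" shoes, size 9, owner's pair"

print(product == escaped)  # True
```

One caveat: a string that ends in a double quote still needs an escape (or the ''' form), because the closing delimiter would otherwise be ambiguous.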

Scott Bevington
Thanks, this is a really useful technique that I had forgotten.
Andrew Swift
I might beg to differ with "standard delimiter in most written languages." The French, for example, use « and » - what do the Chinese use? And so on and so forth. Better to say that it is "the standard delimiter used in English."
David
Yes, this is one of the reasons I love Python. I think I use """ for any string longer than two characters.
ilya n.
+3  A: 

Using quotation marks to define a set of characters as separate from the enclosing text is more natural to us, and thus easier to read. Also, " and ' are on the keyboard, while those other characters you mentioned are not, so it's easier to type. It may be possible to use a character that is widely available on keyboards, but I can't think of one that won't have the same kind of problem.

E: I missed the pipe character, which may actually be a viable alternative. Except that it's currently widely used as the OR operator, and the readability issue still stands.

Sean Nyman
The pipe character could be good. It often looks like an l or an I, in sans-serif fonts, but that's not the end of the world. I would prefer something like < and > though - but again, these are also used for less than and greater than.
thomasrutter
I don't think that readability is a valid argument. Any time you read a programming language that you don't know, it starts out difficult to read. Learning a language is partly about learning its syntax, and after a day of seeing $x = \hello\; it would seem completely normal.
Andrew Swift
If the characters being on the keyboard weren't an issue, I would probably go with the quote marks that are used in French (« and »). (And, of course, for French people, not having those characters on the keyboard is usually not an issue. :-) ) But, alas, programmers are lazy and don't want to have to bang out a 4-stroke escape sequence on the numeric keypad (5 if you count holding down the Alt key) (or whatever your OS makes you do to enter alternative characters) every time they want to enter a string delimiter, so we're pretty much stuck with what we've got for the time being.
RobH
+12  A: 

Perl lets you use whatever characters you like

 "foo $bar" eq
 qq(foo $bar) eq
 qq[foo $bar] eq
 qq!foo $bar! eq
 qq#foo $bar# etc

Meanwhile
 'foo $bar' eq
 q(foo $bar) eq
 q[foo $bar] eq
 q!foo $bar! eq
 q#foo $bar# etc

The syntax extends to other features, including regular expressions, which is handy if you are dealing with URIs.

 "http://www.example.com/foo/bar/baz/" =~ /\/foo\/[^\/]+\/baz\//;
 "http://www.example.com/foo/bar/baz/" =~ m!/foo/[^/]+/baz/!;
David Dorward
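For comparison, here is a rough Python 3 equivalent of the URI match above. It is a sketch, not part of the Perl answer: Python patterns are ordinary strings with no regex delimiter, so the slashes need no escaping at all.

```python
import re

url = "http://www.example.com/foo/bar/baz/"

# Python patterns are plain strings, so "/" is not special and needs no backslash.
pattern = re.compile(r"/foo/[^/]+/baz/")

match = pattern.search(url)
print(match.group(0))  # /foo/bar/baz/
```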
Thanks, I wasn't aware of this notation.
Andrew Swift
Pretty cool
thomasrutter
I often use # for regular expressions because I often want to have slashes in them, and I don't want them ending up like the first line in your second example.
thomasrutter
+1  A: 

You would probably be best off picking a delimiter that exists on all common keyboards and terminal representation sets, so most of the ones you suggest are right out...

And in any case, a quoting mechanism will still be necessary... you gain a reduction in the number of times you use quoting at the cost of making the language harder for non-specialists to read.

So it is not entirely clear that this is a win, and then there is force of habit.

dmckee
+3  A: 

Because those other characters you listed aren't ASCII. I'm not sure that we are ready for, or need, a programming language in Unicode...

EDIT: As to why not use {}, | or \, well those symbols all already have meanings in most languages. Imagine C or Perl with two different meanings for '{' and '}'!

| means or, and in some languages already concatenates strings. And how would you get \n if \ was the delimiter?

Fundamentally, I really don't see why this is a problem. Is \" really THAT hard? I mean, in C, you often have to use %%, \\ and several other two-character sequences, so... Meh.

Brian Postow
Very good point. However, as the web develops, almost all major projects are eventually ported to other languages. To support almost any major language outside of English requires UTF-8, or at least more than ASCII. So, most professional text editors now support UTF-8 and other charsets, and I expect this trend will continue.
Andrew Swift
Sure the editor can handle it. But can the compiler? Do we want to bother?
Brian Postow
It is not necessary to use UTF-8 encoding in your source code in order to build an application that supports Unicode, though, and I always avoid writing actual non-ascii characters in my code, preferring escaped forms eg "\xC2\x88" or whatever is appropriate given the language. I think there is still a good argument against requiring non-ascii characters in source, for interoperability reasons, even for software that supports Unicode.
thomasrutter
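A small sketch of the escaped-form approach, assuming Python 3. The guillemets are just an example character pair. The source stays pure ASCII while the string itself holds non-ASCII characters:

```python
# ASCII-only source: the guillemets are written as Unicode escapes.
greeting = "\u00abBonjour\u00bb"

# Encoding to UTF-8 yields a two-byte sequence for each guillemet.
encoded = greeting.encode("utf-8")

print(encoded)  # b'\xc2\xabBonjour\xc2\xbb'
```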
An internationalised application, ie one which supports multiple languages, should have all its language strings separate to the source code anyway, so they can be in a format like XML for which the method of identifying the correct character encoding is well defined.
thomasrutter
+3  A: 

Python has an additional string type, using triple double-quotes,

"""like this"""

In addition to this, Perl allows you to use any delimiter you want,

q^ like this ^

I think for the most part, the regular string delimiters are used because they make sense: a string is wrapped in quotes. In addition, most developers are so used to the familiar delimiters that drastically altering the way strings are presented could mean a steep learning curve.

Mike Trpcic
+2  A: 

Ah, so you want old-fashioned FORTRAN, where you'd quote by counting the number of characters in the string and embedding it in an H format, such as: 13HHello, World!. As somebody who did a few things with FORTRAN back in the days when the language name was all caps, quotation marks and escaping them are a Good Thing. (For example, you aren't totally screwed if you are off by one in your manual character count.)

Seriously, there is no ideal solution. It will always be necessary, at some point, to have a string containing whatever quote character you like. For practical purposes, the quote delimiters need to be on the keyboard and easily accessible, since they're heavily used. Perl's q@...@ syntax will fail if a string contains an example of each possible character. FORTRAN's Hollerith constants are even worse.
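
To make the Hollerith idea concrete, here is a toy sketch in Python 3. The helper names `to_hollerith` and `from_hollerith` are made up for illustration and this is not how any FORTRAN compiler works. It only shows that a length prefix removes the need for escaping, and what an off-by-one count does:

```python
def to_hollerith(text: str) -> str:
    """Encode text as a length-prefixed constant, e.g. 13HHello, World!"""
    return f"{len(text)}H{text}"


def from_hollerith(constant: str) -> str:
    """Decode a length-prefixed constant, trusting the declared length."""
    count, _, rest = constant.partition("H")
    length = int(count)
    if len(rest) < length:
        raise ValueError("declared length exceeds available characters")
    # Only the declared number of characters belongs to the string.
    return rest[:length]


print(to_hollerith("Hello, World!"))       # 13HHello, World!
print(from_hollerith("13HHello, World!"))  # Hello, World!
print(from_hollerith("12HHello, World!"))  # Hello, World (miscount drops the "!")
```

Any character, quotes included, can appear in the payload, but every edit to the text means recounting by hand.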

David Thornley
+1 for Ye olde FORTRANe. No one speaks medieval FORTRAN anymore...
Brian Postow
+2  A: 

You say "having to go to great lengths to treat quotes correctly"; but it's only in the text representation. All modern languages treat strings as binary blocks, so they really don't care about the content. Remember that the text representation is only a simple way for the programmer to tell the system what to do. Once the string is interned, it doesn't have any trouble managing the quotes.
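
A quick illustration of that point, assuming Python 3. The two literals are spelled differently in the source but produce the same string in memory, and the backslashes never become part of the data:

```python
# Two source spellings of the same text.
a = "She said \"hi\""
b = 'She said "hi"'

print(a == b)     # True
print(len(a))     # 13
print("\\" in a)  # False: the backslashes existed only in the source text
```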

Javier
That would be nice, but what about building MySQL queries in PHP, where all you send is a string? Parameterization is possible, but it's not simple and most people never bother.
Andrew Swift
Granted, SQL is broken in this respect; it's not PHP's fault. Just _never_ put variable strings in SQL text. Always use bindings.
Javier
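A minimal sketch of what bindings look like, using Python 3's standard sqlite3 module as a stand-in for the PHP/MySQL setup discussed here. The table and value are invented for illustration. The value travels separately from the SQL text, so its quotes never need escaping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")

# A value containing both kinds of quote, as user input might.
name = 'Converse "Chuck Taylor" kids\' shoes'

# The ? placeholder binds the value; it is never spliced into the SQL text.
conn.execute("INSERT INTO products (name) VALUES (?)", (name,))

row = conn.execute(
    "SELECT name FROM products WHERE name = ?", (name,)
).fetchone()
print(row[0] == name)  # True
```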
+1  A: 

Ada doesn't use single quotes for strings. Those are only for chars, and don't have to be escaped inside strings.

I find it very rare that the double-quote character comes up in a normal text string that I enter into a computer program. When it does, it is almost always because I am passing that string to a command interpreter, and need to embed another string in it.

I would imagine the main reason none of those other characters are used for string delimiters is that they aren't in the original 7-bit ASCII code table. Perhaps that's not a good excuse these days, but in a world where most language designers are afraid to buck the insanely crappy C syntax, you aren't going to get a lot of takers for an unusual string delimiter choice.

T.E.D.
Double quotes frequently occur in product names: Converse "Chuck Taylor" Basketball Shoes, for example.
Andrew Swift
+1  A: 

Python allows you to mix single and double quotes to put quotation marks in strings.

print "Please welcome Mr Jim 'Beaner' Wilson."
>>> Please welcome Mr Jim 'Beaner' Wilson.

print 'Please welcome Mr Jim "Beaner" Wilson.'
>>> Please welcome Mr Jim "Beaner" Wilson.

You can also use the previously mentioned triple quotes. These can extend across multiple lines, which saves you from having to write newline escapes.

print """Please welcome Mr Jim "Beaner" Wilson."""
>>> Please welcome Mr Jim "Beaner" Wilson.

Finally, you can print strings the same way as everyone else.

print "Please welcome Mr Jim \"Beaner\" Wilson."
>>> Please welcome Mr Jim "Beaner" Wilson.
Steven Hepting
Of all of the ways I've come across to handle strings, I like Python's approach the best.
RobH
+8  A: 

Current: "Typewriter" 'quotation' marks

There are many good reasons for using the quotation marks we are currently using:

  • Quotes are easily found on keyboards - so they are easy to type, and they have to be easy, because strings are needed so often.

  • Quotes are in ASCII - most programming tools only handle ASCII well. You can use ASCII in almost any environment imaginable. And that's important when you are fixing your program over a telnet connection on some far-far-away server.

  • Quotes come in many versions - single quotes, double quotes, back quotes. So a language can assign different meanings for differently quoted strings. These different quotes can also solve the 'quotes "inside" quotes' problem.

  • Quotes are natural - English used quotes for marking up text passages long before programming languages followed. In linguistics quotes are used in quite the same way as in programming languages. Quotes are natural the same way + and - are natural for addition and subtraction.

Alternative: “typographically” ‘correct’ quotes

Technically they are superior. One great advantage is that you can easily differentiate between opening and closing quotes. But they are hard to type and they are not in ASCII. (I had to put them into a headline to make them visible in this StackOverflow font at all.)

Hopefully one day, when ASCII is something that only historians care about and keyboards have changed into something totally different (if we are even going to have keyboards at all), there will come a programming language that uses better quotes...

Rene Saarsoo
LaTeX differentiates between open (AKA the backtick) and close (AKA the apostrophe) quotes using ASCII symbols and a standard US keyboard. Of course, that requires the programmer to type two symbols for double quotes and the system to be reasonably smart about how it manages quotes. NB: can't escape a backtick in a comment to display it...
dmckee
+2  A: 

One good reason would probably be that if this is the only thing you want to improve on an existing language, you're not really creating a new language.

And if you're creating a new language, picking the right character for the string quotes is probably way way WAY down on the todo list of things to actually implement.

Lasse V. Karlsen