Technically, any odd number of backslashes, as described in the docs.

>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
  File "<stdin>", line 1
    r'\\\'
         ^
SyntaxError: EOL while scanning string literal

It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious. TIA!

A: 

The reason r'\' is syntactically incorrect is that, even though the string expression is raw, the quote character used to delimit it (single or double) always has to be escaped, since it would otherwise mark the end of the literal. So if you want to express a single quote inside a single-quoted string, there is no other way than using \'. The same applies to double quotes.

But you could use:

'\\'
Gumbo
Doesn't answer 'why' :-)
cdleary
+14  A: 

The reason is explained in the part of that section which I highlighted in bold:

String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

So raw strings are not 100% raw; there is still some rudimentary backslash processing.

oefe
Oh wow... that's weird. Nice catch. Makes sense that r'\'' == "\\'" but it's still strange that the escape character has an effect without disappearing.
cdleary
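For anyone who wants to check the behaviour from the docs quote, here is a small sketch. The helper name `compiles` and the trick of building the source text with `chr(92)` are my own choices for illustration, not something taken from the docs.

```python
# A backslash before the quote keeps the literal open, but it is NOT removed.
s = r"\""
print(len(s))      # the length counts both the backslash and the quote
print(s == '\\"')  # compare against the non-raw spelling of the same value

BS = chr(92)  # a single backslash, built without writing a tricky literal


def compiles(src):
    """Return True if src is accepted by the compiler as an expression."""
    try:
        compile(src, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


odd = "r'" + BS + "'"       # source text: r'\'
even = "r'" + BS * 2 + "'"  # source text: r'\\'
print(compiles(odd), compiles(even))
```

The odd-length case is rejected because the last backslash pairs up with the closing quote, so the literal never terminates.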
+3  A: 

That's the way it is! I see it as one of those small defects in Python!

I don't think there's a good reason for it, but it's definitely not parsing; it's really easy to parse raw strings with \ as a last character.

The catch is, if you allow \ to be the last character in a raw string, then you won't be able to put " inside a raw string. It seems Python went with allowing " instead of allowing \ as the last character.

However, this shouldn't cause any trouble.

If you're worried about not being able to easily write Windows folder paths such as c:\mypath\, worry not: you can represent them as r"C:\mypath". And if you need to append a subdirectory name, don't do it with string concatenation, which is not the right way to do it anyway. Use os.path.join:

>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
hasen j
Good ancillary material. :-) Devil's advocate, though: sometimes you want to differentiate file paths from directory paths by appending the path separator. Nice thing about os.path.join is that it will collapse them: assert os.path.join('/home/cdleary/', 'foo/', 'bar/') == '/home/cdleary/foo/bar/'
cdleary
It doesn't make a (technical) difference though! os.path.isdir will tell you whether a certain path is a directory (folder)
hasen j
Yep, it's just to indicate to someone reading the code whether you expect a path to be a directory or a file.
cdleary
The convention on Windows is that files always have an extension. It's not at all likely (under normal circumstances) to have a text file with a path such as c:\path\data
hasen j
and btw, I did say that I consider this a defect in python! All I'm saying is that, despite my opinion, it practically doesn't really matter
hasen j
..or you can represent them as "c:/mypath" and forget your backslash woes altogether :-)
John Fouhy
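If you really do want the trailing separator discussed in the comments above, here are a few ways to get it without ending a raw string in a backslash. This is just a sketch: it uses `ntpath` (the Windows flavour of `os.path`) so that the result is the same on any platform, and the variable names are made up for the example.

```python
import ntpath  # the Windows flavour of os.path; importable on any platform

base = r"C:\mypath"

# Joining with an empty final component appends a trailing separator.
with_sep = ntpath.join(base, "")

# Adjacent string literals are concatenated at compile time, so a raw
# literal can be followed by an ordinary one holding the last backslash.
also_with_sep = r"C:\mypath" "\\"

# Forward slashes can be normalised afterwards if backslashes are needed.
normalised = ntpath.normpath("c:/mypath/subfolder")

print(with_sep, also_with_sep, normalised)
```

On a real Windows machine `os.path` already is `ntpath`, so the same calls work through `os.path` there.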
+1  A: 

Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).

I thought it was an interesting idea and am including it as community wiki for posterity.

cdleary
parsing is easy either way ..
hasen j
But it might let you avoid having two separate string-literal-parser code paths.
cdleary
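To make the "one set of scanning rules" idea concrete, here is a toy sketch. It is not CPython's actual tokenizer, and the function name `find_literal_end` is hypothetical. The assumption is simply that the scanner skips whatever follows a backslash, whether or not the literal is raw, and only later decides whether to expand escapes.

```python
def find_literal_end(source, start):
    """Return the index of the closing quote of the literal whose opening
    quote is at source[start], or -1 if the literal is unterminated."""
    quote = source[start]
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            # Skip the backslash and the character after it. The same rule
            # is applied to raw and non-raw literals alike.
            i += 2
            continue
        if ch == quote:
            return i
        i += 1
    return -1


BS = chr(92)  # a single backslash
print(find_literal_end("'" + BS + "'", 0))      # body is a lone backslash
print(find_literal_end("'" + BS * 2 + "'", 0))  # body is two backslashes
```

With a scanner like this, a body ending in an odd number of backslashes swallows the closing quote, which matches the SyntaxError in the question.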
A: 

Since \" is allowed inside a raw string, it can't be used to identify the end of the string literal.

Why not stop parsing the string literal when you encounter the first "?

If that was the case, then \" wouldn't be allowed inside the string literal. But it is.

Brian R. Bondy
A: 

Coming from C, it's pretty clear to me that a single \ works as an escape character, allowing you to put special characters such as newlines, tabs and quotes into strings.

That does indeed disallow \ as the last character, since it would escape the " and make the parser choke. But as pointed out earlier, \ is legal.

Yeah -- the heart of the issue was that raw strings treat \ as a literal instead of the start of an escape sequence. The strange thing is that it still has escape properties for quoting, despite being treated as a literal character.
cdleary
A: 

Some tips:

1) If you need to manipulate backslashes in paths, the standard Python module os.path is your friend. For example:

os.path.normpath('c:/folder1/')

2) If you want to build strings containing backslashes BUT without a backslash at the END of the string, a raw string is your friend (put an 'r' prefix before the string literal). For example:

r'\one \two \three'

3) If you need to prefix a string in a variable X with a backslash, you can do this:

X = 'dummy'
bs = r'\ '  # don't forget the space after the backslash or you will get an EOL error
X2 = bs[0] + X  # X2 now contains \dummy

4) If you need to create a string with a backslash at the end, combine tips 2 and 3:

voice_name = 'upper'
lilypond_display = r'\DisplayLilyMusic \ '  # don't forget the space at the end
lilypond_statement = lilypond_display[:-1] + voice_name

Now lilypond_statement contains "\DisplayLilyMusic \upper".

long live python ! :)

n3on

None of these answer the question of "why", but #3 and #4 should not be used. Slicing and adding strings is generally bad practice, and you should prefer r'\dummy' for #3 (which works fine) and ' '.join([r'\DisplayLilyMusic', r'\upper']) to #4.
cdleary
Reason being that strings are immutable and each slice/concatenation creates a new immutable string object that is typically discarded. Better to accumulate them all and join them together in one step with str.join(components)
cdleary
Oh, whoops -- misunderstood what you meant for #3. I think a simple '\\' + X is preferable there to creating a string just to slice it.
cdleary
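For reference, the alternatives suggested in the comments above could look something like this. It is only a sketch; the variable names `prefixed` and `statement` are made up for the example.

```python
X = "dummy"
voice_name = "upper"

# Tip 3 without the slice: an ordinary (non-raw) literal holds one backslash.
prefixed = "\\" + X

# Tip 4 with str.join: collect the pieces, then join them in one step.
statement = " ".join([r"\DisplayLilyMusic", "\\" + voice_name])

print(prefixed)
print(statement)
```

Both give the same strings as the slicing versions in tips 3 and 4, without relying on a throwaway space at the end of a raw literal.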