The regular expression below:
[a-z]+[\\.\\?]
Why is \\ slash used twice instead of once?
The first one escapes the period. The second one escapes the question mark.
Both the . and the ? are being escaped.
However, inside a regular expression character class (within []), that's not needed. This will work the same way:
[a-z]+[.?]
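A quick way to check that equivalence, sketched here in Python (the sample strings and variable names are made up for illustration; other regex engines should behave the same way for these two characters, but that is an assumption to verify per language):

```python
import re

# Same pattern, with and without the backslashes inside the character class.
escaped = re.compile(r'[a-z]+[\.\?]')
plain = re.compile(r'[a-z]+[.?]')

# Hypothetical sample inputs, chosen only for this illustration.
samples = ['hello.', 'why?', 'nodot', 'abc!', 'x.y?']

results_escaped = [bool(escaped.fullmatch(text)) for text in samples]
results_plain = [bool(plain.fullmatch(text)) for text in samples]

# Both patterns accept and reject exactly the same inputs.
print(results_escaped == results_plain)  # True
```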
Edit: with your edit, asking about \\, it depends. Is this regular expression in a string within ""? Depending on the language, sometimes \ has to be escaped an extra time within double quotes. But inside '' it might not be needed. Where are you getting this from?
The regular expression below:
[a-z]+[\\.\\?]
...is not a regular expression but a string (which could be the pattern for a regular expression; you can build an RE from it by passing it to re.compile, for example).
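To make the string-versus-RE distinction concrete, here is a minimal sketch, assuming Python 3.8 or later (where the compiled type is exposed as re.Pattern); the names pattern_text and compiled, and the sample sentence, are illustrative only:

```python
import re

# The literal below is an ordinary str; it only becomes a regular
# expression object once it is handed to re.compile.
pattern_text = '[a-z]+[\\.\\?]'
compiled = re.compile(pattern_text)

print(isinstance(pattern_text, str))     # True
print(isinstance(compiled, re.Pattern))  # True

# The compiled object is what actually does the matching.
match = compiled.search('is this ok?')
print(match.group())  # ok?
```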
Why is \\ slash used twice instead of once?
You may be misunderstanding what's going on:
>>> s = '[a-z]+[\\.\\?]'
>>> s
'[a-z]+[\\.\\?]'
>>> print(s)
[a-z]+[\.\?]
You enter the \ twice in each case so that the first one "escapes" the second one, that is, stops it from forming an "escape sequence" with the character that follows. You see it twice when you look at the string's repr (which is what the interactive Python shell shows you when you just enter, at its prompt, the name the string object is bound to, for example). But you see it only once when you look at the string itself, for example with print: the string has no duplications; you're probably just being confused by the "entering twice" and "displaying twice" (in repr) features.
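One way to convince yourself that the doubling exists only in the literal and in the repr, not in the string itself, is to count characters. A small sketch (the variable names are arbitrary):

```python
s = '[a-z]+[\\.\\?]'

# The string holds 12 characters, only 2 of which are backslashes.
length = len(s)
backslashes_in_string = s.count('\\')
print(length)                 # 12
print(backslashes_in_string)  # 2

# The repr doubles each backslash for display, so it shows 4 of them.
backslashes_in_repr = repr(s).count('\\')
print(backslashes_in_repr)    # 4
```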
Another, handier way to enter exactly the same string value, also as a literal:
>>> z = r'[a-z]+[\.\?]'
>>> z
'[a-z]+[\\.\\?]'
>>> print(z)
[a-z]+[\.\?]
>>> z == s
True
The r prefix (for "raw literal") means that none of the following backslashes is considered part of an escape sequence: each stands for itself, so no doubling up is needed.
Note that z behaves exactly like s and indeed is equal to it: the leading r does not make "strings of a different type", it just offers a handy way to enter strings with lots of backslashes without doubling them up (this is intended to facilitate entering literal strings meant as regular-expression patterns; the r can alternatively be taken as standing for "regular-expression pattern" :-).
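As a small follow-up sketch of that point, the raw and non-raw literals produce the same plain str and can be used interchangeably as patterns (the sample text passed to re.findall is invented for illustration):

```python
import re

s = '[a-z]+[\\.\\?]'   # ordinary literal, backslashes doubled
z = r'[a-z]+[\.\?]'    # raw literal, backslashes written once

# Both literals produce the very same kind of object with the same value.
same_type = type(z) is type(s) is str
print(same_type)  # True
print(z == s)     # True

# Either one can be used as a regular-expression pattern.
found_raw = re.findall(z, 'yes. no? maybe')
found_plain = re.findall(s, 'yes. no? maybe')
print(found_raw)  # ['yes.', 'no?']
```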