tags:

views:

52

answers:

4

I'm using the following regex

^[a-zA-Z0-9\',!;\?\$\^:\\\/`\|~&\" @#%\*\{}\(\)_\+\.\s=-]{1,1000}$

I know it's ugly, but so far it serves its purpose other than the backslash not being allowed as I think it should because it's escaped, I also tried \\ instead of \\\ but same results. Any ideas?

+1  A: 

If you're putting this in a string within a program, you may actually need to use four backslashes (because the string parser will remove two of them when "de-escaping" it for the string, and then the regex needs two for an escaped regex backslash).

For instance:

regex("\\\\")

is interpreted as...

regex("\\" [escaped backslash] followed by "\\" [escaped backslash])

is interpreted as...

regex(\\)

is interpreted as a regex that matches a single backslash.


Depending on the language, you might be able to use a different form of quoting that doesn't parse escape sequences to avoid having to use as many - for instance, in Python:

re.compile(r'\\')

The r in front of the quotes makes it a raw string which doesn't parse backslash escapes.

Amber
Picked this for explanation, thanks!
Eton B.
A: 

If it's not a literal, you have to use \\\\ so that you get \\ which means an escaped backslash.

That's because there are two representations. In the string representation of your regex, you have "\\\\", Which is what gets sent to the parser. The parser will see \\ which it interprets as a valid escaped-backslash (which matches a single backslash).

Vivin Paliath
A: 

The backslash \ is the escape character for regular expressions. Therefore a double backslash would indeed mean a single, literal backslash.

\ (backslash) followed by any of [\^$.|?*+(){} escapes the special character to suppress its special meaning.

ref : http://www.regular-expressions.info/reference.html

Brad
A: 

http://www.regular-expressions.info/charclass.html :

Note that the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.

To include a backslash as a character without any special meaning inside a character class, you have to escape it with another backslash. [\x] matches a backslash or an x. The closing bracket (]), the caret (^) and the hyphen (-) can be included by escaping them with a backslash, or by placing them in a position where they do not take on their special meaning. I recommend the latter method, since it improves readability. To include a caret, place it anywhere except right after the opening bracket. [x^] matches an x or a caret. You can put the closing bracket right after the opening bracket, or the negating caret. []x] matches a closing bracket or an x. [^]x] matches any character that is not a closing bracket or an x. The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen.

What language are you writing the regex in?

Nate