I would like to find a regular expression (regex) that detects invalid escapes in a C double-quoted string (one in which a double quote may appear only when escaped).

I consider \\ \n \r \" valid (the test string is delimited by ").

A partial solution is (?<!\\)\\[^\"\\nr] but it fails to detect bad escapes like \\\.

Here is a test string that I use to test the matching:

...\n...\\b...\"...\\\\...\\\E...\...\\\...\\\\\..."...\E...

The expression should match the last 6 blocks as invalid; the first 4 are valid. The problem is that my current version finds only 2/5 errors.
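To see where the partial pattern falls short, here is a minimal Python sketch that runs it over the test string. Python is an assumption on my part, and the names `TEST` and `PARTIAL` are only illustrative. The lookbehind rejects every backslash that follows another backslash without checking whether the preceding backslashes pair up, so an invalid escape that comes after an even run of backslashes is skipped.

```python
import re

# The test string from the question, as a raw string so backslashes stay literal.
TEST = r'...\n...\\b...\"...\\\\...\\\E...\...\\\...\\\\\..."...\E...'

# The partial pattern from the question.
PARTIAL = re.compile(r'(?<!\\)\\[^\"\\nr]')

# Collect every substring the partial pattern flags as invalid.
partial_hits = PARTIAL.findall(TEST)
print(partial_hits)
```

Only the two escapes that are preceded by a non-backslash character are reported; the ones hidden behind escaped backslashes, and the bare ", are missed.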

A: 

Try this regular expression:

^(?:[^\\]+|\\[\\rn"])*(\\(?:[^\\rn"]|$))

If you have a match, you have an invalid escape sequence.
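A quick way to try this pattern, sketched in Python (an assumption; the helper name `first_bad_escape` is hypothetical). Because the pattern is anchored at the start of the string, it reports at most the first invalid backslash escape, and a bare " is consumed as an ordinary character.

```python
import re

# Pattern from this answer; a single-quoted raw string lets " appear unescaped.
ANCHORED = re.compile(r'^(?:[^\\]+|\\[\\rn"])*(\\(?:[^\\rn"]|$))')

def first_bad_escape(s):
    """Return the first invalid backslash escape in s, or None if there is none."""
    m = ANCHORED.search(s)
    return m.group(1) if m else None

print(first_bad_escape(r'ok\n...\\\E...\.'))  # only the first offender is reported
print(first_bad_escape(r'a\nb\\c'))           # all escapes valid
print(first_bad_escape('a"b'))                # bare quote is not flagged
```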

Gumbo
This doesn't match `"`, which is one of the things Sorin stated is invalid.
me_and
+4  A: 
(?:^|[^\\])(?:\\\\)*((?:\"|\\(?:[^\"\\nr]|$)))

That's the start of the string, or a character that is not a backslash. Then come zero or more properly escaped backslashes (that is, pairs), then either an unescaped " or another backslash. If it is another backslash, it must be followed either by a character that is none of ", \, n, or r, or by the end of the string.

The incorrect escape is captured for you as well.
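As a rough check, the pattern can be run over the test string from the question. This is a Python sketch under my own assumptions: the names `TEST`, `INVALID` and `found` are illustrative, and the position reporting is an addition of mine.

```python
import re

# The test string from the question, as a raw string so backslashes stay literal.
TEST = r'...\n...\\b...\"...\\\\...\\\E...\...\\\...\\\\\..."...\E...'

# Pattern from this answer; group 1 holds the offending escape or bare quote.
INVALID = re.compile(r'(?:^|[^\\])(?:\\\\)*((?:\"|\\(?:[^\"\\nr]|$)))')

# Report each offender together with its index in the string.
found = [(m.start(1), m.group(1)) for m in INVALID.finditer(TEST)]
for pos, bad in found:
    print(pos, bad)
```

One caveat of this sketch: each match also consumes the character before the escape, so with `finditer` two invalid escapes placed back to back (for example `\E\E`) yield only the first. The blocks in the test string are separated by dots, so all six are reported.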

me_and
+1 Tried to fix mine to no avail and deleted it.
Amarghosh
Thanks, the Python string would be `r"(?:^|[^\\])(?:\\\\)*((?:\"|\\(?:[^\"\\nr]|$)))"` (one more \ before the first double quote)
Sorin Sbarnea
My editor doesn't require me to escape double-quotes; obviously I missed that one when I was escaping them all. D'oh. Corrected.
me_and