
I'm trying to write a regular expression that finds C# verbatim (unescaped) string literals, such as

string x = @"hello
world";

The problem I'm having is writing a rule that correctly handles doubled double quotes inside the string, as in this example:

string x = @"before quote ""junk"" after quote";

This should be an easy one, right?

A: 

How 'bout the regex @\"([^\"]|\"\")*\"(?=[^\"])

Because * matches greedily, your regex engine probably doesn't need the final lookahead, although it makes the pattern more specific.

Eamon Nerbonne
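A quick way to check this answer is to run the pattern against the two examples from the question. The sketch below uses Python's re module as a stand-in for the .NET engine (an assumption; the constructs involved, character classes, alternation and lookahead, behave the same in both). The names VERBATIM and samples are made up for illustration.

```python
import re

# Eamon Nerbonne's pattern, copied as posted. The \" escapes are harmless:
# a quote is not special in a regex, so \" simply matches ".
VERBATIM = re.compile(r'@\"([^\"]|\"\")*\"(?=[^\"])')

samples = [
    'string x = @"hello\nworld";',                         # multi-line literal
    'string x = @"before quote ""junk"" after quote";',  # doubled quotes
]

for src in samples:
    m = VERBATIM.search(src)
    print(repr(m.group(0)) if m else None)

# With nothing after the closing quote, the lookahead (?=[^"]) cannot succeed.
print(VERBATIM.search('@"abc"'))  # None
```

Both examples are found. One side effect of the lookahead as written: (?=[^\"]) demands a character after the closing quote, so a literal at the very end of the input is missed; a negative lookahead such as (?!\") does not have that restriction.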
A:

Try this one:

@".*?(""|[^"])"([^"]|$)

The first parentheses mean 'if there is a " before the closing quote, there had better be two of them'; the second parentheses mean 'after the closing quote there should be either a non-quote character or the end of the line'.

Jens
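One thing to watch with this pattern: . does not match a newline by default, so the multi-line example from the question is only found with the single-line option (RegexOptions.Singleline in .NET, re.DOTALL in Python). The sketch below checks this with Python's re module, assumed here as a stand-in for .NET; JENS, multi and quoted are illustrative names.

```python
import re

# Jens's pattern as posted.
JENS = r'@".*?(""|[^"])"([^"]|$)'

multi = 'string x = @"hello\nworld";'
quoted = 'string x = @"before quote ""junk"" after quote";'

# Without DOTALL, ".*?" cannot cross the newline, so nothing is found.
print(re.search(JENS, multi))  # None

# With DOTALL (RegexOptions.Singleline in .NET) the literal is found; the
# final group also consumes the ';' after the closing quote.
print(repr(re.search(JENS, multi, re.DOTALL).group(0)))
print(repr(re.search(JENS, quoted).group(0)))
```

Note that because the last group consumes one character, each match includes whatever follows the closing quote (here the semicolon).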
A: 

If I remember correctly, you have to use \"": the doubled double quote escapes it for C#, and the backslash escapes it for the regex.

pdr
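For what it's worth, a double quote is not a regex metacharacter, so the backslash is accepted but not required: both .NET and Python treat \" in a pattern as a plain quote. A minimal check in Python, with made-up names escaped, plain and text:

```python
import re

# A quote is not a regex metacharacter, so escaping it changes nothing:
# both patterns match a single literal quote character.
escaped = re.compile(r'\"')
plain = re.compile(r'"')

text = 'say ""hi""'
print(escaped.findall(text))  # ['"', '"', '"', '"']
print(escaped.findall(text) == plain.findall(text))  # True
```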
A: 

Try this:

@"[^"]*?(""[^"]*?)*";

It looks for the opening characters @" and the closing characters "; (you can leave the semicolon out if you need to). In between, any character except a quote is allowed, and any quotes must be doubled.

rslite
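The semicolon remark matters in practice: a literal passed as an argument is followed by , or ) rather than ;, so the full pattern skips it. The sketch below uses Python's re as a stand-in for .NET; RSLITE and the semicolon-free variant NO_SEMI are illustrative names, not part of the answer.

```python
import re

# rslite's pattern as posted, requiring the closing "; pair.
RSLITE = re.compile(r'@"[^"]*?(""[^"]*?)*";')
# Hypothetical variant with the semicolon left out.
NO_SEMI = re.compile(r'@"[^"]*?(""[^"]*?)*"')

statement = 'string x = @"before quote ""junk"" after quote";'
argument = 'Foo(@"a""b", 1);'

print(RSLITE.search(statement).group(0))  # the whole literal plus ';'
print(RSLITE.search(argument))            # None: the literal is followed by ','
print(NO_SEMI.search(argument).group(0))  # @"a""b"
```

Even though the inner quantifiers are lazy, the outer * is greedy, so the variant without the semicolon still extends past the doubled quotes instead of stopping at the first ".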
A: 

^@"(""|[^"])*"$ is the regex you want: it looks first for an at-sign and a double quote, then a sequence of characters that are either anything except a double quote or a pair of double quotes, and finally a double quote.

As a string literal in C#, you'd have to write it as either string regex = "^@\"(\"\"|[^\"])*\"$"; or string regex = @"^@""(""""|[^""])*""$";. Choose your poison.

Joren
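Because of the ^ and $ anchors, this pattern checks whether an entire string is one verbatim literal rather than searching for literals inside larger source code. A short Python sketch (re assumed as a stand-in for .NET; JOREN and line are illustrative names) shows the difference:

```python
import re

# Joren's pattern; ^ and $ anchor it to the start and end of the input.
JOREN = re.compile(r'^@"(""|[^"])*"$')

# Validating: the whole string must be a single verbatim literal.
print(bool(JOREN.search('@"before quote ""junk"" after quote"')))  # True
print(bool(JOREN.search('@"hello\nworld"')))                       # True
print(bool(JOREN.search('@"unbalanced " quote"')))                 # False

# Searching: inside a larger line the anchors make it fail, so drop them.
line = 'string x = @"abc";'
print(JOREN.search(line))                          # None
print(re.search(r'@"(""|[^"])*"', line).group(0))  # @"abc"
```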
A: 
@"(?:""|[^"])*"(?!")

is the right regex for this job. It matches the @ and a quote, then repeatedly either two quotes in a row or any non-quote character, up to the next quote that isn't doubled.

Tim Pietzcker
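To see this pattern find several literals in one piece of source, including an empty @"" and a multi-line one, here is a Python sketch (re assumed as a stand-in for .NET). The literal_value helper is hypothetical and not part of the answer: it strips the @" and " delimiters and collapses each "" into ", which is the value a verbatim literal denotes.

```python
import re

# Tim Pietzcker's pattern, unchanged. (?!") is a negative lookahead, so the
# closing quote may also be the last character of the input.
TIM = re.compile(r'@"(?:""|[^"])*"(?!")')

source = '''var a = @"before quote ""junk"" after quote";
var b = @"";
var c = @"hello
world";'''

literals = [m.group(0) for m in TIM.finditer(source)]

# Hypothetical helper: drop the @" prefix and closing ", then turn "" into ".
def literal_value(lit: str) -> str:
    return lit[2:-1].replace('""', '"')

for lit in literals:
    print(repr(literal_value(lit)))
```

Like the other answers, this is purely lexical: a regex alone cannot tell whether @" occurs inside a comment or an ordinary string, so a real tokenizer is needed if that matters.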