In many languages there are several ways to assign a string to a variable:

var = "some 'quoted' string"
var = 'some "quoted" string'
var = `some 'quoted "quoted" string`
var = somestring

Of course, the last variant cannot contain a space, and the end of the string is marked by some special character such as ; or a space, or > in HTML.

My question is whether all four of these cases can be matched with one regex. The hardest case is the quoted strings, where the opening quote character has to be found again at the end of the string, except where it is escaped.

+2  A: 
var = (?:([`'"])(?:(?!\1).)*\1|[^\s;>]*$)

works for your examples. If you also want to handle escaped quotes, then try

var = (?:([`'"])(?:(?!\1)(?:\\.|.))*\1|[^\s;>]*$)

As a verbose regex:

var\s*=\s*
(?:      # match either:...
 ([`'"]) # one of the quote characters, then...
 (?:     # match the following any number of times:
  (?!\1) # first assert that the next character isn't the quote we matched earlier
  (?:    # if so, then match either
   \\.   # an escaped character
   |     # or
   .     # an unescaped character
  )
 )*      # repeat as often as needed
 \1      # then match the opening quote character again
 |       # ...or...
 [^\s;>]* # match any run of characters except whitespace, ; or > up to...
 $       # the end of the line/string
)
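
As a rough illustration, the verbose pattern above can be tried out in Python. This is only a sketch: Python's `re` module with the `re.VERBOSE` flag is an assumption (the question does not name a language), and `ASSIGNMENT` and `match_assignment` are made-up names.

```python
import re

# The verbose regex from the answer, compiled with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
ASSIGNMENT = re.compile(r"""
    var\s*=\s*
    (?:
      ([`'"])        # opening quote character, captured as group 1
      (?:
        (?!\1)       # the next character must not be the opening quote
        (?:\\.|.)    # an escaped character, or any single character
      )*
      \1             # the same quote character closes the string
      |
      [^\s;>]*       # unquoted value: no whitespace, ; or >
      $              # up to the end of the string
    )
""", re.VERBOSE)


def match_assignment(line):
    """Return the matched text, or None if the line does not match."""
    m = ASSIGNMENT.search(line)
    return m.group(0) if m else None


samples = [
    'var = "some \'quoted\' string"',
    "var = 'some \"quoted\" string'",
    'var = `some \'quoted "quoted" string`',
    'var = somestring',
    r'var = "say \"hi\" now"',   # escaped quotes inside the value
]
for s in samples:
    print(repr(s), '->', repr(match_assignment(s)))
```

One thing this makes visible: because the unquoted branch ends in `$`, a line such as `var = somestring;` does not match at all with the pattern as written; dropping the `$` would let the match stop at the sentinel instead.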
Tim Pietzcker
Doesn't allow you to escape the quote sign though.
sepp2k
Now it does....
Tim Pietzcker
There's no need to exclude the quotes in that last part; if you get to that point, you know the value isn't quoted. What you *should* exclude is whitespace. Also, the OP said there might be a sentinel like `>` or `;` marking the end of a non-quoted value, but I think that would be a case-by-case thing: `[^\s>]*`, `[^\s;]*`.
Alan Moore
@Alan Moore: Good point. I have now included both delimiters in my revised answer; of course the OP can change them to whatever he/she needs.
Tim Pietzcker
A: 

The easiest would be to use an alternation and describe each format separately:

var = ("[^"]*"|'[^']*'|`[^`]*`|[^;\s>]*)

If you also want to allow each delimiter to appear inside its own string when escaped, add that case as follows:

var = ("([^\\"]|\\")*"|'([^\\']|\\')*'|`([^\\`]|\\`)*`|[^;\s>]*)

To allow other characters (or any character) to be escaped, replace the corresponding escape sequence with \\[…], a backslash followed by a character class listing the escapable characters, or use \\. to allow any character.
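
A minimal sketch of the alternation approach with the `\\.` variant (any character may be escaped), again assuming Python's `re` module. `VALUE` and `extract_value` are hypothetical names, and the inner groups are written as non-capturing so that group 1 is the whole value.

```python
import re

# One alternative per string format; \\. lets any character be escaped.
VALUE = re.compile(
    r'var = ('
    r'"(?:[^\\"]|\\.)*"'     # double-quoted value
    r"|'(?:[^\\']|\\.)*'"    # single-quoted value
    r'|`(?:[^\\`]|\\.)*`'    # backtick-quoted value
    r'|[^;\s>]*'             # unquoted value, stops at ; whitespace or >
    r')'
)


def extract_value(line):
    """Return the value part (quotes included), or None if nothing matches."""
    m = VALUE.search(line)
    return m.group(1) if m else None


print(extract_value(r'var = "a \"b\" c"'))
print(extract_value("var = 'single'"))
print(extract_value('var = somestring;'))
```

Since this pattern has no `$` anchor, the unquoted branch simply stops at the first `;`, whitespace or `>`, so a trailing sentinel is left out of the captured value.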

Gumbo