Hi, community.
Matching a quoted string that allows escaped characters is not that difficult; see http://ad.hominem.org/log/2005/05/quoted_strings.php. For the sake of simplicity, I chose the approach where the string's content is split into two kinds of "atoms": either a character that is neither a quote nor a backslash, or a backslash followed by any character.
"(([^"\\]|\\.)*)"
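A minimal sketch of this pattern in action. It assumes Python's `re` module behaves like PCRE for these constructs; the helper name `first_dq_string` is hypothetical and not part of any library.

```python
import re

# Content atoms: a char that is neither " nor \, or a backslash plus any char.
DQ_STRING = re.compile(r'"(([^"\\]|\\.)*)"')


def first_dq_string(text):
    """Return the content of the first double-quoted string in text, or None."""
    m = DQ_STRING.search(text)
    return m.group(1) if m else None


print(first_dq_string(r'say "a \"quoted\" word" now'))  # a \"quoted\" word
print(first_dq_string(r'path = "C:\\" rest'))           # C:\\
```

The second call shows why the backslash is excluded from the first atom: `\\` is consumed as one escape pair, so the following `"` closes the string.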
The obvious next step is to allow either kind of quote and use a backreference so that the string must be closed by the same quote that opened it:
(["'])((\\.|[^\1\\])*?)\1
Multiple consecutive backslashes are also handled correctly.
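The backreference version can be tried on a few simple inputs, again assuming Python's `re` behaves like PCRE here; `quoted_contents` is a hypothetical helper. On these inputs it returns the expected content for both quote styles, for escaped quotes and for doubled backslashes.

```python
import re

# The backreference pattern from the post: group 1 captures the opening quote,
# group 2 the content, and the trailing \1 must match that same quote.
ANY_QUOTED = re.compile(r'''(["'])((\\.|[^\1\\])*?)\1''')


def quoted_contents(text):
    """Return the content (group 2) of every quoted string found in text."""
    return [m.group(2) for m in ANY_QUOTED.finditer(text)]


print(quoted_contents('"double" and \'single\''))  # ['double', 'single']
print(quoted_contents('"it\'s"'))                  # ["it's"]
print(quoted_contents(r"'say \'hi\''"))            # ["say \\'hi\\'"]
```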
Now to the part where it gets weird: I have to parse variable assignments like these (note the missing backslash before the inner quote in the first value):
test = 'foo'bar'
var = 'lol'
int = 7
So I wrote a fairly long expression and narrowed the problem down to the following part of it, which does not behave as expected (the only difference from the expression above is the appended "([\r\n]+)"):
(["'])((\\.|[^\1\\])*?)\1([\r\n]+)
Despite the missing backslash, 'foo'bar' is matched as a single string. I tested this with gskinner's online tool RegExr, but PHP (PCRE) behaves the same way.
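A short reproduction of this behaviour, assuming Python's `re` matches PCRE on this point. `INPUT` is the sample from above with a trailing line break added; the names are illustrative only.

```python
import re

# The fragment from the post: the backreference pattern plus a trailing line break.
WITH_NEWLINE = re.compile(r'''(["'])((\\.|[^\1\\])*?)\1([\r\n]+)''')

INPUT = "test = 'foo'bar'\nvar = 'lol'\nint = 7\n"

for m in WITH_NEWLINE.finditer(INPUT):
    print(repr(m.group(2)))
# "foo'bar"
# 'lol'
```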
I can work around this by hardcoding the quote, i.e. replacing the backreferences with '; then it works as expected. Does this mean the backreference does not actually work in this case? And what do the line-break characters have to do with it, given that the expression worked without them?
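A sketch that probes both questions, again using Python's `re` as a stand-in for PCRE (an assumption). Both engines document that a numeric escape inside a character class is a character, not a backreference, so `[^\1\\]` appears to mean "anything except U+0001 and backslash" and does not exclude the quote. Without `([\r\n]+)`, the lazy `*?` stops at the first quote it meets, which hides the problem; with it, the engine keeps extending the match until a quote is followed by a line break, and the class lets it consume the inner quote. `fullmatch` on a single line is a hypothetical stand-in for however the full expression anchors the value, which is not shown. The lookahead variant is a different technique, not something from the post.

```python
import re

# Inside [...], \1 is not a backreference: it is read as the character U+0001.
print(bool(re.fullmatch(r'[\1]', '\x01')))  # True
print(bool(re.fullmatch(r"[^\1\\]", "'")))  # True: the quote is NOT excluded

VALUE = "'foo'bar'\n"  # first sample line, including its line break

# The post's pattern: *? grows until a quote is followed by a line break,
# and [^\1\\] lets it consume the inner quote along the way.
BACKREF = re.compile(r'''(["'])((\\.|[^\1\\])*?)\1([\r\n]+)''')

# The post's workaround: hardcode the single quote.
HARDCODED = re.compile(r"""'((\\.|[^'\\])*?)'([\r\n]+)""")

# Alternative (not from the post): keep the backreference, but reject the
# opening quote with a negative lookahead instead of a character class.
LOOKAHEAD = re.compile(r'''(["'])((?:\\.|(?!\1)[^\\])*?)\1([\r\n]+)''')

print(BACKREF.fullmatch(VALUE).group(2))          # foo'bar
print(HARDCODED.fullmatch(VALUE))                 # None
print(LOOKAHEAD.fullmatch(VALUE))                 # None
print(LOOKAHEAD.fullmatch('"it\'s"\n').group(2))  # it's
```

Unlike the hardcoded version, the lookahead variant still accepts both quote styles while rejecting the unescaped inner quote.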