Instead of:
(\'[^\']*\'|"[^"]*")
Simply write:
\'([^\']*)\'|"([^"]*)"
  \______/    \_____/
     1           2
Now one of the groups will match the quoted content.
In most flavors, when a group that failed to match is referred to in a replacement string, the empty string gets substituted in. So you can simply replace with $1$2: one will be the successful capture (depending on which alternate matched) and the other will substitute in the empty string.
Here's a PHP implementation (as seen on ideone.com):
$text = <<<EOT
"hello", how 'are "you" today'
EOT;
print preg_replace(
'/\'([^\']*)\'|"([^"]*)"/',
'$1$2',
$text
);
# hello, how are "you" today
A closer look
Let's use 1 and 2 for the quotes, and add some whitespace, for clarity.
Your second solution used this pattern:
( 1[^1]*1 | 2[^2]*2 )
\___________________/
 capture whole thing
 content and quotes
As you correctly pointed out, this matches a pair of quotes correctly (assuming that quotes can't be escaped), but it doesn't capture the content part on its own.
This may not be a problem depending on context (e.g. you can simply trim one character from the beginning and end to get the content), but at the same time, it's also not that hard to fix the problem: simply capture the content from the two possibilities separately.
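As a rough sketch of the trimming approach mentioned above, the single-group pattern can be kept and one character dropped from each end of every match. The helper name strip_quotes_by_trimming is hypothetical, and preg_replace_callback is just one way to do it:

```php
<?php
// Hypothetical helper (not from the original answer): keeps the pattern that
// captures quotes AND content, then trims the quotes off each match.
function strip_quotes_by_trimming(string $text): string
{
    return preg_replace_callback(
        '/(\'[^\']*\'|"[^"]*")/',
        function (array $m) {
            // $m[1] is the whole quoted string, delimiters included;
            // drop the first and the last character to keep only the content.
            return substr($m[1], 1, -1);
        },
        $text
    );
}

echo strip_quotes_by_trimming("\"hello\", how 'are \"you\" today'"), "\n";
// hello, how are "you" today
```

This works, but it needs a callback where the two-group pattern gets by with a plain replacement string.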
1([^1]*)1 | 2([^2]*)2
 \_____/     \_____/
 capture contents from
 each alternate separately
Now either group 1 or group 2 will capture the content, depending on which alternate was matched. As a "bonus", you can check which quote was used, i.e. if group 1 succeeded, then 1 was used.
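The "which quote was used" check could look something like the sketch below. It assumes PHP 7.2 or later for the PREG_UNMATCHED_AS_NULL flag, and the helper name quoted_parts is hypothetical:

```php
<?php
// Hypothetical helper (not from the original answer): returns the content of
// each quoted string together with the quote character that delimited it.
function quoted_parts(string $text): array
{
    // PREG_UNMATCHED_AS_NULL (assumed available, PHP 7.2+) reports a group
    // that did not participate as null, so an empty quoted string ('' or "")
    // is not confused with an alternate that did not match.
    preg_match_all(
        '/\'([^\']*)\'|"([^"]*)"/',
        $text,
        $matches,
        PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
    );

    $parts = [];
    foreach ($matches as $m) {
        if (($m[1] ?? null) !== null) {
            // Group 1 succeeded: the single-quote alternate matched.
            $parts[] = ['quote' => "'", 'content' => $m[1]];
        } else {
            // Otherwise the double-quote alternate matched, so group 2 is set.
            $parts[] = ['quote' => '"', 'content' => $m[2]];
        }
    }
    return $parts;
}

foreach (quoted_parts("\"hello\", how 'are \"you\" today'") as $p) {
    echo $p['quote'], ' ', $p['content'], "\n";
}
// " hello
// ' are "you" today
```

Checking the group against null, rather than against the empty string, is what keeps empty quoted strings from being misclassified.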
Appendix
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
(…) is used for grouping. (pattern) is a capturing group and creates a backreference. (?:pattern) is non-capturing.
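A small illustration of these constructs, using preg_match; the sample patterns and variable names are made up for the example:

```php
<?php
// Negated character class: matches the first character that is not a vowel.
preg_match('/[^aeiou]/', 'audio', $negated);
// $negated is ['d']

// Capturing group: the text matched inside (...) is stored as group 1.
preg_match('/(a|b)c/', 'bc', $capturing);
// $capturing is ['bc', 'b']

// Non-capturing group: same grouping for the alternation, but nothing is stored.
preg_match('/(?:a|b)c/', 'bc', $nonCapturing);
// $nonCapturing is ['bc']
```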