This can be done with modern regexes due to the massive number of hacks to regex engines that exist, but let me be the one to post the "Don't Do This With Regular Expressions" answer.
This is not a job for regular expressions. This is a job for a full-blown parser. As an example of something you can't do with (classical) regular expressions, consider this:
()(())(()())
No (classical) regex can determine if those parenthesis are matched properly, but doing so without a regex is trivial:
/* C code */
char string[] = "()(())(()())";
int parens = 0;
for(char *tmp = string; tmp; tmp++)
{
if(*tmp == '(') parens++;
if(*tmp == ')') parens--;
}
if(parens > 0)
{
printf("%s too many open parenthesis.\n", parens);
}
else if(parens < 0)
{
printf("%s too many closing parenthesis.\n", -parens);
}
else
{
printf("Parenthesis match!\n");
}
# Perl code
my $string = "()(())(()())";
my $parens = 0;
for(split(//, $string)) {
$parens++ if $_ eq "(";
$parens-- if $_ eq ")";
}
die "Too many open parenthesis.\n" if $parens > 0;
die "Too many closing parenthesis.\n" if $parens < 0;
print "Parenthesis match!";
See how simple it was to write some non-regex code to do the job for you?
EDIT: Okay, back from seeing Adventureland. :) Try this (written in Perl, commented to help you understand what I'm doing if you don't know Perl):
# split $string into a list, split on the double quote character
my @temp = split(/"/, $string);
# iterate through a list of the number of elements in our list
for(0 .. $#temp) {
# skip odd-numbered elements - only process $list[0], $list[2], etc.
# the reason is that, if we split on "s, every other element is a string
next if $_ & 1;
if($temp[$_] =~ /regex/) {
# do stuff
}
}
Another way to do it:
my $bool = 0;
my $str;
my $match;
# loop through the characters of a string
for(split(//, $string)) {
if($_ eq '"') {
$bool = !$bool;
if($bool) {
# regex time!
$match += $str =~ /regex/;
$str = "";
}
}
if(!$bool) {
# add the current character to our test string
$str .= $_;
}
}
# get trailing string match
$match += $str =~ /regex/;
(I give two because, in another language, one solution may be easier to implement than the other, not just because There's More Than One Way To Do It™.)
Of course, as your problems grow in complexity, there will arise certain benefits of constructing a full-blown parser, but that's a different horse. For now, this will suffice.