if($title =~ s/(\s|^|,|\/|;|\|)$replace(\s|$|,|\/|;|\|)//ig)

$title can be a set of titles ranging from President, MD, COO, CEO,...

$replace can be (shareholder), (Owner) or the like.

I keep getting this error. I have checked for improperly balanced '(', ')', no dice :(

Unmatched ) in regex; marked by <-- HERE in m/(\s|^|,|/|;|\|)Owner) <-- HERE (\s|$|,|/|;|\|)/

If you could tell me what the regex does, that would be awesome. Does it strip those symbols? Thanks guys!

+3  A: 

It appears that your variable $replace contains the string Owner), not (Owner).


$title = "Foo Owner Bar";
$replace = "Owner)";
if($title =~ s/(\s|^|,|\/|;|\|)$replace(\s|$|,|\/|;|\|)//ig) {
    print $title;
}

Output:

Unmatched ) in regex; marked by <-- HERE in m/(\s|^|,|/|;|\|)Owner) <-- HERE (\s|$|,|/|;|\|)/ at test.pl line 3.

$title = "Foo Owner Bar";
$replace = "(Owner)";
if($title =~ s/(\s|^|,|\/|;|\|)$replace(\s|$|,|\/|;|\|)//ig) {
    print $title;
}

Output:

FooBar
Mark Byers
+7  A: 

If the variable $replace can contain regex metacharacters, you should wrap it in \Q...\E:

\Q$replace\E

To quote Jeffrey Friedl's Mastering Regular Expressions

Literal Text Span The sequence \Q "Quotes" regex metacharacters (i.e., puts a backslash in front of them) until the end of the string, or until a \E sequence.
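A minimal sketch of the effect, assuming $replace holds the literal text (Owner) and using a made-up title string (neither value comes from the question). The built-in quotemeta function is the function form of \Q...\E:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $title   = "CEO, (Owner), President";
my $replace = "(Owner)";

# quotemeta() backslashes every character that is not a word character.
my $quoted = quotemeta($replace);
print "$quoted\n";      # \(Owner\)

# With \Q...\E the parentheses in $replace match literally instead of
# being treated as grouping metacharacters.
(my $stripped = $title) =~ s/\Q$replace\E//i;
print "$stripped\n";    # CEO, , President
```

The same wrapping should also make a value such as Owner) safe to interpolate, since the stray parenthesis is escaped before the pattern is compiled.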

Paul Creasey
Short and sweet! It did the job. Could you please elaborate on \Q and \E?
ThinkCode
added a short explanation.
Paul Creasey
+5  A: 

As mentioned, the substitution strips the contents of $replace together with one delimiter on each side (whitespace, a comma, a slash, a semicolon, a pipe, or the start or end of the string), and it's failing because $replace itself contains an unmatched parenthesis.
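To make that concrete, here is a small sketch that runs the question's substitution over a few made-up titles, with $replace assumed to be the plain word Owner (no metacharacters, so nothing needs escaping):

```perl
use strict;
use warnings;

my $replace = "Owner";   # assumed value with no regex metacharacters
my %result;              # original title => title after the substitution

for my $title ("CEO Owner President", "CEO,Owner;MD", "Owner", "Homeowner") {
    # Same pattern as in the question: one delimiter (or the start/end of
    # the string), then $replace, then one more delimiter.
    (my $copy = $title) =~ s/(\s|^|,|\/|;|\|)$replace(\s|$|,|\/|;|\|)//ig;
    $result{$title} = $copy;
    print "'$title' -> '$copy'\n";
}
```

Because the delimiters on both sides are removed along with the word, "CEO Owner President" becomes "CEOPresident" and "CEO,Owner;MD" becomes "CEOMD". "Homeowner" is left alone, since nothing in the delimiter list precedes "owner" there.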

However, a few other general regex points. First, instead of ORing everything together, I'd keep the alternatives in a character class (this is just to simplify the logic and the typing). Matching [\s^,\/;\|] is potentially less error-prone and more finger-friendly.

Second, don't use a capturing set of parentheses () unless you really mean it. Capturing places the matched string in a capture buffer and incurs overhead in the regex engine. Per perldoc perlre:

WARNING: Once Perl sees that you need one of $& , $` , or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses.

You can easily get around this by adding ?: right after the opening parenthesis, which makes the group non-capturing:

(?:[\s^,\/;\|])

Edit: not that you need non-capturing grouping in that instance, but it's already in the original regex.
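A small sketch of the difference between the two kinds of group, using a hypothetical input string: the capturing form fills $1, while the (?:...) form only groups.

```perl
use strict;
use warnings;

my $text = "CEO;Owner";   # hypothetical input

# Capturing group: the matched delimiter is saved and returned.
my ($captured) = $text =~ /([\s,\/;|])Owner/;
print "captured: '$captured'\n";    # captured: ';'

# Non-capturing group: the same text matches, but $1 is not set.
my $saved_something = 1;
if ($text =~ /(?:[\s,\/;|])Owner/) {
    $saved_something = defined $1 ? 1 : 0;
}
print "saved something: $saved_something\n";    # saved something: 0
```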

Marc Bollinger
+1, but I think the `^` and `$` were meant as anchors, so they can't go inside the character classes. That means the groups *are* necessary: `(?:^|[\s,;|\/])`, `(?:[\s,;|\/]|$)`
Alan Moore
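A quick sketch of that point, with a hypothetical title and $replace value: inside [...] a ^ that is not the first character is a literal caret, so the character-class version cannot match at the very start of the string, while the grouped version can.

```perl
use strict;
use warnings;

my $replace = "Owner";        # hypothetical value
my $title   = "Owner, CEO";   # $replace sits at the start of the string

# ^ inside the class is a literal caret, not an anchor, so a real
# character is required before $replace and nothing matches here.
my $class_hit = $title =~ /[\s^,\/;|]\Q$replace\E/i ? 1 : 0;

# ^ and $ as alternatives inside non-capturing groups still anchor.
my $group_hit = $title =~ /(?:^|[\s,;|\/])\Q$replace\E(?:[\s,;|\/]|$)/i ? 1 : 0;

print "class: $class_hit, group: $group_hit\n";   # class: 0, group: 1
```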
Learnt something new. \Q$replace\E fixed the issue but thanks a lot Marc.
ThinkCode