The problem with this is that languages containing nested parentheses (or indeed anything nested, in other words anything that requires recursion) are not regular; they are at least context-free. This means that they cannot be described by a regular grammar. Regular expressions are a compact notation for regular grammars. Ergo, nested parentheses cannot be described by regular expressions.
However, we aren't talking about regular expressions here, we are talking about Regexps. While their syntax and semantics are (very) loosely based on regular expressions, they are quite different and, in particular, much more powerful. Depending on the particular flavor of Regexp you use, it may or may not be able to express recursion and thus parse nested parentheses. Perl Regexes, for example, can parse nested parentheses. I'm not sure whether Ruby's Regexps can, but I really don't care, because the way Regexps are made more powerful than regular expressions is generally by bolting more and more syntax onto them.
This turns regular expressions, which are designed to be simple, into incomprehensible monsters. (If you can tell at a glance what the Perl Regex posted by @Anon does, then go for it. But I can't, and thus I prefer not to use it.)
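To give a flavor of what that bolted-on syntax looks like: as far as I know, the Regexp engine in Ruby 1.9 and later supports recursion through subexpression calls (`\g<name>`). The sketch below is only an illustration of that style, not something I recommend; the constant name `NESTED_PARENS` and the group name `paren` are made up for this example.

```ruby
# A recursive Regexp, assuming Ruby 1.9+ subexpression calls.
# The named group `paren` matches '(' ... ')' and calls itself
# (via \g<paren>) for every nested group it finds inside.
NESTED_PARENS = /(?<paren>\((?:[^()]|\g<paren>)*\))/

text = 'This is (was?) some ((heavily) nested (text)) here'

# Each top-level group, including everything nested in it, is one match.
stripped = text.gsub(NESTED_PARENS, '')
puts stripped.squeeze(' ')
```

It works on balanced input, but you have to read the pattern quite carefully to see why.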
I prefer using a more powerful parser rather than a more complex Regexp.
In this case, you have a context-free language, therefore you can use a very simple recursive descent parser. You can further simplify your recursive descent parser by handling those sub-parts which are regular with a regular expression. Finally, if you replace the recursion in the recursive descent parser with iteration + mutation and make clever use of Ruby's boolean semantics, the entire parser gets basically condensed down to this single line:
while str.gsub!(/\([^()]*?\)/, ''); end
Which I don't think is too bad.
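In case the loop looks too magical: `gsub!` returns `nil` when it made no substitution, so the `while` stops as soon as no innermost pair of parentheses is left, and every pass peels off one level of nesting. Here is a small sketch that records the string after each pass; the helper name `strip_parens_with_trace` is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: runs the same loop as the one-liner on a copy
# of the input and returns the intermediate string after every pass.
def strip_parens_with_trace(input)
  str = input.dup
  passes = []
  # gsub! returns nil (falsy) once nothing matches, which ends the loop.
  passes << str.dup while str.gsub!(/\([^()]*\)/, '')
  passes
end

trace = strip_parens_with_trace('a (b (c) d) e')
trace.each_with_index { |s, i| puts "pass #{i + 1}: #{s.inspect}" }
```

The first pass can only remove `(c)`, because the pattern refuses to match across another parenthesis; the second pass then removes what has become an innermost group.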
Here's the entire thing with some extra removal of duplicate whitespace and (of course) a test suite:
require 'test/unit'

class TestParenthesesRemoval < Test::Unit::TestCase
  def test_that_it_removes_even_deeply_nested_parentheses
    str = 'This is (was?) some ((heavily) parenthesized (but not overly so (I hope))) text with (superfluous) parentheses: )(.'
    res = 'This is some text with parentheses: )(.'

    # Strip innermost groups until gsub! reports no more substitutions.
    while str.gsub!(/\([^()]*?\)/, ''); end
    # Collapse the runs of spaces left behind by the removed groups.
    str.squeeze!(' ')

    assert_equal res, str
  end
end
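For comparison, here is roughly what the un-condensed recursive descent parser could look like, with the regular sub-parts handled by small Regexps. This is a minimal sketch, not a definitive implementation: the method names `strip_groups` and `skip_group` are made up, and unlike the one-liner it assumes balanced input and raises on stray parentheses instead of leaving them in place.

```ruby
require 'strscan'

# Hypothetical helper: returns a copy of str with every balanced
# parenthesized group removed. Assumes balanced parentheses and
# raises ArgumentError otherwise.
def strip_groups(str)
  scanner = StringScanner.new(str)
  out = ''.dup
  until scanner.eos?
    if (chunk = scanner.scan(/[^()]+/)) # regular sub-part: plain text
      out << chunk
    elsif scanner.check(/\(/)           # start of a group: skip it
      skip_group(scanner)
    else
      raise ArgumentError, "unexpected ')' at offset #{scanner.pos}"
    end
  end
  out
end

# group := '(' (plain text | group)* ')'
def skip_group(scanner)
  scanner.scan(/\(/)
  until scanner.scan(/\)/)
    if scanner.scan(/[^()]+/)
      next                              # plain text inside the group
    elsif scanner.check(/\(/)
      skip_group(scanner)               # recursion handles the nesting
    else
      raise ArgumentError, "missing ')' before end of input"
    end
  end
end

puts strip_groups('a (b (c) d) e').squeeze(' ')
```

It is easier to follow step by step, but also a lot longer than the loop, which is why I went with the condensed version.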