views:

32

answers:

1

Given the follow STL error:

./poly_power/poly_class.cpp:496: error: no matching function for call to ‘state_operator< polynomial< variable_term< polynomial< variable_term< std::basic_string<char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term<std::basic_string< char, std::char_traits< char>, std::allocator<char> >, int> > > >, polynomial<variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > > >::operation( std::pair< const std::basic_string< char, std::char_traits< char>, std::allocator< char> >, state_vector_term< polynomial< variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > >, polynomial< variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > > > >&)’

how can I use a reg-ex expression to simplify it to something that looks like:

./poly_power/poly_class.cpp:496: error: no matching function for call to ‘state_operator<...>::operation( std::pair<...>&)’

ie. Convert everything within the outer pair of <> into .... I am aware of STLFilt, a perl script that does something similar, but I think it would be pedagogic to see how this could be done in pure regex.

Bonus

Paraitimze the expression so that it works on the nth level of <>. The 1st level would be example above, while the 2nd level would show something like state_operator<polynomial<...>, polynomial<...> >.

+4  A: 

What you have here is an irregular grammar that is near-impossible to match with a regular expression.

Edit:

I have experimented around a bit, and, well, it is possible (provided that angle brackets occur nowhere else in the string, that they are balanced (one closing bracket for each opening one), and that you can define an upper limit to the nesting involved):

Match <...> (without nesting):

<[^<>]*+>

Match <...<...>...<...>...> (one level of nesting):

<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+[^<>]*+>

Match with up to two levels of nesting:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to three levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to four levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to five levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

etc.

To get the bonus point, here's the last one broken up into a verbose regex, so you can see a bit easier how it is constructed:

<             # Match the first opening <
[^<>]*+       # Match any non-<>-characters possessively
  (?:         # Match the following zero or more times:
  <             # Match a <
  [^<>]*+       # etc. etc. etc.
    (?:
    <
    [^<>]*+
      (?:
      <
      [^<>]*+
        (?:
        <
        [^<>]*+
          (?:          # innermost level:
          <            # Match a <
          [^<>]*+      # Match any non-<> characters
          >            # Match a >
          [^<>]*+      # Match any non-<> characters
          )*+          # any number of times, possessively
        >            # then back one level: Match a >
        [^<>]*+      # etc. etc. etc.
        )*+
      >
      [^<>]*+
      )*+
    >
    [^<>]*+
    )*+
  >
  [^<>]*+
  )*+
[^<>]*+
>               # Match the final closing >

Isn't this perfectly horrible?

For your example, it seems you need seven levels of nesting. If you want a quick regex constructor, it could look like this (in Python):

def innerregex(nesting):
    if nesting == 0:
        return ""
    else:
        return "(?:<[^<>]*+" + innerregex(nesting-1) + ">[^<>]*+)*+"

def makeregex(nesting):
    return "<[^<>]*+" + innerregex(nesting) + ">"

makeregex(2) will return a regex that can match <<<><>>> correctly. makeregex(7) should work on the entire string, makeregex(6) will only take the innermost matches, and so on.

Tim Pietzcker
I wish I could +1 you again. I just remembered that about the upper limit and was going to code this up… good work. Are you sure you simplified the error message correctly? It would be surprising if the compiler outputted unbalanced brackets.
Potatoswatter
I think the brackets have been interpreted away by SO's markup processor (where they are used to mark quotes). I have reformatted your question. Now it works, more or less.
Tim Pietzcker
This is amazing, and answers the spirit of the question: learning how to match a nesting like this. I can see now that this is _not_ a good fit for a reg-ex. Thank you!
Hooked
Great to hear it. This was a bizarre but fun exercise :) I just hope that nobody now thinks that he can defeat Cthulhu when trying to parse HTML with regular expressions...
Tim Pietzcker