Your regex (?<!Sub ).*\(.*\) taken apart:
(?<! # negative look-behind
Sub # the string "Sub " must not occur before the current position
) # end negative look-behind
.* # anything ~ matches up to the end of the string!
\( # a literal "(" ~ causes the regex to backtrack to the last "("
.* # anything ~ matches up to the end of the string again!
\) # a literal ")" ~ causes the regex to backtrack to the last ")"
So, with your test string:
Sub ChangeAreaTD()
- The look-behind is satisfied immediately, right at position 0, because nothing precedes the start of the string.
- The .* travels to the end of the string after that.
- Because of this .*, the look-behind never really makes a difference.
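To see this for yourself, here is a minimal sketch using Python's built-in re module (an assumption on my part, since your engine is probably a different one; the behaviour of this particular pattern should be the same in any backtracking engine). It only checks where the match starts and what it covers:

```python
import re

# The original pattern from the question, unchanged.
pattern = re.compile(r"(?<!Sub ).*\(.*\)")

m = pattern.search("Sub ChangeAreaTD()")

# The look-behind is checked at position 0, where nothing precedes it,
# so it passes and the first .* is free to swallow "Sub " itself.
match_start = m.start()
matched_text = m.group(0)

print(match_start)
print(matched_text)
```

The match starts at position 0 and covers the whole line, "Sub " included, so the look-behind has filtered nothing.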
You were probably thinking of
(?<!Sub .*)\(.*\)
but it is very unlikely that variable-length look-behind is supported by your regex engine.
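As an illustration, Python's built-in re module is one engine that refuses such a pattern outright. This sketch assumes Python only as a stand-in; whether your own engine accepts the pattern is something you would have to test there:

```python
import re

try:
    # A look-behind containing .* has no fixed width.
    re.compile(r"(?<!Sub .*)\(.*\)")
    supported = True
    message = ""
except re.error as exc:
    # Python's re only allows fixed-width look-behind and reports an error.
    supported = False
    message = str(exc)

print(supported)
print(message)
```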
So what I would do is this (since variable-length look-ahead is widely supported):
^(?!.*\bSub\b)[^(]+\(([^)]+)\)
which translates as:
^ # At the start of the string,
(?! # do a negative look-ahead:
.* # anything
\b # a word boundary
Sub # the string "Sub"
\b # another word boundary
) # end negative look-ahead. If not found,
[^(]+ # match anything except an opening paren ~ to prevent backtracking
\( # match a literal "("
( # match group 1
[^)]+ # match anything up to a closing paren ~ to prevent backtracking
) # end match group 1
\) # match a literal ")".
and then go for the contents of match group 1.
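A quick way to try it out, sketched in Python (again only as a stand-in for whatever engine you use). The helper name call_args and every test line except the Sub declaration are made up for illustration:

```python
import re

# The look-ahead based pattern from above.
pattern = re.compile(r"^(?!.*\bSub\b)[^(]+\(([^)]+)\)")

def call_args(line):
    """Return the contents of match group 1, or None if the line is rejected."""
    m = pattern.search(line)
    return m.group(1) if m else None

# Rejected: the word "Sub" occurs somewhere on the line.
declaration = call_args("Sub ChangeAreaTD(x)")

# Accepted: no stand-alone "Sub", group 1 holds the text between the parens.
call = call_args("Call ChangeAreaTD(x, y)")

# Accepted: \b keeps "SubTotal" from counting as the word "Sub".
word_boundary = call_args("SubTotal(a)")

# Rejected: [^)]+ needs at least one character between the parens.
empty_parens = call_args("ChangeAreaTD()")

print(declaration, call, word_boundary, empty_parens)
```

Note the last case: as written, the pattern skips calls with an empty argument list. If you want those too, [^)]* instead of [^)]+ inside the group would be the obvious change.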
However, regex is generally hideously ill-suited for parsing code. This is as true for VB code as it is for HTML. You will get wrong matches even with the improved regex. For example here, because of the nested parens:
MsgBox ("The total run time to fix all fields (AREA, TD) is: ...")
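Running the improved regex over that line shows the problem. This is a sketch in Python, used here only as a convenient test bed:

```python
import re

pattern = re.compile(r"^(?!.*\bSub\b)[^(]+\(([^)]+)\)")
line = 'MsgBox ("The total run time to fix all fields (AREA, TD) is: ...")'

m = pattern.search(line)

# [^)]+ stops at the first ")", which belongs to the inner "(AREA, TD)",
# so group 1 is cut off in the middle of the string literal.
captured = m.group(1)
print(captured)
```

The capture ends right after "TD", inside the quoted string, because the pattern has no notion of nesting or of string literals.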