I'm processing a bunch of tables using this program, but I need to ignore the ones that start with the label "tbd_". So far I have something like [^tbd_], but that simply does not match those characters.

+8  A: 

You could use a negative look-ahead assertion:

^(?!tbd_).+
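
As a rough sketch of how the look-ahead version could be used to filter names, here it is in Python's `re` module (the choice of Python and the table names are assumptions for illustration only; the question doesn't say which engine or names are involved):

```python
import re

# Hypothetical table names, made up for this example.
tables = ["users", "tbd_orders", "tbd_", "tb", "orders_tbd_old"]

# ^(?!tbd_) rejects the match if the string starts with "tbd_";
# .+ then requires at least one character.
keep = re.compile(r"^(?!tbd_).+")

kept = [name for name in tables if keep.match(name)]
print(kept)  # ['users', 'tb', 'orders_tbd_old']
```

Note that only the prefix is checked, so a name like "orders_tbd_old" is still kept.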

Or a negative look-behind assertion:

(^.{1,3}$|^.{4}(?<!tbd_).*)
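
To see why the look-behind version needs the extra `^.{1,3}$` branch, here is a small check, again assuming Python's `re` (which accepts fixed-width look-behinds such as this one); the helper name `is_kept` is made up for the example:

```python
import re

# First branch: names of 1 to 3 characters can never start with "tbd_".
# Second branch: consume the first four characters, then assert that
# what was just consumed is not "tbd_".
pattern = re.compile(r"(^.{1,3}$|^.{4}(?<!tbd_).*)")

def is_kept(name):
    """Return True if the name does not start with "tbd_" (hypothetical helper)."""
    return pattern.match(name) is not None

for name in ["tb", "tbd", "tbd_", "tbd_orders", "users"]:
    print(name, is_kept(name))
```

Without the first branch, short names such as "tb" would be rejected simply because `.{4}` cannot match them.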

Or just plain old character sets and alternations:

^([^t]|t($|[^b]|b($|[^d]|d($|[^_])))).*
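
If you want to convince yourself that the alternation version behaves the same way, one option is to compare it against a plain prefix test. This is only a sketch in Python with sample strings I made up:

```python
import re

# Each nesting level handles "the name ends here" ($) or "the next
# character breaks the tbd_ prefix" before moving one character deeper.
alternation = re.compile(r"^([^t]|t($|[^b]|b($|[^d]|d($|[^_])))).*")

# Hypothetical sample names covering every partial prefix of "tbd_".
samples = ["t", "tb", "tbd", "tbd_", "tbd_x", "tbdx", "table", "users", "_tbd_"]

for name in samples:
    expected = not name.startswith("tbd_")
    matched = alternation.match(name) is not None
    assert matched == expected, name
```

Like the `.+` in the first pattern, this one does not match the empty string, because every branch consumes at least one character.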
Gumbo
Is this restricted to any particular regex engines?
Mark Biek
I only ask because that second one still seems to match tbd_ in my test. The first one is great though.
Mark Biek
Take a look at regular-expressions.info’s flavor comparison: http://www.regular-expressions.info/refflavors.html
Gumbo
Just fixed the second regex.
Gumbo
Excellent. Thanks for the link
Mark Biek
@Gumbo - should that not end .* instead of .+? A string that is tbd_ also starts with that... therefore by definition doesn't need to be followed by any other characters? Otherwise, good example. It *does* require a regex engine that supports lookaround though.
BenAlabaster
@balabaster: I don’t think he’s looking for empty strings. But if so, he can easily change that by replacing the `.+` by `.*`
Gumbo
@Gumbo - Point taken
BenAlabaster
Not looking for empty strings, thanks for the help! For some reason it's not working with my regex checker: http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html but I'll finish the xml script and see how that goes
echoblaze
Never mind, looks like the first one is working, thanks!
echoblaze
A little typo: the second one is a negative look-behind assertion.
PhiLho
Thanks, PhiLho.
Gumbo