Can anyone suggest a regex to match the underscore in the following examples:

test_test
test[_test
test_]

But NOT match this:

test[_]test

This is using the .NET regular expression library. I'm using this regex tester to check:

http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

+1  A: 

I don't know about .NET, but the regex would be composed of two parts: one matching any character except an opening bracket followed by an underscore, and the other matching an underscore followed by any character except a closing bracket:

[^\[](_)|(_)[^\]]

Edit: Just noticed that you need to add the cases where the underscore is at the beginning or the end of the string:

[^\[](_)|(_)[^\]]|^_|_$
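
A quick way to see what this pattern captures is to run it over the question's examples. The sketch below uses Python's `re` module as a stand-in for .NET (an assumption; these particular constructs should behave the same way in both), and the helper name `underscore_group` is made up for illustration. Note that the overall match includes a neighbouring character, so the underscore itself has to be read from group 1 or group 2.

```python
import re

# Pattern from the answer above; group 1 or group 2 holds the underscore.
PATTERN = re.compile(r"[^\[](_)|(_)[^\]]|^_|_$")

def underscore_group(text):
    """Return (whole_match, captured_underscore), or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # At most one of the two groups participates; the ^_ and _$ branches
    # have no group, so fall back to the whole match there.
    return m.group(0), m.group(1) or m.group(2) or m.group(0)

for sample in ["test_test", "test[_test", "test_]", "test[_]test"]:
    print(sample, "->", underscore_group(sample))
```

With these inputs the whole match is two characters wide for the first three strings, and `test[_]test` yields no match.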
soulmerge
Didn't work for: test[_test (I'm using the RegEx tester here to test: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx)
Moskie
Hi Moskie. Without any options checked, this regex matches the underscore in the "test[_test" string.
neoneye
A: 

Try

^.*(\[_[^\]])|([^\[]_\])|([^\[]_[^\]]).*$

EDIT: Now handles

test_test

Not tested, but read: Any string of characters followed by either [_ then any character but ] or any character but [ then _]

Note, this might fail for cases like

_]Test
Test[_

I don't know if that's a problem for you?

Tested successfully with all your examples

Mark Pim
+3  A: 

Try this:

_[^\]]|[^[]_

It consists of an alternation of _[^\]] (underscore and not ]) and [^[]_ (not [ and underscore).

Or if you want to use look-around assertions to really match just the underscore and not surrounding characters:

_(?=[^\]])|_(?<=[^[]_)

This matches any underscore that is followed by a character other than ] ((?=[^\]]), positive look-ahead) or any underscore that is preceded by a character other than [ ((?<=[^[]_), positive look-behind). And this can be combined to:

_(?:(?=[^\]])|(?<=[^[]_))
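
To check that the look-around form consumes only the underscore, here is a minimal sketch using Python's `re` module as a stand-in for .NET (an assumption; both engines support fixed-width look-behind). The test string is the multi-sample string quoted in the comments below; the names `LOOKAROUND` and `SAMPLE` are just for illustration.

```python
import re

# Look-around version from the answer: only the underscore itself is consumed.
LOOKAROUND = re.compile(r"_(?:(?=[^\]])|(?<=[^[]_))")

SAMPLE = "test[_]test [_test test[_test test_]test test_] test_test"

# Collect (position, matched text) for every match in the sample string.
matches = [(m.start(), m.group(0)) for m in LOOKAROUND.finditer(SAMPLE)]
print(len(matches), matches)

# Edge cases: both branches require a real neighbouring character.
print(LOOKAROUND.search("_]Test"), LOOKAROUND.search("Test[_"))
```

On this string the pattern finds five underscores and skips the one in `test[_]test`. Because each branch requires an actual neighbouring character, an underscore at the very start of the string followed by `]`, or at the very end preceded by `[`, is not matched.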
Gumbo
This is close, except it's matching an extra character, and not just the underscore. I tried modifying it to use lookahead, but it doesn't work in all situations: _(?=[^\]])|(?=[^[])_
Moskie
Tested with string: "test[_]test [_test test[_test test_]test test_] test_test". Found 3 matches, should find 5.
Kevin Albrecht
@Moskie: You would need a look-behind assertion in the latter case.
Gumbo
I think that does it: _(?=[^\]])|(?<=[^[])_
Moskie
@Moskie: It would be better to use `_(?=[^\]])|_(?<=_[^[])` instead.
Gumbo
That doesn't work for test_]test
Moskie
I meant `_(?=[^\]])|_(?<=[^[]_)`.
Gumbo
Cool, I think that's the one. I'll give you the credit, but I'd suggest you modify your answer to have that last regex.
Moskie
+1  A: 
_(?!\](?<=\[_\]))

If the underscore isn't followed by a closing bracket, the negative lookahead succeeds immediately. Otherwise, it does a lookbehind to find out if the underscore is also preceded by an opening bracket. You can replace the "_]" with dots to make it clear that you're only interested in the opening bracket this time:

_(?!\](?<=\[..))

You can do the lookbehind first if you want:

_(?<!\[_(?=\]))

The important thing is that the second lookaround has to be nested within the first one in order to achieve the "NOT (x AND y)" semantics.

Testing it in EditPad Pro, it matches the underscore in all but the last of these strings:

test_test
test[_test
test_]
_]Test
Test[_
test[_]test

EDIT: here's an easier-to-read version:

(?<!\[)_|_(?!\])

What I like about the nested-lookaround version is that it doesn't do anything until it actually finds an underscore. Unless the regex engine is smart enough to optimize it away, this "(NOT x) OR (NOT y)" version will do a negative lookbehind at every single position.
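
The six test strings above can be checked against both forms with a short script. This is a sketch in Python's `re` module rather than .NET or EditPad Pro (an assumption that the engines agree on these constructs); the names `NESTED`, `FLAT`, and `SAMPLES` are illustrative only.

```python
import re

# NOT (followed by ] AND preceded by [), with the lookbehind nested in the lookahead.
NESTED = re.compile(r"_(?!\](?<=\[_\]))")
# (NOT preceded by [) OR (NOT followed by ]), the easier-to-read form.
FLAT = re.compile(r"(?<!\[)_|_(?!\])")

SAMPLES = ["test_test", "test[_test", "test_]", "_]Test", "Test[_", "test[_]test"]

nested_hits = [NESTED.search(s) is not None for s in SAMPLES]
flat_hits = [FLAT.search(s) is not None for s in SAMPLES]

for s, n, f in zip(SAMPLES, nested_hits, flat_hits):
    print(f"{s!r}: nested={n} flat={f}")
```

Both patterns should agree on every string: a match for the first five and none for `test[_]test`.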

Alan Moore
Tested with string: "test[_]test [_test test[_test test_]test test_] test_test". Found 0 results, should find 5.
Kevin Albrecht
I've tested it in .NET and Java and got the same result: the first five strings match and the last one doesn't.
Alan Moore
+1  A: 

Hi, you might try

((?<!\[)_|_(?!\]))

which uses negative lookahead/behind (rather than positive lookahead/behind and excluded characters).

it depends
I guess this should work
chappar