views:

81

answers:

4

While working on a small regex task I ran into this problem. I have a string containing a comma-separated list of tags, e.g.:
foo,bar,qux,garp,wobble,thud

I needed to check whether a certain tag, e.g. 'garp', is in this list. (What exactly is matched is not important, only whether there is a match.)

My first, somewhat naive attempt was the following regex:
[^,]garp[,$]

My idea was that before 'garp' there should either be the start of the line/string or a comma, after 'garp' there should be either a comma or the end of the line/string.

Now, it is instantly obvious that this regex is wrong: both ^ and $ lose their anchor meaning inside a character class [ ]. A leading ^ negates the class, and $ becomes a literal dollar sign.
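To make the failure concrete, here is a minimal sketch using Python's `re` module. Python is an assumption for illustration only (the question concerns Oracle), but bracket expressions behave the same way here.

```python
import re

tags = "foo,bar,qux,garp,wobble,thud"
broken = re.compile(r"[^,]garp[,$]")

# [^,] demands one character that is NOT a comma before 'garp',
# so the tag in the middle of the list is missed.
print(broken.search(tags))       # None

# [,$] accepts a comma or a literal dollar sign, not end-of-string.
print(broken.search("xgarp$"))   # a match on 'xgarp$'
print(broken.search("garp"))     # None: no character before or after
```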

What I finally came up with is the following:
^garp$|^garp,|,garp,|,garp$

This regex just handles the 4 cases one by one. (Tag as the only element of the list, at the beginning, in the middle, or at the end.) This last regex is a bit ugly in my eyes, and just for fun's sake I'd like to make it more elegant.

Is there a way to use the start-of-line/end-of-line anchors (^ and $) inside a character class?

EDIT: OK, more information was requested, so here it is: I'm using this within an Oracle SQL statement. Sadly, this does not allow any look-around assertions, but as I'm only interested in whether there is a match (and not in what is matched), this does not really affect me here. The tags can contain non-alphabetic characters like - or _, so \bgarp\b would not work. Also, one tag can contain another tag, as SilentGhost said, so /garp/ doesn't work either.
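The two rejected patterns can be checked quickly. This is a sketch in Python's `re` module (an assumption; Oracle's regex flavour differs in details), and `hits` is a hypothetical helper. Note that in this flavour only the dash case produces a false positive for `\b`, because `_` counts as a word character.

```python
import re

def hits(pattern: str, tag_list: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in tag_list."""
    return re.search(pattern, tag_list) is not None

# \b treats '-' as a non-word character, so 'garp' matches inside 'garp-er'.
print(hits(r"\bgarp\b", "foo,garp-er,bar"))   # True (false positive)

# '_' is a word character, so there is no boundary between 'p' and '_'.
print(hits(r"\bgarp\b", "foo,garp_er,bar"))   # False

# A bare substring search also matches inside a longer tag.
print(hits(r"garp", "foo,garper,bar"))        # True (false positive)
```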

+7  A: 

You can't use ^ and $ as anchors inside a character class: a leading ^ negates the class, and $ is treated as a literal character. But you can use an alternation to achieve the same effect:

(^|,)garp(,|$)
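
As a rough sketch of how this pattern could be wrapped for arbitrary tags, here it is in Python (an assumption; the question uses Oracle, where the same pattern would presumably go into `REGEXP_LIKE`). The function name `has_tag` is hypothetical, and `re.escape` is added so that tags containing characters such as `-` or `.` are matched literally.

```python
import re

def has_tag(tag_list: str, tag: str) -> bool:
    """Return True if `tag` is a complete element of the comma-separated list."""
    # (^|,) = start of string or a comma; (,|$) = a comma or end of string.
    pattern = r"(^|,)" + re.escape(tag) + r"(,|$)"
    return re.search(pattern, tag_list) is not None

tags = "foo,bar,qux,garp,wobble,thud"
print(has_tag(tags, "garp"))            # True  (middle of the list)
print(has_tag("garp", "garp"))          # True  (only element)
print(has_tag("foo,garper", "garp"))    # False (part of a longer tag)
print(has_tag("foo,garp-er", "garp"))   # False (dash is not a delimiter)
```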
Mark Byers
Nice, this is what I was looking for... Already looks a lot nicer than my crude or-chain... :)ty.
fgysin
+1  A: 

Just use look-arounds to solve this:

(?<=^|,)garp(?=$|,)

The difference between look-arounds and regular groups is that with regular groups the comma would be part of the match, and with look-arounds it wouldn't. In this case it doesn't make a difference, though.
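
The difference in what gets matched can be seen in a short Python sketch (Python is an assumption here). One caveat: Python's `re` module rejects `(?<=^|,)` because the look-behind alternatives have different widths, so the sketch swaps in the equivalent negative form `(?<![^,])garp(?![^,])`, read as "not preceded and not followed by a non-comma character". That pattern is a substitution for illustration, not the one from the answer.

```python
import re

tags = "foo,bar,qux,garp,wobble,thud"

# Regular groups consume the delimiters, so they are part of the match.
grouped = re.search(r"(^|,)garp(,|$)", tags)
print(grouped.group())      # ',garp,'

# Look-arounds only assert the context; the match is the tag alone.
lookaround = re.search(r"(?<![^,])garp(?![^,])", tags)
print(lookaround.group())   # 'garp'

# The negative form also handles the start and end of the string.
print(re.search(r"(?<![^,])garp(?![^,])", "garp") is not None)        # True
print(re.search(r"(?<![^,])garp(?![^,])", "foo,garper") is not None)  # False
```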

reko_t
+3  A: 

You just need to use a word boundary (\b) instead of ^ and $:

\bgarp\b
SilentGhost
It's called a word boundary, and while it'd work in this case, it would fail if a value had non-word characters in it, for example a dash: with `foo,garp-er,bar`, it'd match garp even though there's no comma or end-of-string after it.
reko_t
+1, beat me to it. Although in his example, he could simply use `garp` and it would work. Not really clear what his real requirements are...
Tim Pietzcker
@reko_t: thanks for the correction, it wouldn't fail though, because you'd be looking for `garp-er`.
SilentGhost
@reko: Right. But if it gets this complicated, then a CSV parser might be an even better bet. Who knows if we might have commas embedded in strings?
Tim Pietzcker
@Tim: I suppose if another tag would be `'garper'` then w/o word boundaries it would find both
SilentGhost
@SilentGhost: yes, but my point was if you were to look for "garp". If you were to look for `garp-er` then similarly it'd fail for `garp-er-er`
reko_t
That's right, reko
SilentGhost
+1  A: 

I'm a big regex fan, but in this case (a comma-separated string), although Mark Byers', SilentGhost's and reko_t's solutions all work, I'd rather suggest looking at a CSV parser.

Might be overkill for the job, but then we don't know the real requirements and the real data that needs to be handled.
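
A minimal sketch of what that could look like, assuming Python and its standard `csv` module (the question itself runs inside Oracle SQL, so this only illustrates the idea). The names `parse_tags` and `has_tag_csv` are hypothetical.

```python
import csv
import io

def parse_tags(tag_list: str) -> list:
    """Split one comma-separated line into tags, honouring CSV quoting."""
    # csv.reader iterates over lines; the whole string is treated as one row.
    return next(csv.reader(io.StringIO(tag_list)), [])

def has_tag_csv(tag_list: str, tag: str) -> bool:
    """True if `tag` is exactly one of the parsed elements."""
    return tag in parse_tags(tag_list)

print(has_tag_csv("foo,bar,qux,garp,wobble,thud", "garp"))   # True
print(parse_tags('foo,"garp,er",bar'))                       # ['foo', 'garp,er', 'bar']
print(has_tag_csv('foo,"garp,er",bar', "garp"))              # False
```

Unlike the regex, this treats a comma embedded in a quoted tag as part of that tag rather than as a delimiter.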

Tim Pietzcker