I've been trying, without success, to DRY up the following regexp, which matches hashtags in a string:

/^#(\w+)|\s#(\w+)/i

This won't work:

/^|\s#(\w+)/i

And no, I don't want to capture the alternation at the beginning:

/(^|\s)#(\w+)/i

I'm doing this in Ruby - though that should not be relevant I suppose.

To give some examples of matching and non-matching strings:

'#hashtag it is'        # should match => [["hashtag"]]
'this is a #hashtag'    # should match => [["hashtag"]]
'this is not a#hashtag' # should not match => []

Any suggestions? Am I nitpicking?

+3  A: 

You can use:

/\B#(\w+)/i

"this is a #hash tag"      # matches
"#hash tag"                # matches
"this is not#hash tag"     # doesn't match
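
As a quick check, here is a minimal sketch that runs the pattern above through String#scan on the strings from the question. The constant name HASHTAG_NON_BOUNDARY is only illustrative. Note that \B succeeds after any non-word character, not just whitespace, so punctuation directly before the # also lets the tag match.

```ruby
# \B before '#' succeeds when the preceding character is not a word
# character, or when '#' sits at the very start of the string.
HASHTAG_NON_BOUNDARY = /\B#(\w+)/i

p '#hashtag it is'.scan(HASHTAG_NON_BOUNDARY)        # [["hashtag"]]
p 'this is a #hashtag'.scan(HASHTAG_NON_BOUNDARY)    # [["hashtag"]]
p 'this is not a#hashtag'.scan(HASHTAG_NON_BOUNDARY) # []

# Punctuation is a non-word character too, so this one matches as well.
p 'foo.#bar'.scan(HASHTAG_NON_BOUNDARY)              # [["bar"]]
```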
Simone Carletti
So that's a non-word boundary. Ingenious. Dry. Thanks!
hakanensari
This also matches `foo.#bar`. Not sure whether the OP wants that or not.
FM
It's bad writing style, but it's plausible to assume the author meant #bar as a separate word. All I wanted to avoid was text such as "Object#method", where I would assume the author is talking about Ruby rather than hashtagging.
hakanensari
A: 

This uses a lookbehind, and I don't know whether lookbehinds are supported in Ruby (I heard that they are not supported in JavaScript).

/(^#(\w+))|((?<= )#(\w+))/
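
For what it's worth, a lookbehind can also fold both cases into one branch. The sketch below is a variation, not the exact pattern above: it swaps in a negative lookbehind, (?<!\S), and it assumes Ruby 1.9 or later, whose regexp engine supports lookbehind. The constant name HASHTAG_LOOKBEHIND is only illustrative.

```ruby
# (?<!\S) asserts that '#' is NOT preceded by a non-whitespace character.
# That holds at the start of the string and right after any whitespace,
# and the assertion neither consumes nor captures anything.
HASHTAG_LOOKBEHIND = /(?<!\S)#(\w+)/i

p '#hashtag it is'.scan(HASHTAG_LOOKBEHIND)        # [["hashtag"]]
p 'this is a #hashtag'.scan(HASHTAG_LOOKBEHIND)    # [["hashtag"]]
p 'this is not a#hashtag'.scan(HASHTAG_LOOKBEHIND) # []
```

With this variation there is only one capturing group, so scan returns just the tag for each match.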
Amarghosh
+1  A: 
/(?:^|\s)#(\w+)/i

Adding the ?: prefix to the first group makes it a non-capturing group, so only the second group captures. Each match therefore has a single capturing group, whose contents are the hashtag.
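
To make the difference concrete, here is a small sketch comparing the capturing and non-capturing forms with String#scan. The constant names CAPTURING and NON_CAPTURING are only illustrative.

```ruby
CAPTURING     = /(^|\s)#(\w+)/i   # the first group is captured too
NON_CAPTURING = /(?:^|\s)#(\w+)/i # the first group only groups

text = 'this is a #hashtag'

# With two capturing groups, scan returns the leading whitespace as well.
p text.scan(CAPTURING)     # [[" ", "hashtag"]]

# With (?:...), only the hashtag itself is returned.
p text.scan(NON_CAPTURING) # [["hashtag"]]

p '#hashtag it is'.scan(NON_CAPTURING)        # [["hashtag"]]
p 'this is not a#hashtag'.scan(NON_CAPTURING) # []
```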

Amber