I have the following regexp

var value = "hello";
var pattern = @"(?<start>.*?\W*?)(?<term>" + Regex.Escape(value) + @")(?<end>\W.*?)";

I'm trying to figure out what it means, because it doesn't work against a single word. For example, it matches "they said hello us" but fails for just "hello".

Can you please help me decode what this regexp string means?

PS: it's a .NET regexp.

+3  A: 

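It's because of the \W in the last part. \W matches a non-word character, that is, anything other than a letter, digit, or underscore.

In "they said hello us" there is a space after hello, but in "hello" there is nothing after it, so \W has nothing to match and the whole pattern fails.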

If you change it to (?<end>\W*.*?) it may work.
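A minimal sketch of the difference, assuming a small console program; the class name EndGroupDemo and the helper Build are hypothetical and only serve to swap the body of the "end" group:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EndGroupDemo
{
    // Builds the pattern from the question; endGroup is the body of the "end" group.
    public static Regex Build(string value, string endGroup)
    {
        return new Regex(
            @"(?<start>.*?\W*?)(?<term>" + Regex.Escape(value) + ")(?<end>" + endGroup + ")");
    }

    public static void Main()
    {
        var original = Build("hello", @"\W.*?");  // requires one non-word char after the term
        var relaxed = Build("hello", @"\W*.*?");  // non-word chars after the term are optional

        Console.WriteLine(original.IsMatch("they said hello us")); // True
        Console.WriteLine(original.IsMatch("hello"));              // False
        Console.WriteLine(relaxed.IsMatch("hello"));               // True
    }
}
```

Note that the relaxed pattern no longer requires anything around the term, so it would also match inside a longer word.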

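Actually, the regex itself does not make much sense to me. It should rather look like

@"\b" + Regex.Escape(value) + @"\b"

\b is a word boundary. (The @ matters in C#: in a regular string literal, "\b" is a backspace character, not the regex escape.)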

S.Mark
Awesome! It worked. The issue was with the trailing * as you mentioned. Thanks.
Michael Nemtsev
+1  A: 

The regex may be trying to find a pattern comprising whole words, so that your hello example doesn't match, say, Othello. If so, the word boundary regex, \b, is tailor-made for the purpose:

@"\b(" + Regex.Escape(value) + @")\b"
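As a rough illustration of the word-boundary approach, here is a small sketch; the class WholeWordDemo and the method ContainsWord are hypothetical names, not part of any library:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Returns true when value occurs in input as a whole word.
    public static bool ContainsWord(string input, string value)
    {
        return Regex.IsMatch(input, @"\b(" + Regex.Escape(value) + @")\b");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsWord("hello", "hello"));              // True
        Console.WriteLine(ContainsWord("they said hello us", "hello")); // True
        Console.WriteLine(ContainsWord("Othello", "hello"));            // False
    }
}
```

One caveat: \b only helps when the search term itself starts and ends with a word character. A term such as "c++" ends in a non-word character, so the trailing \b would not behave as a whole-word check there.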
Marcelo Cantos
A: 

If this is a .NET regex and the Regex.Escape() part is replaced with just 'hello', RegexBuddy says it means:

(?<start>.*?\W*?)(?<term>hello)(?<end>\W.*?)

Options: case insensitive

Match the regular expression below and capture its match into backreference with name “start” «(?<start>.*?\W*?)»
   Match any single character that is not a line break character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match a single character that is a “non-word character” «\W*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below and capture its match into backreference with name “term” «(?<term>hello)»
   Match the characters “hello” literally «hello»
Match the regular expression below and capture its match into backreference with name “end” «(?<end>\W.*?)»
   Match a single character that is a “non-word character” «\W»
   Match any single character that is not a line break character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
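To see what each named group from the breakdown above actually captures, a short sketch along these lines may help. The class GroupDemo and the method Capture are hypothetical, and the pattern is used without the case-insensitive option:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Runs the pattern from the question with the literal term "hello".
    public static Match Capture(string input)
    {
        return Regex.Match(input, @"(?<start>.*?\W*?)(?<term>hello)(?<end>\W.*?)");
    }

    public static void Main()
    {
        Match m = Capture("they said hello us");

        // Brackets make leading and trailing spaces visible.
        Console.WriteLine("[" + m.Groups["start"].Value + "]"); // [they said ]
        Console.WriteLine("[" + m.Groups["term"].Value + "]");  // [hello]
        Console.WriteLine("[" + m.Groups["end"].Value + "]");   // [ ]
    }
}
```

Because the final .*? is lazy and nothing follows it, the "end" group stops after the single non-word character and never reaches "us".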
Scott Evernden