ansaurus

Question

Regular expression to skip character in capture group

Answer 1

+7 A:

In short: you can't. A match is always consecutive. Even when it contains things such as zero-width assertions, there is no way around matching the next character if you want to get to the one after it.
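To make that concrete, here is a minimal sketch in Python (the language, the sample string and the pattern are all assumptions for illustration; the question itself is about .NET). The overall match is always one contiguous slice of the input, so the hyphen can only be left out by combining groups afterwards:

```python
import re

text = "id AB-123 end"  # hypothetical sample input
m = re.search(r"([A-Z]{2,3})-([0-9]{2,3})", text)

# The whole match is exactly the slice of the input between start and end.
whole = m.group(0)
is_contiguous = whole == text[m.start():m.end()]

# Leaving out the hyphen means joining the groups after matching.
joined = m.group(1) + m.group(2)
```

Here `whole` still contains the hyphen, while `joined` does not.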

Tomalak 2008-11-10 10:38:49

Answer 2

+6 A:

There really isn't a way to create an expression such that the matched text is different from what is found in the source text. You will need to remove the hyphen in a separate step, either by matching the first and second parts individually and concatenating the two groups:

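// requires: using System.Text.RegularExpressions;
Match match = Regex.Match( text, "([A-B]{2,3})-([0-9]{2,3})" );
// Concatenate the two captured parts, leaving out the hyphen between them.
string matchedText = string.Format( "{0}{1}",
    match.Groups[1].Value,
    match.Groups[2].Value );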

Or by removing the hyphen in a step separate from the matching process:

Match match = Regex.Match( text, "[A-B]{2,3}-[0-9]{2,3}" );
// Strip the hyphen from the matched text afterwards.
string matchedText = match.Value.Replace( "-", "" );

Jeff Hillman 2008-11-10 10:45:50

Answer 3

+1 A:

Your assertion that it's not possible without sub-grouping and concatenating is correct.

You could also do as Jeff Hillman suggests and simply strip out the unwanted character(s) after the fact.

It is important to note here, though, that you "don't use regex for everything".

Regex is designed to give less complicated solutions to non-trivial problems. You shouldn't reach for "oh, we'll use a regex" every time, and you shouldn't get into the habit of thinking you can solve every problem with a one-step regex.

When there is a viable trivial method that works, by all means, use it.

An alternative idea, if you need to process multiple matches in a body of text, is to look for your language's callback-based regex function, which passes each match and its groups to a function that can do in-line substitution. (This is especially handy when doing regex replaces.)

Not sure how it would work in .NET, but in PHP you would do something like this (not exact code):

  // The callback receives an array of matches; $matches[1] is the first group.
  function strip_reverse( $matches )
  {
     $a = preg_replace( "/-/", "", $matches[1] );
     return strrev( $a );
  }
  $b = preg_replace_callback( "/(AB[-]?cde)/", 'strip_reverse', "Hello World AB-cde" );
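
For comparison, the same callback idea can be sketched in Python, where `re.sub` accepts a function as the replacement. This is only an illustration under assumed names (`strip_reverse` mirrors the PHP function above; the input string is the same sample), not a statement about how .NET does it:

```python
import re

def strip_reverse(match: re.Match) -> str:
    """Remove the hyphen from the first group, then reverse the result."""
    cleaned = match.group(1).replace("-", "")
    return cleaned[::-1]

# re.sub calls strip_reverse once per match and substitutes its return value.
result = re.sub(r"(AB-?cde)", strip_reverse, "Hello World AB-cde")
```

With this input, `result` is `"Hello World edcBA"`.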

Kent Fredric 2008-11-10 10:58:36

It is a common misunderstanding that regex is for "less complicated situations" only. Regex is immensely powerful and can solve really complex stuff. Regex is just not the right tool for things that are not regular. It's simple: there are things that work with regex, and there are those that don't.

Tomalak 2008-11-10 11:13:59

Yes, but there's a prolific /overuse/ of regex in situations where the solution is like using a firearm to hole-punch paper. It'll work, but there are complications that don't exist in the simpler solution. The key is knowing when *not* to use regex ;)

Kent Fredric 2008-11-10 11:34:53

Knowing when to use which tool is always the key. I would probably avoid using regex in a long loop when there was another way (say, "indexOf" plus a little math).
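
As a rough sketch of the "indexOf plus a little math" route, here is what it could look like in Python, using `str.find` and slicing instead of a regex. The function name `strip_separator` and the sample tokens are hypothetical, chosen only for illustration:

```python
def strip_separator(token: str, sep: str = "-") -> str:
    """Drop the first occurrence of sep from token without using a regex."""
    i = token.find(sep)  # index of the separator, or -1 if it is absent
    if i == -1:
        return token
    # Join the part before the separator with the part after it.
    return token[:i] + token[i + len(sep):]
```

For example, `strip_separator("AB-123")` gives `"AB123"`, and a token without a hyphen comes back unchanged.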

Tomalak 2008-11-10 12:20:45

For those cases there is the "study regex" optimisation which makes a memory tree to boost regex matching ;)

Kent Fredric 2008-11-10 12:35:31
