I'm working on a regular expression (in .Net) that needs to mark subexpressions. Sample inputs are:

  1. EFBCFEyy
  2. EFBQFEyyQ
  3. EFBQFE yy Q
  4. EFBMFEyyMM
  5. EFByyMFEMM

What I need is to pull out all of the sub-expressions delineated by "yy" or "MM". The expression I've got so far works for the first few strings, but not the final pair. There may be spaces, which get grouped in with the non-date-format characters around them.

With "/" to separate the subexpressions, this is what I'm looking for (respectively), with the parts in bold being the ones I need to manipulate after the RegEx has evaluated:

  1. EFBCFE/yy
  2. EFBQFE/yy/Q
  3. EFBQFE /yy/ Q
  4. EFBMFE/yy/MM
  5. EFB/yy/MFE/MM

Here's what I have that works for the first three:

(.*)(yy|MM)(.*)

What am I missing?
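The small sketch below shows what the question's pattern does. It is written in Java rather than C#. That is an assumption of this note, but java.util.regex should handle this pattern the same way .NET does. The class and method names are hypothetical. The first greedy (.*) grabs as much as it can and backs off only as far as the last yy or MM. As a result, exactly one delimiter is ever captured, and any earlier delimiter ends up inside group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // The pattern from the question: two greedy (.*) groups around one delimiter.
    static final Pattern QUESTION_PATTERN = Pattern.compile("(.*)(yy|MM)(.*)");

    // Returns the three captured groups, or null if the whole input does not match.
    static String[] groups(String input) {
        Matcher m = QUESTION_PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", groups("EFBQFEyyQ")));  // EFBQFE/yy/Q
        System.out.println(String.join("/", groups("EFBMFEyyMM"))); // EFBMFEyy/MM/
        System.out.println(String.join("/", groups("EFByyMFEMM"))); // EFByyMFE/MM/
    }
}
```

With one delimiter (input 2) the groups come out as wanted. With two delimiters (inputs 4 and 5) the earlier token is hidden in the first group.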

+2  A: 

"What I need is to pull out all of the sub-expressions delineated by "yy" or "MM"."

yy|MM

That's all you need (unless I misunderstand the question).

Apply as "global". For me it matches the bold parts:

  • EFBCFE**yy**
  • EFBQFE**yy**Q
  • EFBQFE **yy** Q
  • EFBMFE**yy****MM**
  • EFB**yy**MFE**MM**
Tomalak
What do you mean by "Apply as 'global'"?
Dov
Many regex engines have a so-called "global" flag (often written g). It applies the pattern repeatedly over the entire string, so that you find more than just the very first match. .NET has no such flag; Regex.Matches returns every match instead.
Tomalak
Have a look at http://gskinner.com/RegExr/ and play around a while. You'll see what I mean.
Tomalak
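Here is a rough Java sketch of what "global" matching amounts to. Java, like .NET, has no g flag. Calling Matcher.find() in a loop plays that role, much as Regex.Matches does in .NET. The class name and the token@position output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobalMatchDemo {
    static final Pattern DATE_TOKEN = Pattern.compile("yy|MM");

    // Each call to find() resumes after the previous match, so the loop
    // visits every non-overlapping match, which is what a "global" flag does.
    static List<String> allMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE_TOKEN.matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start()); // token and its start index
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("EFByyMFEMM")); // [yy@3, MM@8]
    }
}
```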
I need the opposite though, I need the parts that aren't bold.
Dov
Oh, I see. Doing the opposite of matching something is difficult with regex. I'd rather go for the "yy" and "MM", find out their positions, and dissect the string based on that info.
Tomalak
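The position-based approach described in the last comment might look like the following Java sketch. The class and method names are hypothetical. Each match of yy|MM supplies start() and end(), and the text between consecutive matches is a literal part. Tokens and literal parts are kept in their original order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DissectDemo {
    static final Pattern DATE_TOKEN = Pattern.compile("yy|MM");

    // Cuts the input into literal pieces and date-format tokens, in order,
    // using the start/end positions of each yy or MM match.
    static List<String> dissect(String input) {
        List<String> pieces = new ArrayList<>();
        Matcher m = DATE_TOKEN.matcher(input);
        int last = 0; // end index of the previous token
        while (m.find()) {
            if (m.start() > last) {
                pieces.add(input.substring(last, m.start())); // literal text before the token
            }
            pieces.add(m.group()); // the token itself
            last = m.end();
        }
        if (last < input.length()) {
            pieces.add(input.substring(last)); // trailing literal text
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", dissect("EFByyMFEMM"))); // EFB/yy/MFE/MM
    }
}
```

A literal piece can never contain a whole yy or MM, because find() would have matched it first. So any element equal to "yy" or "MM" in the result is a token.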
+2  A: 

In Java, this would do what you want:

MyString.split("yy|MM")

I'd be surprised if .NET doesn't have a similar regex split function...

Here we go, this looks to be the .NET equivalent: http://msdn.microsoft.com/en-us/library/8yttk7sy.aspx

Regex.Split(MyString, "yy|MM")
Peter Boughton
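Here is a quick Java check of the split approach. The class and method names are hypothetical. String.split with no limit argument drops trailing empty strings. .NET's Regex.Split, by contrast, keeps an empty string when the input ends with a match, so the arrays differ slightly between the two platforms.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on either date-format token; the tokens themselves are discarded.
    static String[] literalParts(String input) {
        return input.split("yy|MM");
    }

    public static void main(String[] args) {
        // String.split (with no limit) removes trailing empty strings.
        System.out.println(Arrays.toString(literalParts("EFBQFEyyQ")));  // [EFBQFE, Q]
        System.out.println(Arrays.toString(literalParts("EFByyMFEMM"))); // [EFB, MFE]
        System.out.println(Arrays.toString(literalParts("EFBMFEyyMM"))); // [EFBMFE]
    }
}
```

The last case illustrates the limitation raised in the comments. [EFBMFE] alone does not reveal that both yy and MM followed it.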
That's perfect, I don't know how I didn't notice that. Thanks!
Dov
+1 The only issue is: You don't know what you've split on afterwards. You have the pieces, but you have no idea which "delimiters" were between them.
Tomalak
True, if it is necessary to know which delimiters were used, more complex logic is required.
Peter Boughton
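One way to keep the delimiters is to split on zero-width positions instead of on the tokens themselves. The Java sketch below uses hypothetical names and assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string. It uses a lookbehind and a lookahead so that each yy or MM becomes its own array element. For .NET, the Regex.Split documentation states that text captured by parentheses in the pattern is included in the result. Regex.Split(MyString, "(yy|MM)") should therefore return the delimiters too.

```java
import java.util.Arrays;

class SplitKeepingDelimiters {
    // Zero-width split points: just after a token (lookbehind) or just
    // before one (lookahead). Nothing is consumed, so the tokens stay in
    // the result as their own elements.
    static String[] splitKeepingTokens(String input) {
        return input.split("(?<=yy|MM)|(?=yy|MM)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingTokens("EFByyMFEMM"))); // [EFB, yy, MFE, MM]
    }
}
```

Because the split points can overlap, a run such as yyyy is cut into single characters. A loop over Matcher.find() results handles that case more predictably.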