I'm dealing with some legacy code that stores its data in a proprietary string format and I'm trying to create a regex to make parsing this format much easier.

What I'm having trouble with is that the format contains groups that can repeat any number of times, in no fixed order. Typically the data looks like this: (A)(B)(B)(B). Sometimes, though, it has multiple (A)'s, like (A)(B)(B)(B)(A)(B)(B), or even (A)(A)(B)(B)(B). The number of repetitions of (B) varies too, from none at all to, well, lots.

What's happening is my current regex works fine when the data looks like (A)(B)(B)... but it breaks when there is another (A) later on in the string. The first (A) gets caught, but all remaining (A)'s don't.

So right now I have a regex with one group for parsing (A)'s and one group for parsing (B)'s, and each group works fine on its own. What I can't figure out is how to combine them with the correct repetition syntax so that every occurrence is found wherever it appears, instead of only the first one being matched and the rest ignored.

Am I just missing something, or do I have to break my regex up into two separate ones and parse out the (A)'s and (B)'s separately? (I'm using C#/.NET.)

+4  A: 

If you have a working pattern that matches (A) and another that matches (B), then the expression to match any number of either is

(?:(A)|(B))*

There's no need to get fancy if that's all you need. This expression matches either (A) or (B) any number of times, but leaves the capturing of the groups to the A and B level.
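As a rough C# sketch of how that could look in .NET: the sub-patterns `A\d` and `B\d` below are hypothetical stand-ins for your working (A) and (B) patterns, and the group names `a` and `b`, the `^`/`$` anchors, and the use of `+` instead of `*` (to reject an empty string) are all assumptions made for illustration. It relies on the fact that .NET keeps every capture of a repeated group in `Group.Captures`, not just the last one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupParser
{
    // Hypothetical stand-ins: "A\d" plays the role of the (A) pattern,
    // "B\d" the role of the (B) pattern. Substitute the real ones.
    private static readonly Regex Pattern =
        new Regex(@"^(?:(?<a>A\d)|(?<b>B\d))+$");

    // Returns every captured token in input order, prefixed with its group name.
    // Returns an empty list when the input does not match at all.
    public static List<string> Parse(string input)
    {
        var result = new List<string>();
        Match m = Pattern.Match(input);
        if (!m.Success) return result;

        // Each group exposes all of its captures; merge them by position
        // so the original interleaving of (A) and (B) is preserved.
        var byIndex = new SortedDictionary<int, string>();
        foreach (Capture c in m.Groups["a"].Captures) byIndex[c.Index] = "a:" + c.Value;
        foreach (Capture c in m.Groups["b"].Captures) byIndex[c.Index] = "b:" + c.Value;

        result.AddRange(byIndex.Values);
        return result;
    }

    public static void Main()
    {
        // Prints: a:A1 b:B1 b:B2 a:A2 b:B3
        Console.WriteLine(string.Join(" ", Parse("A1B1B2A2B3")));
    }
}
```

If you only care about the (A)'s or only the (B)'s, you can skip the merge and read `m.Groups["a"].Captures` or `m.Groups["b"].Captures` directly.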

Welbog
That will also match empty string. The * should be + if this is not desired behaviour.
Peter Boughton
A: 

It would help to see your current regexp.

To match any sequence of A's or B's use the following

           (A*B*)*

That is, any number of groups of A's followed by any number of B's.

This will also match the empty string. To ensure there is at least some data, use:

           (A|B)(A*B*)*

Or, if the data always starts with an A (as in all your examples):

           A(A*B*)*
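A small C# sketch to try the three variants, treating A and B as literal placeholder characters. The `^` and `$` anchors and the method names are assumptions added here; without anchors, `Regex.IsMatch` would succeed on any input, because `(A*B*)*` can match an empty substring.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SequenceCheck
{
    // Any mix of A's and B's, including the empty string.
    public static bool AnyOrder(string s) => Regex.IsMatch(s, @"^(A*B*)*$");

    // Same, but at least one A or B is required.
    public static bool NonEmpty(string s) => Regex.IsMatch(s, @"^(A|B)(A*B*)*$");

    // Same, but the data must start with an A.
    public static bool StartsWithA(string s) => Regex.IsMatch(s, @"^A(A*B*)*$");

    public static void Main()
    {
        Console.WriteLine(AnyOrder(""));            // True
        Console.WriteLine(NonEmpty(""));            // False
        Console.WriteLine(NonEmpty("BBA"));         // True
        Console.WriteLine(StartsWithA("AABBBAB"));  // True
        Console.WriteLine(StartsWithA("BAB"));      // False
    }
}
```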
Hans B PUFAL