Medicare Eligibility EDI Example Responses is what I'm trying to match.

I have a string that looks like this:

LN:SMITHbbbbbbbbFN:SAMANTHAbbBD:19400515PD:1BN:123456PN:9876543210GP:ABCDEFGHIJKLMNOID:123456789012345bbbbbPC:123PH:8005551212CD:123456PB:123ED:20060101TD:2070101LC:NFI:12345678FE:20070101FT:20080101

I need a set of matches that look like this:

Key | Value
-------------------
LN  | SMITHbbbbbbbb
FN  | SAMANTHAbb
BD  | 19400515
... etc

I've been dealing with this all day, and I can't seem to get an acceptable matching scenario. I'm about to program it procedurally with a for loop and finding indexes of colons if I can't figure something out.

I've tried using negative lookahead and I'm not getting anywhere. This is C#, and I'm using this tester (.Net) while I'm testing, along with The Regex Coach (non .Net).

I've tried using this:

([\w]{2})\:(?![\w]{2}\:)

But that only matches the keys and their colons, like "LN:", "FN:", etc.

If I use:

([\w]{2})\:(.+?)([\w]{2})\:

It consumes the next matching two character key and colon as well, leading to me only matching every other key/value pair.

Is there a way for me to match these using RegEx in .Net correctly, or am I stuck with a more procedural solution? Keep in mind, I can't assume that the keys will always be upper case letters. They could possibly include numbers, but they will always be two characters and then a colon.

Thanks in advance for any help I can get.

+7  A: 

I think what you want is positive lookahead, not negative, so that you find the key-colon combo ahead of the current position, but you don't consume it. This appears to work for your test example:

([\w]{2})\:(.+?)(?=[\w]{2}\:|$)

Yielding:

LN: SMITHbbbbbbbb
FN: SAMANTHAbb
BD: 19400515
PD: 1
BN: 123456
PN: 9876543210
...

Note: I added the colons in my test output; they aren't captured by the regex.

EDIT: Thanks, Douglas, I've edited the regex to capture end-of-string so the last entry is captured, too.
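
Since the question is about C#, here is a minimal sketch of how the pattern above might be applied with System.Text.RegularExpressions. The class name LookaheadParser and the Parse helper are made up for illustration; only the pattern comes from this answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadParser
{
    // Pattern from the answer: a two-character key, a colon, then a lazy value
    // that stops just before the next "key:" or at the end of the input.
    private static readonly Regex Pair =
        new Regex(@"([\w]{2})\:(.+?)(?=[\w]{2}\:|$)");

    // Hypothetical helper: returns the key/value pairs in the order they appear.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(input))
        {
            // Group 1 is the key and group 2 is the value; the colon is not captured.
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

Because the lookahead does not consume the next key, each match starts exactly where the previous value ended, so no pair is skipped.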

Adam Bellaire
We might need something to match the final entry?
Douglas Leeder
Right. I had added that to mine. I'm trying to get it to work with named groups, and then I'll mark it answered. :)
Chris Benard
Negative lookahead also works.
Ates Goral
OK cool, (?<Key>[\w]{2})\:(?<Value>.+?)(?=([\w]{2}\:|$)) in explicit capture mode gives me exactly what I want. Thanks Adam!
Chris Benard
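
For anyone landing here later, the named-group version from the comment above could be used roughly like this. This is only a sketch: the class name NamedGroupParser and the choice of a Dictionary (where a repeated key would overwrite the earlier value) are assumptions, not something stated in the thread.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupParser
{
    // ExplicitCapture means only the named groups (Key, Value) capture;
    // the unnamed parentheses inside the lookahead are treated as non-capturing.
    private static readonly Regex Pair = new Regex(
        @"(?<Key>[\w]{2})\:(?<Value>.+?)(?=([\w]{2}\:|$))",
        RegexOptions.ExplicitCapture);

    // Hypothetical helper: collects the matches into a key -> value lookup.
    public static Dictionary<string, string> Parse(string input)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(input))
        {
            // Assumption: if a key repeats, the last value wins.
            fields[m.Groups["Key"].Value] = m.Groups["Value"].Value;
        }
        return fields;
    }
}
```

Reading the groups by name keeps the calling code independent of how many parentheses the pattern contains.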
A: 

This works in JavaScript (I always fire up the Error Console in Firefox to play around with regular expressions) but it should also work fine in .NET:

([^:]{2}):((?:[^:](?!(?:[^:]:)))+)

It uses negative lookahead:

( -> start capturing first token (the label)
    [^:]{2} -> two non-colon characters
) -> end capturing first token
: -> skip the colon
( -> start capturing the second token (the value)
    (?: -> don't capture this group as a token
        [^:](?! -> a non-colon character, not followed by:
                (?: -> don't capture this group
                    [^:]: -> a non-colon, followed by a colon
                ) -> end group
            ) -> end negative lookahead
    )+ -> one or more of this group
) -> end capturing the second token

Test:

"LN:SMITHbbbbbbbbFN:SAMANTHAbbBD:19400515"
    .replace(
        /([^:]{2}):((?:[^:](?!(?:[^:]:)))+)/g,
        "[$1] = [$2]\n")

Yields:

[LN] = [SMITHbbbbbbbb]
[FN] = [SAMANTHAbb]
[BD] = [19400515]
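
A rough C# equivalent of the JavaScript test above, as a sketch only. The class name NegativeLookaheadParser and the Format helper are invented for this example; Regex.Replace with $1 and $2 substitutions plays the role of the JavaScript replace call.

```csharp
using System.Text.RegularExpressions;

public static class NegativeLookaheadParser
{
    // Same pattern as the JavaScript version: a value character is accepted
    // only if it is not followed by "one non-colon character plus a colon".
    private static readonly Regex Pair =
        new Regex(@"([^:]{2}):((?:[^:](?!(?:[^:]:)))+)");

    // Hypothetical helper: rewrites each pair as "[key] = [value]" on its own line.
    public static string Format(string input)
    {
        return Pair.Replace(input, "[$1] = [$2]\n");
    }
}
```

Regex.Replace replaces every match by default, so no equivalent of the JavaScript g flag is needed.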
Ates Goral
A: 
(([A-Z]){2}\:([A-Za-z0-9])+)+

Try this. It matches two capital letters followed by a colon and then a run of alphanumeric characters.

Your problem is the \w which for some reason is not working with the class brackets.

http://www.codehouse.com/webmaster_tools/regex/ Here's a regular expression evaluator.

fasih.ahmed
A: 

Looking at the link, each field has a fixed length, so you could do something like this:

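// Requires: using System.Collections.Generic;
// 'message' is the raw response string to parse.
int pos = 0;
Dictionary<string, string> parsedResults = new Dictionary<string, string>();

// Value widths in field order (LN, FN, BD, ...), taken from the linked spec.
foreach (int length in new int[] { 13, 10, 8, 1, 6, 10, 15, 20, 3, 10, 6, 3, 8, 8, 1, 8, 8, 8 })
{
    string fieldId = message.Substring(pos, 2);              // two-character key
    string fieldValue = message.Substring(pos + 3, length);  // skip the key and the colon
    parsedResults.Add(fieldId, fieldValue);
    pos += length + 3;                                       // advance past key, colon and value
}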
ICR