I have two strings that I need to pull data out of, but I can't seem to get it working. I wish I knew regular expressions, but unfortunately I don't. I have read some beginner tutorials, but I can't find an expression that will do what I need.

From this first string, which is delimited by an equals sign, I need to skip the first 6 characters and grab the following 9 characters. After the equals sign, I need to grab the first 4 characters, which are a day and year. Lastly, for this string, I need the remaining numbers, which are a date in YYYYmmdd format.

636014034657089=130719889904

The second string seems a little more difficult because the number of spaces between the blocks of data varies, although the blocks always seem to be separated by at least a single space. Sometimes there are as many as 15 or 20 spaces between them.

Here are two different samples that show the space difference.

!!92519 C 01 M600200BLNBRN D55420090205M1O

!!95815      A               M511195BRNBRN            D62520070906  ":%/]Q2#0*&

The data that I need out of these last two strings are:

The zip code following the 2 exclamation marks.
The single letter 'M' following that. It always appears to be in a 13 character block
The 3 numbers after the single letter
The next 3 numbers which are the person's height
The following next 3 are the person's weight
The next 3 are eye color
The next block of 3 which are the person's hair color

The last block that I need data from:

I need to get the single letter, which in the example appears to be a 'D'. Skip the next 3 numbers. The last and remaining 8 numbers are a date in YYYYmmdd format.

If someone could help me resolve this, I'd be very grateful.

+2  A: 

For the first string you can use this regular expression:

^[0-9]{6}([0-9]{9})=([0-9]{4})([0-9]{4})([0-9]{2})([0-9]{2})$

Explanation:

^          Start of string/line
[0-9]{6}   Match the first 6 digits
([0-9]{9}) Capture the next 9 digits
=          Match an equals sign
([0-9]{4}) Capture the "day and year" (what format is this in?)
([0-9]{4}) Capture the year
([0-9]{2}) Capture the month
([0-9]{2}) Capture the date
$          End of string/line
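
As a minimal sketch of how the captures come out, here is the same expression applied to your sample in Python. Python is an assumption on my part, and the group names and the `parse_first` helper are only illustrative; the expression itself is unchanged apart from the names.

```python
import re

# Same pattern as above; the group names (ident, prefix, year, month, day)
# are hypothetical labels added for readability.
FIRST = re.compile(
    r"^[0-9]{6}(?P<ident>[0-9]{9})=(?P<prefix>[0-9]{4})"
    r"(?P<year>[0-9]{4})(?P<month>[0-9]{2})(?P<day>[0-9]{2})$"
)

def parse_first(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = FIRST.match(line)
    return m.groupdict() if m else None

fields = parse_first("636014034657089=130719889904")
print(fields)
```

Note that the expression only checks digit counts, so it will happily accept a month such as 99. If you need real dates, validate the captured values afterwards.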

For the second:

^!!([0-9]{5}) +.*? +M([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})

Rubular

It works in a similar way to the first. You may need to adjust it slightly if your data is not exactly in the format that the regular expression expects. You might want to replace the .*? with something more precise but I'm not sure what because you haven't described the format of the parts you are not interested in.
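
Here is a rough sketch of the second expression run against both of your samples, again assuming Python. The `parse_second` helper and the `results` name are made up for the example.

```python
import re

# Second pattern from above, split across two literals only for line length.
SECOND = re.compile(
    r"^!!([0-9]{5}) +.*? +M([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})"
    r" +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})"
)

samples = [
    "!!92519 C 01 M600200BLNBRN D55420090205M1O",
    '!!95815      A               M511195BRNBRN            D62520070906  ":%/]Q2#0*&',
]

def parse_second(line):
    """Return a tuple of the nine captured groups, or None if there is no match."""
    m = SECOND.match(line)
    return m.groups() if m else None

results = [parse_second(s) for s in samples]
for row in results:
    print(row)
```

Each ` +` matches one or more spaces, which is what absorbs the varying gaps, and the lazy `.*?` skips the block you are not interested in. There is no `$` at the end, so trailing characters after the date are ignored. If the letter before the first digits is not always `M`, the literal `M` would need to become a character class listing the allowed letters.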

Mark Byers
Thanks Mark. I'm going to try that now.
Jim
@Jim - use a capture group.
TrueWill
Mark, The second regex doesn't match anything at all. I get a compilation error. 'Compilation failed: nothing to repeat at offset 45'
Jim
Mark, any ideas as to why the second regex won't match anything? I'm clueless. The only characters I'm interested in matching in the second string are only those that I mentioned. Everything else in the string can be disregarded.
Jim
Can someone tell me what this does? `+.*? +M`. This is in Mark's second example.
Jim
@Jim: Here's a link to Rubular showing the regular expression functioning correctly there - http://rubular.com/r/9Zkkp6YfGQ . I don't know what error you have made. It would help if you posted the exact code that gives the error.
Mark Byers
Thanks Mark. I have no idea why it isn't working over on my end. I can see that your example works on Rubular. If I'm reading your expression correctly, I think you may have an error in the beginning of your regex where you state `+.*? +M`. The `M` is not static and will vary from person to person. All I am getting is a compilation error.
Jim
@Jim: `" +"` matches one or more spaces. `".*?"` matches any run of characters, as short as possible. I'd prefer to use something more precise, but you haven't described the format of your string in sufficient detail.
Mark Byers
Mark, would you mind if I went back over the last string so that you can tweak it? I'm sorry to ask but I'm really lost with regex. I will however make you the promise to at least pick up a book on it so that I can learn more about how to use it. Regex is jaw dropping with regard to what can be matched.
Jim
Mark, I tested a few other strings and noticed that some are not matched. The reason for this is that the `M` in your expression stands for Male or Female. Your expression only matches males. How can I change this to match for both?
Jim
I just fixed the gender portion myself by changing your line from `+M` to `+[MF]`. I have no idea why it will not work in PHP. I'm going to credit you for the answer. You have helped me tremendously, Mark, thank you.
Jim