Hi all, I'm having a hard time with this one, as I don't think I know all of my options.

I have to parse a free-form text field and map its values to a database.

Here is some example text. NOTE: not all fields have to be present, not all delimiters are the same, and not all descriptors are available. I also need to check whether each value is numeric only or alphanumeric.

Example 1

field1: 999-999234-24-2 

field2 Description: a short description 

field3: 3.222.1 

asdfg 

field number four: NO 

field5:

Example 2

field1: 999-999234-24-2/field2 Description: a short description/field3: 3.222.1 asdfg/field number four: NO/field5:

Example 3

999-999234-24-2 

Example 4

field1: 999-999234-24-2 field2 Description: a short description field3: 3.222.1 asdfg field number four: NO field5:

Example 5

field1: 999-999234-24-2 - field2 Description: a short description - field3: 3.222.1 asdfg - field number four: NO - field5: 

What I would like is for each field X to be in its own column. NOTE: the example data is all in the same order, but the live data is not.

Now, I don't mind doing this in steps if I need to, but I'm having a hard time just parsing the values into columns. Any suggestions?

I was thinking of some sort of case function with a regex, but no luck so far.

A: 

Maybe you should standardize on the Java .properties format; then you can use this PHP example to parse it:

http://www.innerweaver.com/?p=13

mythz
Interesting, but I don't know how to apply this to my question. Could you give me an example?
Phill Pafford
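To illustrate the suggestion, here is a Python sketch of the idea (not the PHP code behind the link): if the free-form field could be standardized to one "key: value" pair per line, a simplified subset of the .properties format, parsing reduces to splitting each line on its first separator. The function name parse_properties is hypothetical, and real .properties files also allow whitespace as a separator plus escapes and line continuations, which this sketch ignores.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simplified .properties-style text: one 'key: value' or 'key=value' per line."""
    result: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines, as .properties files do
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first ':' or '=' only, so values may contain either character
        positions = [p for p in (line.find(":"), line.find("=")) if p != -1]
        if not positions:
            continue  # no separator: not a key/value line in this simplified format
        sep = min(positions)
        result[line[:sep].strip()] = line[sep + 1:].strip()
    return result


sample = """field1: 999-999234-24-2
field2 Description: a short description
field3: 3.222.1 asdfg
field number four: NO
field5:"""

print(parse_properties(sample))
```

The catch, of course, is that this only helps if whatever produces the text can be changed to emit one pair per line.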
A: 

Since it's still stuck in my head... the way I'd go about it is to start handling each of these cases and see if there are any remaining tweaks or fallout. What makes this tricky is that the only reliable delimiter is 'field', and if anyone uses that word in a description, it'll break. I'd just have to take the file and start iterating.

Splitting with this regex would at least be a good starting point for separating the headers from the data. Basically, it matches 'field' plus a few characters of optional text before the closing colon, which covers headers like 'field2 Description' and 'field number four':

field[^:]{0,13}:
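A minimal Python sketch of that split, using re.split with the pattern wrapped in a capturing group so the headers are kept in the result. The quantifier has to allow at least 13 characters, since '2 Description' sits between 'field' and the colon in the field2 header, so the sketch uses {0,13}. The function name split_fields is hypothetical.

```python
import re

# Header pattern wrapped in a capturing group so re.split() keeps each header
HEADER = re.compile(r"(field[^:]{0,13}:)")


def split_fields(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (leading_text, [(header, raw_value), ...]).

    Raw values still carry their delimiters ('/', ' - ', line breaks).
    """
    parts = HEADER.split(text)
    # With one capturing group, parts is [leading, h1, v1, h2, v2, ...]
    leading = parts[0]
    pairs = [(parts[i], parts[i + 1]) for i in range(1, len(parts), 2)]
    return leading, pairs


example4 = ("field1: 999-999234-24-2 field2 Description: a short description "
            "field3: 3.222.1 asdfg field number four: NO field5:")
leading, pairs = split_fields(example4)
for header, raw in pairs:
    print(repr(header), repr(raw))
```

Text with no header at all (Example 3) comes back entirely as leading_text with an empty pair list, so it can be handled as a separate case.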

After that, you'd at least have to strip the trailing / for case #2, the ' - ' for case #5, and the extra line breaks for case #1 if you don't want them in the data.
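A Python sketch of that cleanup step, plus the numeric/alphanumeric check from the question. The names clean_value and classify are hypothetical, and treating digit groups joined by '.' or '-' (like 999-999234-24-2 or 3.222.1) as "numeric" is an assumption about what the question means by numeric only.

```python
import re

# Assumption: "numeric only" means digit groups joined by '.' or '-',
# e.g. '999-999234-24-2' or '3.222.1'; anything else non-empty is alphanumeric.
NUMERIC = re.compile(r"\d+(?:[.-]\d+)*")


def clean_value(raw: str) -> str:
    """Normalize one raw value left over after splitting on the headers."""
    # Collapse line breaks (Example 1) and repeated spaces into single spaces
    value = re.sub(r"\s+", " ", raw).strip()
    # Drop a trailing '/' (Example 2) or ' -' (Example 5) delimiter
    return re.sub(r"(?:/|\s+-)$", "", value)


def classify(value: str) -> str:
    if not value:
        return "empty"
    return "numeric" if NUMERIC.fullmatch(value) else "alphanumeric"


for raw in [" 999-999234-24-2/", " 3.222.1 \n\nasdfg \n\n", " NO - ", " "]:
    value = clean_value(raw)
    print(repr(value), classify(value))
```

Requiring whitespace before the trailing '-' keeps a hyphen that is genuinely part of a value from being stripped.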

Ben
A: 

A regexp would be hard to maintain for some edge cases. Try writing a simple finite-state machine.
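A minimal sketch of that idea in Python, assuming the same 'field...:' headers seen in the examples. It scans character by character with two states, VALUE and HEADER. The names fsm_parse, MAX_HEADER and UNLABELLED are hypothetical, and the 24-character header cap is an arbitrary guess to tune against real data.

```python
from enum import Enum, auto


class State(Enum):
    VALUE = auto()   # reading a value, or text before the first header
    HEADER = auto()  # saw 'field', reading up to the closing ':'


MAX_HEADER = 24             # hypothetical cap: a longer run without ':' is not a header
UNLABELLED = "(no header)"  # hypothetical key for text that precedes every header


def _clean(chars: list[str]) -> str:
    # Collapse whitespace and line breaks, then trim '/' and '-' delimiters.
    # Crude: this also drops a legitimate leading or trailing '-'.
    return " ".join("".join(chars).split()).strip("/- ")


def fsm_parse(text: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    state = State.VALUE
    key = UNLABELLED
    value_buf: list[str] = []
    header_buf: list[str] = []

    def flush() -> None:
        value = _clean(value_buf)
        # Skip empty leading text, but keep empty values of real headers (field5:)
        if key != UNLABELLED or value:
            fields[key] = value

    i = 0
    while i < len(text):
        ch = text[i]
        if state is State.VALUE:
            at_boundary = i == 0 or not text[i - 1].isalnum()
            if at_boundary and text.startswith("field", i):
                state, header_buf = State.HEADER, list("field")
                i += len("field")
                continue
            value_buf.append(ch)
        elif ch == ":":  # State.HEADER: the colon closes the header
            flush()  # store the previous field before starting a new one
            key, value_buf, state = "".join(header_buf).strip(), [], State.VALUE
        else:  # State.HEADER: still inside the header text
            header_buf.append(ch)
            if len(header_buf) > MAX_HEADER:
                # Too long to be a header: it was ordinary value text after all
                value_buf.extend(header_buf)
                state = State.VALUE
        i += 1

    if state is State.HEADER:
        value_buf.extend(header_buf)  # input ended mid-header, keep it as value text
    flush()
    return fields


example5 = ("field1: 999-999234-24-2 - field2 Description: a short description - "
            "field3: 3.222.1 asdfg - field number four: NO - field5: ")
print(fsm_parse(example5))
```

Each transition is an explicit branch, so a new edge case (another delimiter, say) becomes one more condition rather than a longer pattern.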

Ziells
A: 

After much thought and trial and error, I'm going to read them into an array and parse out each line of text. It's long and going to be a mess, but it should get the job done.

Phill Pafford