I need a regular expression that can capture the data from a description like this:

14Kt Yellow Gold Mothers Ring Style 152, Genuine Amethyst,Genuine Diamond,Simulated Emerald,Premium Topaz,Premium Tourmaline,Genuine Sapphire, Engravings: jim,jake,john,jeff,rob,sandy, Band Engraving: smith

What I need to capture is:

A) style (Style 152) (style + any number)

B) gold (14Kt Yellow Gold) (can be combinations of 14kt, 10kt, yellow, or white)

C) stones (Genuine Amethyst,Genuine Diamond,Simulated Emerald,Premium Topaz,Premium Tourmaline,Genuine Sapphire) (this can change in how many but always at least 1)

D) engravings (jim,jake,john,jeff,rob,sandy) (this can be 0 or more, and the string Engravings: won't be there if there are no names)

E) band engraving (smith) (this is also optional, and the string Band Engraving: won't be there if there is no name either)

I have been working with regular expressions for a few months now, but this is a little over my head since it can vary so much. This is the best one I came up with, but it doesn't work if the string Engravings: is missing:

/(\d{2}.+gold).+(style \d+)(.+)engravings:([^\*]*)(\*)?(.*)/i

THANKS!

+1  A: 

Why not simply break it up into multiple regexes? That way you could check to see if "engravings" is included in the string, and then either populate the engravings value, or else leave it blank.
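A minimal sketch of that idea, written in Python purely for illustration (the asker is on PHP, where the same patterns should carry over to preg_match with the i flag). The helper name parse_description, the dictionary keys, and the assumption that the stone list always follows "Style N," directly are hypothetical and not taken from the question.

```python
import re

def parse_description(text):
    """Extract each field with its own small regex (hypothetical helper)."""
    flags = re.IGNORECASE
    # Gold: starts at 10kt/14kt and runs lazily up to the first "gold".
    gold = re.search(r"\b(?:10|14)kt\b.*?\bgold\b", text, flags)
    # Style: the word "style" followed by any number.
    style = re.search(r"\bstyle\s+\d+", text, flags)
    # Stones: everything after "Style N," up to an engraving label or end of line.
    stones = re.search(
        r"\bstyle\s+\d+,\s*(.*?)(?=,\s*(?:engravings|band engraving):|$)",
        text, flags)
    # Engravings: optional; stops before "Band Engraving:" if that is present.
    engravings = re.search(
        r"\bengravings:\s*(.*?)(?=,\s*band engraving:|$)", text, flags)
    # Band engraving: optional; runs to the end of the line.
    band = re.search(r"\bband engraving:\s*(.*)$", text, flags)

    def split_list(match):
        # A missing label simply yields an empty list.
        if match is None:
            return []
        return [item.strip() for item in match.group(1).split(",") if item.strip()]

    return {
        "gold": gold.group(0) if gold else None,
        "style": style.group(0) if style else None,
        "stones": split_list(stones),
        "engravings": split_list(engravings),
        "band_engraving": band.group(1).strip() if band else None,
    }
```

Because every field has its own pattern, a missing Engravings: or Band Engraving: label only affects that one field; the others still match.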

RossFabricant
A: 

It's probably better to break it up into multiple regexes, one for each section, but you can make a chunk like engravings optional by wrapping it in parentheses and adding a ? after it, (like this)?
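A small sketch of the optional-group idea, in Python for illustration only. The shortened input strings and the name OPTIONAL_ENGRAVINGS are made up for the example. Note that the optional group needs something after it (here the $ anchor) so that the lazy part in front is forced to grow until the label is reached.

```python
import re

# "(?: ... )?" makes the whole Engravings chunk optional; "$" forces the
# lazy prefix "(.*?)" to keep growing until the rest of the line is consumed.
OPTIONAL_ENGRAVINGS = re.compile(
    r"^(.*?)(?:,\s*engravings:\s*(.*))?$", re.IGNORECASE)

with_names = OPTIONAL_ENGRAVINGS.match("Style 152, Genuine Ruby, Engravings: jim,jake")
without_names = OPTIONAL_ENGRAVINGS.match("Style 152, Genuine Ruby")

print(with_names.group(2))     # jim,jake
print(without_names.group(2))  # None (the optional group did not take part)
```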

Mark
A: 
/(\d{2}.+gold).+(style \d+)(.+?)(engravings:.*?)?(band engraving:.*)?/i

may do what you want

cobbal
This doesn't work. Try it.
Jeremy Stein
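To see why the pattern above falls short, here is a quick check in Python (used only for illustration; the behaviour should be the same in PHP's PCRE). The lazy (.+?) is satisfied with a single character, and since both trailing groups are optional, nothing forces it to grow. The ANCHORED variant is one possible repair, assuming the field order is always the same and fields are comma-separated as in the sample; it is a sketch, not a tested production pattern.

```python
import re

SAMPLE = ("14Kt Yellow Gold Mothers Ring Style 152, Genuine Amethyst,Genuine Diamond,"
          "Simulated Emerald,Premium Topaz,Premium Tourmaline,Genuine Sapphire, "
          "Engravings: jim,jake,john,jeff,rob,sandy, Band Engraving: smith")

# The pattern from the answer above, unchanged.
ORIGINAL = re.compile(
    r"(\d{2}.+gold).+(style \d+)(.+?)(engravings:.*?)?(band engraving:.*)?",
    re.IGNORECASE)

# Same idea, but "$" forces the lazy groups to expand to the end of the line,
# and the labels and separating commas are kept out of the captures.
ANCHORED = re.compile(
    r"(\d{2}.+?gold).+?(style \d+),\s*(.+?)"
    r"(?:,\s*engravings:\s*(.*?))?"
    r"(?:,\s*band engraving:\s*(.*))?$",
    re.IGNORECASE)

# The stones group captures only the comma; both optional groups are None.
print(ORIGINAL.search(SAMPLE).groups())

# Gold, style, stones, engravings, and band engraving are all captured.
print(ANCHORED.search(SAMPLE).groups())
```

With the anchored version, a description that lacks Engravings: or Band Engraving: still matches, and the corresponding group comes back as None.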
A: 

... can be combinations of 14kt, 10kt, yellow, or white ...

I really don't think a regex is what you want here. It's not always appropriate.

If the order of the data might vary between descriptions (e.g. sometimes style comes before gold, sometimes after), then that's a very good indicator that you want more general parsing (possibly using multiple regexes as suggested by rossfabricant).

If you know the order is 100% consistent, then you can probably construct a single regex to do it, but I think it would be more effort than it's worth. There are probably better options -- it would help to know what language you're using.

Zac Thompson
I am using PHP, but I think this is going to be ported over to a co-worker's system, which is VB. Using multiple regexes is fine; I just want an elegant solution that will always work.
John Isaacks