I need a regex to match something like this:

'text' | 'text' | ... | 'text'(~text) = 'text' | 'text' | ... | 'text'

I just want to split it into two sections: the part to the left of the equals sign and the part to the right. Any of the 'text' entries can contain "=" between the ' characters, though. I was thinking of matching an even number of ' characters followed by a =, but I'm not sure how to match an even number of something. Also note that I don't know how many entries there could be on either side. A couple of examples:

'51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33) = '51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'

Should extract '51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33) and '51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'

'227637' | 'SMTU2032_1' | 'SKT W/BAT'(~227637) = '227637' | 'SMTU2032_1' | 'RENATA' | 'SKT28_5X16_1-HT5_4_P2' | '()' :SPECIAL_A ='BAT_CR2032', PART_NUM_A='202649'

Should extract '227637' | 'SMTU2032_1' | 'SKT W/BAT'(~227637) and '227637' | 'SMTU2032_1' | 'RENATA' | 'SKT28_5X16_1-HT5_4_P2' | '()' :SPECIAL_A ='BAT_CR2032', PART_NUM_A='202649'

Also note that the little tilde bit at the end of the first section is optional, so I can't just look for that.

A: 

Actually I wouldn't use a regex for that at all. Assuming your language has a split operation, I'd first split on the | character to get a list of:

'51NL9637X33'
'ISL6262ACRZ-T'
'QFN'(~51NL9637X33) = '51NL9637X33'
'ISL6262ACRZ-T'
'INTERSIL'
'QFN7SQ-HT1_P49'
'()'

Then I'd split each of them on the = character to get the key and (optional) value:

'51NL9637X33'           <null>
'ISL6262ACRZ-T'         <null>
'QFN'(~51NL9637X33)     '51NL9637X33'
'ISL6262ACRZ-T'         <null>
'INTERSIL'              <null>
'QFN7SQ-HT1_P49'        <null>
'()'                    <null>

You haven't said why you think a regex is the right tool for the job, but most modern languages also have a split capability, and regexes aren't necessarily the answer to every requirement.
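The two splits described above could be sketched in Python roughly as follows. This is only an illustration: the function name split_entries is made up here, and the sketch assumes that neither | nor = appears inside a quoted entry, which the question says is not guaranteed for =.

```python
def split_entries(line):
    """Split on '|', then split each piece on its first '='."""
    rows = []
    for piece in line.split("|"):
        # partition returns an empty separator when '=' is absent
        key, sep, value = piece.partition("=")
        # value is None when the piece contains no '=' at all
        rows.append((key.strip(), value.strip() if sep else None))
    return rows


line = ("'51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33) = "
        "'51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'")

for key, value in split_entries(line):
    print(key, value)
```

Only the third piece carries a value here, so it marks the boundary between the left and right sections.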

paxdiablo
A: 

I agree with paxdiablo in that regular expressions might not be the most suitable tool for this task, depending on the language you are working with.

The question "How do I match an even number of characters?" is interesting nonetheless, and here is how I'd do it in your case:

(?:'[^']*'|[^=])*(?==)

This expression matches the left part of your entry. At each position it first looks for a '. If it finds one, it runs forward to the next ', so quotes are only ever consumed in pairs and everything between them, including any =, is skipped. If there is no ' at the current position, it matches any single character that is not an equals sign. The lookahead (?==) then asserts that an equals sign follows the matched string. This works because the regex engine tries the alternatives of an OR construct from left to right.
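As a quick check of that behaviour, here is a minimal Python sketch using the standard re module. The helper name left_part and the sample string are invented for illustration; the sample puts an = inside a quoted entry to show that it is skipped.

```python
import re

# Quoted runs are tried first, so an '=' between quotes is consumed with them.
LEFT = re.compile(r"(?:'[^']*'|[^=])*(?==)")


def left_part(line):
    m = LEFT.match(line)
    # None if the pattern does not match
    return m.group(0).strip() if m else None


sample = "'a=b' | 'c'(~a) = 'd' | 'e=f'"
print(left_part(sample))
```

The lookahead does not consume the equals sign, so the match ends just before it; the strip() call only removes the surrounding whitespace.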

You could get the left and right parts in two capturing groups by using

((?:'[^']*'|[^=])*)=(.*)
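Applied to the second example from the question, the two-group version might be used like this in Python. The helper name split_sides is hypothetical, and stripping the whitespace around the equals sign is an extra step that the regex itself does not perform.

```python
import re

# Group 1: everything before the first unquoted '='; group 2: the rest.
SPLIT = re.compile(r"((?:'[^']*'|[^=])*)=(.*)")


def split_sides(line):
    m = SPLIT.match(line)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2).strip()


line = ("'227637' | 'SMTU2032_1' | 'SKT W/BAT'(~227637) = "
        "'227637' | 'SMTU2032_1' | 'RENATA' | 'SKT28_5X16_1-HT5_4_P2' | '()' "
        ":SPECIAL_A ='BAT_CR2032', PART_NUM_A='202649'")

left, right = split_sides(line)
print(left)
print(right)
```

Because (.*) takes the rest of the line, the later equals signs on the right-hand side stay in the second group.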

I recommend http://gskinner.com/RegExr/ for tinkering with regular expressions. =)

Jens
A: 
Antal S-Z