I need to extract data from a source that presents it in one of two ways. The data could be formatted like this:

Francis (Lab) 18,077 (60.05%); Waller (LD) 4,140 (13.75%); Evans (PC) 3,545 (11.78%); Rees-Mogg (C) 3,064 (10.18%); Wright (Veritas) 768 (2.55%); La Vey (Green) 510 (1.69%)

Or like this:

Lab 8,994 (33.00%); C 7,924 (29.07%); LD 5,197 (19.07%); PC 3,818 (14.01%); Others 517 (1.90%); Green 512 (1.88%); UKIP 296 (1.09%)

I need to extract the percentage and the party for each entry (these are election results). The party is either in brackets (first example) or is the only non-numeric text (second example).

So far I have this:

preg_match('/(.*)\(([^)]*)%\)/', $value, $match);

Which is giving me the following matches (for first example):

Array
(
    [0] => Francis (Lab) 18,077 (60.05%)
    [1] => Francis (Lab) 18,077 
    [2] => 60.05
)

So I have the percentage, but I also need the party label, which may or may not be in brackets and may or may not be the only text. Can anyone help?

+1  A: 

Do party symbols ever have whitespace in them? If not, this should do the trick:

'/\(?([A-Za-z]+)\)?\s*[\d,]+\s*\(([\d.]+%)\)/'

The regex is anchored by the raw vote count and the percentage; the party is just the last run of letters preceding them, and may or may not be enclosed in brackets. Note that the second capture group includes the trailing `%`.
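As a rough sketch of how this pattern could be applied to a whole line of results, something like the following might work. The function name `extractResults` and the shape of the returned array are my own assumptions for illustration, not part of the question; only the pattern itself comes from above. It uses `preg_match_all` with `PREG_SET_ORDER` so that each candidate comes back as its own match, and strips the `%` before casting to a float.

```php
<?php
// Hypothetical helper (the name and return shape are assumptions).
// Returns one ['party' => ..., 'percent' => ...] entry per candidate.
function extractResults(string $value): array
{
    // Pattern from the answer: optional brackets around the party,
    // then the vote count, then the percentage in brackets.
    $pattern = '/\(?([A-Za-z]+)\)?\s*[\d,]+\s*\(([\d.]+%)\)/';

    $results = [];
    if (preg_match_all($pattern, $value, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $results[] = [
                'party'   => $m[1],
                // Drop the trailing "%" captured by the pattern.
                'percent' => (float) rtrim($m[2], '%'),
            ];
        }
    }
    return $results;
}

// The two input formats from the question (shortened).
$withNames = 'Francis (Lab) 18,077 (60.05%); Waller (LD) 4,140 (13.75%)';
$partyOnly = 'Lab 8,994 (33.00%); C 7,924 (29.07%); LD 5,197 (19.07%)';

foreach ([$withNames, $partyOnly] as $line) {
    foreach (extractResults($line) as $row) {
        echo $row['party'], ' => ', $row['percent'], "\n";
    }
}
```

Because the party class is `[A-Za-z]+`, a label containing spaces, digits or hyphens would only be partly captured, which is why the whitespace question matters.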

Alan Moore
+1. Same thing I was doing, but I'd use `\s+` :)
Qtax
Thanks so much. Works perfectly.
martinpetts