Hello fellow developers,

I tried searching for the answer to this, but I couldn't find anything too helpful in this situation. It's possible that I'm not searching the correct terms.

I'm having trouble with this regex. Consider this string:

$str = "(1, 2, 'test (foo) bar'), (3, 4, '(hello,world)')";

I want to end up with a multidimensional array, like this:

$arr = array(
   array(1, 2, 'test (foo) bar'),
   array(3, 4, '(hello,world)')
);

I figure I could run a regex to split it up into separate strings like "(1, 2, 'test (foo) bar')" and "(3, 4, '(hello,world)')", and then run a regex on each of those to split by comma. As you can see, though, my problem is that the data has parentheses and commas inside various strings, and I'd like to ignore those.

So far I have this, which does the first part the way I want, except that it breaks if there are parentheses in the data.

preg_match_all('/\((.*?)\),?/', $str, $matches);

It gives me this:

Array
(
    [0] => Array
        (
            [0] => (1, 2, 'test (foo)
            [1] => (3, 4, '(hello,world)
        )

    [1] => Array
        (
            [0] => 1, 2, 'test (foo
            [1] => 3, 4, '(hello,world
        )

)

It truncates the data, naturally. What can I do to ignore the parentheses that are in quotes? If I can ignore them, then on the next step when I split each of these matches, I'll be able to ignore commas.

Thanks!

A: 

In general, you cannot do that with regexes. But in this case you can try this expression:

\(([^']*?'.*?')\),?
Max
This works when the data has quoted strings, but it fails on input like:
Adam Jackett
(1, 1), (5, 0), (10, 0), (14, 1), (15, 0), (20, 1), (25, 0), (29, 1), (39, 0)
Adam Jackett
Try this one \\(([^']*?'.*?')\\)|\\(([^']*?)\\),?
Max
This is a lot closer, but it's giving me an issue in some cases - I'm trying to figure out what's causing that first. Thanks.
Adam Jackett
Do you have any 'nested' strings, like 'blah 'more blah' blah-blah'?
Max
That could be it. I'm putting together an example of what's causing the trouble.
Adam Jackett
$str = "(4, 'Coupons', '', 3, 3), (5, 'Customer Loyalty Program', 'this (is) broken', 3, 1)"; just gives me this: Array ( [0] => Array ( [0] => (is) ) [1] => Array ( [0] => ) [2] => Array ( [0] => is ) )
Adam Jackett
Then I think it's not possible to do this only with regexes.
Max
Aww... that's too bad. It was so close too. Thanks anyways.
Adam Jackett
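
For the first step (cutting the input into groups), a pattern that treats each quoted string as a single unit seems to cope with the examples posted in this thread. This is only a minimal sketch, assuming strings are always single-quoted and never contain an escaped or nested quote. The function name `splitGroups` is made up for illustration.

```php
<?php
// Hypothetical helper: return each "( ... )" group as one string.
// Assumption: single-quoted strings with no quote characters inside them.
function splitGroups(string $str): array
{
    // A group is "(" followed by any mix of
    //   '[^']*'   a whole quoted string (may contain parens and commas), or
    //   [^()']    any character that is not a paren or a quote,
    // and then the closing ")".
    preg_match_all("/\((?:'[^']*'|[^()'])*\)/", $str, $m);
    return $m[0];
}

$str = "(4, 'Coupons', '', 3, 3), (5, 'Customer Loyalty Program', 'this (is) broken', 3, 1)";
print_r(splitGroups($str));
```

Because the quoted-string alternative consumes everything up to the closing quote, a ")" inside a string is never mistaken for the end of the group. If the data can contain escaped or nested quotes, this sketch will not be enough.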
A: 

([0-9]+), (\'([A-Za-z0-9(), ]+)\')?

This appears to do what you want.

$matches Array:
(
[0] => Array
    (
        [0] => 1, 
        [1] => 2, 'test (foo) bar'
        [2] => 3, 
        [3] => 4, '(hello,world)'
    )

[1] => Array
    (
        [0] => 1
        [1] => 2
        [2] => 3
        [3] => 4
    )

[2] => Array
    (
        [0] => 
        [1] => 'test (foo) bar'
        [2] => 
        [3] => '(hello,world)'
    )

[3] => Array
    (
        [0] => 
        [1] => test (foo) bar
        [2] => 
        [3] => (hello,world)
    )
)

Is this closer?

0x90
Oh! You need the 1, 2 and 3, 4 too. Sorry, one moment.
0x90
[A-Za-z0-9(), ] is better written as [^']
Max
Well, I will also have non-alphanumeric characters, so I can't use ranges.
Adam Jackett
This splits things up way too much. Not what I'm looking for. Thanks though.
Adam Jackett
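
For the second step the question describes (splitting one group on commas while ignoring commas inside quotes), matching the fields themselves rather than the separators may be easier. A rough sketch, assuming single-quoted strings with no quote characters inside them and no empty unquoted fields. `splitFields` is a hypothetical name, not something from the answers above.

```php
<?php
// Hypothetical helper: turn "(1, 2, 'a, (b)')" into [1, 2, 'a, (b)'].
// Assumption: quoted strings never contain a quote character.
function splitFields(string $group): array
{
    // Drop one leading "(" and one trailing ")" if present.
    $inner = preg_replace('/^\s*\(|\)\s*$/', '', $group);

    // A field is either a whole quoted string or a bare token
    // that runs up to the next comma.
    preg_match_all("/'[^']*'|[^,\s][^,]*/", $inner, $m);

    $fields = [];
    foreach ($m[0] as $tok) {
        if ($tok[0] === "'") {
            $fields[] = trim($tok, "'");   // quoted string: strip the quotes
        } else {
            $tok = trim($tok);
            // Numeric tokens become int/float, anything else stays a string.
            $fields[] = is_numeric($tok) ? $tok + 0 : $tok;
        }
    }
    return $fields;
}

print_r(splitFields("(3, 4, '(hello,world)')"));
```

Using `[^']` inside the quotes instead of a character range means non-alphanumeric characters in the strings are accepted as well.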