
This may be a lame question, but I am a total novice with regular expressions. I have some text data in the format:

Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com

Now, I want to use a regex to split these into an array of name: value pairs. The regular expression I am trying is /(.*):(.*)/ with preg_match_all(). It works well with the first two lines, but on the third line it returns "Link: http:" in one part and "//www.somelink.com" in the other.

So, is there any way to split the line only at the first occurrence of the character ':'?

A: 

You probably want something like /(.*?):(.*)/. The ? after the * makes it "non-greedy", so it consumes as little text as possible. I think that will work for your situation. By default, * is "greedy" and tries to match as many repetitions as it can.
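A minimal PHP sketch (an illustration, not part of the original answer) comparing the greedy and non-greedy patterns on the problematic Link line:

```php
<?php
// The line from the question that breaks the greedy pattern.
$line = 'Link: http://www.somelink.com';

// Greedy: (.*) first grabs the whole line, then backtracks only as far as
// the LAST colon, so the split lands inside "http:".
preg_match('/(.*):(.*)/', $line, $greedy);

// Non-greedy: (.*?) grabs as little as possible, so the split lands on the FIRST colon.
preg_match('/(.*?):(.*)/', $line, $lazy);

echo "greedy: [{$greedy[1]}] [{$greedy[2]}]\n"; // greedy: [Link: http] [//www.somelink.com]
echo "lazy:   [{$lazy[1]}] [{$lazy[2]}]\n";     // lazy:   [Link] [ http://www.somelink.com]
```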

Edit: See here for more about matching repetition using the * and + operators.

eldarerathis
Cool, thanks man, for a prompt reply, and one that works too :)
aadravid
A:

Use a negated character class (see it on rubular.com):

/^([^:]*):(.*)$/m

The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
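A short PHP illustration of both kinds of class (the sample word is arbitrary and not from the answer), ending with the [^:]* part of the pattern applied to the question's Link line:

```php
<?php
// [aeiou] matches any single lowercase vowel; [^aeiou] matches any single character that is not one.
$word = 'somelink';
$vowelsRemoved    = preg_replace('/[aeiou]/', '', $word);  // "smlnk"
$nonVowelsRemoved = preg_replace('/[^aeiou]/', '', $word); // "oei"

// Applied to the question: [^:]* runs up to, but never past, the first colon.
preg_match('/^[^:]*/', 'Link: http://www.somelink.com', $m);
echo $m[0], "\n"; // Link
```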

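The ^ and $ at the beginning and end of the pattern are the start-of-line and end-of-line anchors. The m modifier turns on multi-line mode, in which ^ and $ match at the start and end of every line instead of only at the start and end of the whole string.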
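To see why the m modifier matters here, a small PHP check (an illustration, not from the answer) runs the same pattern with and without it on the question's three lines:

```php
<?php
$text = "Company Name: Name of the company, place.\n"
      . "Company Address: Some, address, here.\n"
      . "Link: http://www.somelink.com";

// Without m, ^ and $ anchor to the whole string; since . does not cross
// newlines, (.*) cannot reach the end and nothing matches.
$without = preg_match_all('/^([^:]*):(.*)$/', $text, $m1);

// With m, ^ and $ anchor to each line, so every line matches.
$with = preg_match_all('/^([^:]*):(.*)$/m', $text, $m2);

echo "without m: $without, with m: $with\n"; // without m: 0, with m: 3
```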

The problem with your original pattern is that you're (ab)using . when you could've been a lot more specific, and since * is greedy, the first group overmatched. It's tempting to try to "fix" that by making the repetition reluctant, but it's MUCH better to be more specific and say that the first group is matching anything but :.

Note, however, that this is a matching pattern with captures, not a splitting pattern that matches only the delimiter. The delimiter pattern itself is just :.
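If all you need is to split each line at its first colon, you can do it with a split limit of 2 instead of a capturing pattern. A sketch using PHP's explode() (an alternative not mentioned in the answer) alongside preg_split() with the bare delimiter pattern /:/:

```php
<?php
$line = 'Link: http://www.somelink.com';

// explode() with a limit of 2 splits at the first ':' only;
// everything after it, including later colons, stays in the second element.
[$name, $value] = explode(':', $line, 2);

// preg_split() with a limit of 2 does the same using the delimiter pattern /:/.
$parts = preg_split('/:/', $line, 2);

// trim() removes the space that follows the colon.
echo trim($name), ' => ', trim($value), "\n"; // Link => http://www.somelink.com
```

If a line can lack a colon, explode() returns a single element and the destructuring leaves $value null (with a notice or warning, depending on the PHP version), so check count() first in that case.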


PHP snippet

Given this:

$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);

print_r($matches);

The output is (as seen on ideone.com):

Array
(
    [0] => Array
        (
            [0] => Company Name: Name of the company, place.
            [1] => Company Name
            [2] =>  Name of the company, place.
        )

    [1] => Array
        (
            [0] => Company Address: Some, address, here.
            [1] => Company Address
            [2] =>  Some, address, here.
        )

    [2] => Array
        (
            [0] => Link: http://www.somelink.com
            [1] => Link
            [2] =>  http://www.somelink.com
        )

)
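Group [2] in the output above still starts with the space that follows the colon. A possible refinement, assuming you want the values without that space and as an associative name => value array (the [ \t]* tweak and the array_combine() call are additions, not part of the answer):

```php
<?php
$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

// [ \t]* after the colon consumes the leading space so it is not captured in group 2.
// Spaces and tabs only (not \s), so an empty value cannot swallow the following newline.
preg_match_all('/^([^:]*):[ \t]*(.*)$/m', $text, $matches);

// With the default PREG_PATTERN_ORDER, $matches[1] holds all names and $matches[2] all values.
$pairs = array_combine($matches[1], $matches[2]);

print_r($pairs);
```

If the same name appears on more than one line, array_combine() keeps only the last value for it.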
polygenelubricants
Thanks for a quick reply... it works, but it gives me a lot of empty characters before the string; the previous one works fine.
aadravid
@aadravid: Check out latest revision. I think I finally figured out your exact requirements.
polygenelubricants
+1 Having had a chance to look at this again, I actually agree that this latest revision is probably a better/safer regexp. I just didn't think to do this at the time.
eldarerathis
Thanks polygenelubricants, that really seems better than the first one. But I have already implemented eldarerathis' solution; I will keep yours in mind in the future :)
aadravid