I have a list of URLs that can come in any format: one per line, separated by commas, with random text in between them, etc. The URLs all come from two different sites and have a similar structure.

For this example, let's say it looks like this:

Random Text - http://www.domain2.com/variable-value
Random Text 2 - http://www.domain1.com/variable-value, http://www.domain1.com/variable-value, http://www.domain1.com/variable-value

http://www.domain1.com/variable-value
http://www.domain2.com/variable-value
http://www.domain1.com/variable-value http://www.domain2.com/variable-value http://www.domain1.com/variable-value

I need to extract two pieces of information from each URL: whether it is domain1 or domain2, and the value that follows "variable-".

The result should be a multi-dimensional array in which each entry has two items: domain and value.

What's the best way of doing that?

+1  A: 

This is one possibility for extracting the URLs. The only limitation is that the URLs themselves must not contain a comma. If that is enough for you:

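// Double quotes are required so that "\n" is a real newline.
$lines = explode("\n", $urls);

foreach ($lines as $line)
{
    // The pattern needs delimiters (here ~). Whitespace is excluded along
    // with commas so that space-separated URLs are matched one by one.
    if (preg_match_all('~http://[^,\s]*variable-([^,\s]+)~', $line, $matches))
    {
        // $matches[0] holds the full URLs, $matches[1] the values.
    }
}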

By the way, the matches are stored in the $matches array.

P.S.: Edited. I forgot to escape the backslash, and you should search the string line by line to ensure correct behaviour. You can test the regex at http://www.regex-tester.de/regex.html; it worked there with my regex.

P.P.S.: After further research I found this page: http://internet.ls-la.net/folklore/url-regexpr.html. It contains the regular expression for a URL. You could use it to extract the URLs first, and in a second step go through your URLs and extract the variable information by looking for, e.g., variable-([\w]+).
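The two-step idea from the P.P.S. could be sketched roughly as follows. This is only an illustration: the sample text is made up, and the URL pattern is a deliberately simplified stand-in, not the full expression from the linked page.

```php
<?php
// Hypothetical sample input, modelled on the question.
$text = "Random Text - http://www.domain2.com/variable-abc\n"
      . "http://www.domain1.com/variable-def, http://www.domain1.com/variable-ghi";

// Step 1: pull out everything that looks like a URL.
// Simplified assumption: a URL ends at the next comma or whitespace.
preg_match_all('~https?://[^\s,]+~', $text, $found);

// Step 2: look for "variable-<value>" inside each extracted URL.
$values = array();
foreach ($found[0] as $url) {
    if (preg_match('~variable-(\w+)~', $url, $m)) {
        $values[] = $m[1];
    }
}

print_r($values);
```

Splitting the work this way keeps the URL pattern and the value pattern independent, so either one can be tightened later without touching the other.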

Simon
This doesn't match anything :(
Jack
The problem is, it won't always be one link per line.
Jack
A: 

preg_split, preg_match, parse_url

// split urls
$urls = preg_split('!,\s+!', 'http://www.domain1.com/variable-value, http://www.domain2.com/variable-value, http://www.domain3.com/variable-value');

// check for domain and path variable
$result = array();
foreach ($urls as $url) {

    $parts = parse_url($url);
    // domain, e.g. www.domain1.com
    $domain = $parts['host'];
    $matches = array();
    // the value is whatever follows "/variable-" at the start of the path
    if (preg_match('!^/variable-([^/]+)!', $parts['path'], $matches)) {
        $result[] = array($domain, $matches[1]);
    }
}
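Since the question mentions random text and mixed separators, here is a hedged variation on the same preg_split / parse_url idea. The sample text and the 'domain' / 'value' array keys are assumptions made for illustration; splitting on any run of commas or whitespace is one possible choice, and it would break URLs that legitimately contain a comma.

```php
<?php
// Hypothetical mixed input: commas, spaces and free text between URLs.
$text = "Random Text - http://www.domain2.com/variable-abc\n"
      . "http://www.domain1.com/variable-def, http://www.domain1.com/variable-ghi";

$result = array();

// Split on any run of commas and/or whitespace, dropping empty tokens.
foreach (preg_split('![,\s]+!', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
    $parts = parse_url($token);

    // Tokens such as "Random" or "-" have no host and are skipped;
    // parse_url() may also return false for badly malformed input.
    if (!is_array($parts) || !isset($parts['host'], $parts['path'])) {
        continue;
    }

    // Keep only paths of the form /variable-<value>.
    if (preg_match('!^/variable-([^/]+)!', $parts['path'], $m)) {
        $result[] = array('domain' => $parts['host'], 'value' => $m[1]);
    }
}

print_r($result);
```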
webbiedave
A: 
$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";
preg_match_all("/http:\\/\\/(.+?)\\/variable-([a-z0-9]+)/si", $text, $matches);
print_r($matches);

Result:

Array
(
    [0] => Array
        (
            [0] => http://www.domain1.com/variable-value1
            [1] => http://www.domain2.com/variable-value2
            [2] => http://www.domain1.com/variable-value3
        )

    [1] => Array
        (
            [0] => www.domain1.com
            [1] => www.domain2.com
            [2] => www.domain1.com
        )

    [2] => Array
        (
            [0] => value1
            [1] => value2
            [2] => value3
        )

)
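To get the domain + value pairs the question asks for, one option is to pass the PREG_SET_ORDER flag so that each match carries its own captures. The sketch below reuses the pattern and sample text from above; the $pairs variable and its 'domain' / 'value' keys are names chosen here for illustration.

```php
<?php
$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";

// PREG_SET_ORDER groups the captures per match instead of per group,
// so $m[1] is the host and $m[2] the value of the same URL.
preg_match_all("/http:\\/\\/(.+?)\\/variable-([a-z0-9]+)/si", $text, $matches, PREG_SET_ORDER);

$pairs = array();
foreach ($matches as $m) {
    $pairs[] = array('domain' => $m[1], 'value' => $m[2]);
}

print_r($pairs);
```

Each entry of $pairs then holds one URL's host and value together, which avoids having to index the two parallel arrays by hand.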
serg