I need to extract the first URL from some content. The content may look like this:

({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null}); 
or it may contain only a single link:

({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null}); 

Currently I have:

$pattern = "/\:\[\{url\:\"(.*)\"\,name/";
preg_match_all($pattern, $htmlContent, $matches);
$URL = $matches[1][0];

However, it works only if there is a single link, so I need a regex that works for both cases.

A: 

That smells like JSON to me. Try using http://php.net/json_decode

robertbasic
It's not valid JSON, so I would prefer a regex over correcting the JSON and decoding it... too much hassle.
Michael
Can you help me with the regex? :|
Michael
A: 

Looks like JSON to me, visit http://php.net/manual/en/book.json.php and use json_decode().
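Since the content is not valid JSON as it stands (it is wrapped in parentheses and the keys are unquoted), json_decode() will not accept it directly. Here is a rough sketch of one way it could be cleaned up first. The helper name pseudoJsonToArray() is made up for illustration, and the key-quoting regex assumes that no string value contains text shaped like `,word:` or `{word:`; if that assumption does not hold, this approach will corrupt the data.

```php
<?php
// Sample input taken from the question.
$raw = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';

// Hypothetical helper: turns the pseudo-JSON into a PHP array, or null on failure.
function pseudoJsonToArray($raw)
{
    // Strip the leading "(" and the trailing ");" wrapper.
    $json = preg_replace('/^\s*\(|\)\s*;?\s*$/', '', $raw);

    // Quote bare keys: an identifier that follows "{" or "," and precedes ":".
    // Assumption: string values never contain this shape themselves.
    $json = preg_replace('/([{,])\s*([A-Za-z_]\w*)\s*:/', '$1"$2":', $json);

    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

$data = pseudoJsonToArray($raw);
$firstUrl = isset($data['items'][0]['url']) ? $data['items'][0]['url'] : null;
echo $firstUrl, "\n";
```

Whether this is less hassle than a regex depends on how much of the data you need; for a single URL, the regex route is probably simpler.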

Mikulas Dite
It's not valid JSON, so I would prefer a regex over correcting the JSON and decoding it... too much hassle.
Michael
Do you not have any control over the generated pseudo-JSON?
Jon Cram
@Jon Cram I don't have control over the generated content.
Michael
A: 

Hopefully this works for you:

<?php
$str = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});'; //The string you want to extract the 1st URL from

$match = ""; //Define the match variable
preg_match("%(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&\%\$#\=~_\-]+))*%",$str,$match); //I Googled for a regular expression for URLs and used the one I found

echo $match[0]; //Print the full match, which is the first URL found in the string
?>

This is the website that I found the regular expression on: http://regexlib.com/Search.aspx?k=URL

As the others have said, json_decode should work for you as well.

Chief17
A: 

You can use this REGEX:

$pattern = "/url\:\"([^\"]+)\"/";

Worked for me :)
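For completeness, here is a minimal sketch of how the pattern might be used on both inputs from the question. Because `[^"]+` cannot run past a closing quote, the capture ends at the end of the first URL, and preg_match() (unlike preg_match_all()) stops after the first match. The helper name firstUrl() is only for illustration.

```php
<?php
// Sample inputs taken from the question.
$multi  = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';
$single = '({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});';

// Hypothetical helper: returns the first url value, or null if none is found.
function firstUrl($content)
{
    // Same pattern as above, written in single quotes so fewer escapes are needed.
    if (preg_match('/url:"([^"]+)"/', $content, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo firstUrl($multi), "\n";  // http://cincinnati.ebayclassifieds.com/
echo firstUrl($single), "\n"; // http://portlandor.ebayclassifieds.com/
```

The pattern in the question fails on the multi-link input because `.*` is greedy and matches up to the last `",name` in the string.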

Hisamu
:) it works here too
Michael