Hey,

I need a regexp in PHP to find the http-equiv="refresh" meta tag in the page fetched from a URL. What I need is the actual URL to follow. As far as I know, there are two valid ways to write this meta tag:

<meta content="0; url=urlhere" http-equiv="refresh" /> and

<meta http-equiv="refresh" content="0; url=urlhere"/>

Thanks!

A: 
http-equiv\W*refresh.+?url\W+?(.+?)\"

Try:

if (preg_match('/meta.+?http-equiv\W+?refresh/i', $x)) {
   preg_match('/content.+?url\W+?(.+?)\"/i',$x,$matches);
   print_r($matches);
}
jmans
gives me an error... Delimiter must not be alphanumeric or backslash for preg_match
Let's see the code.
jmans
preg_match('http-equiv\W*refresh.+?url\W+?(.+?)\"', file_get_contents($x), $matches);
try: preg_match('/http-equiv\W*refresh.+?url\W+?(.+?)\"/i', file_get_contents($x), $matches);
jmans
returns empty array... the content of $x is <meta content="0; url=http://google.com" http-equiv="refresh" />
Check my edited post.
jmans
works great! thanks!
Feel free to vote for my post:-)
jmans
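For reference, the "Delimiter must not be alphanumeric or backslash" error in the comments above is what preg_match reports when the pattern is passed without delimiters. A minimal sketch of a delimited pattern, run against the sample tag quoted in the comments (the variable names and the stricter pattern are illustrative, not the exact regex from the answer):

```php
<?php
// Sample tag taken from the comment thread above.
$html = '<meta content="0; url=http://google.com" http-equiv="refresh" />';

// The pattern is wrapped in "/" delimiters; the trailing "i" modifier
// makes the match case-insensitive.
$pattern = '/content\s*=\s*"\s*\d+\s*;\s*url\s*=\s*([^"]*?)\s*"/i';

// preg_match returns 1 on a match, 0 on no match, false on error.
$matched = preg_match($pattern, $html, $matches);

echo $matched === 1 ? $matches[1] : 'no match';
echo "\n";
```

With that input, this prints http://google.com.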
A: 

Dima,

Try this:

<?php
  preg_match('|content="\d+;url=(.*?)"|i', '<META HTTP-EQUIV="Refresh" CONTENT="5;URL=http://www.stackoverflow.com">', $res1);
  preg_match('|content="\d+;url=(.*?)"|i', '<META CONTENT="5;URL=http://www.stackoverflow.com" HTTP-EQUIV="Refresh">', $res2);

  echo "<pre>";
  var_dump($res1);
  var_dump($res2);
  echo "</pre>";
?>

Output:

array(2) {
  [0]=>
  string(44) "CONTENT="5;URL=http://www.stackoverflow.com""
  [1]=>
  string(28) "http://www.stackoverflow.com"
}
array(2) {
  [0]=>
  string(44) "CONTENT="5;URL=http://www.stackoverflow.com""
  [1]=>
  string(28) "http://www.stackoverflow.com"
}

Bear in mind that you'll have to deal with whitespace (inside the content attribute, between tags, inside the http-equiv attribute, etc.), such as:

<META HTTP-EQUIV="Refresh" CONTENT=" 5 ; URL=http://www.stackoverflow.com ">

The following code snippet handles that case:

<?php
  preg_match('|content="\s*\d+\s*;\s*url=(.*?)\s*"|i', '<META HTTP-EQUIV="Refresh" CONTENT=" 5 ; URL=http://www.stackoverflow.com ">', $res3);

  echo "<pre>";
  var_dump($res3);
  echo "</pre>";
?>

Output:

array(2) {
  [0]=>
  string(48) "CONTENT=" 5 ; URL=http://www.stackoverflow.com ""
  [1]=>
  string(28) "http://www.stackoverflow.com"
}

Lastly, if that isn't enough, you can check for http-equiv="refresh" on each side of the content attribute (always taking whitespace into account) like this:

<?php
  preg_match('|(?:http-equiv="refresh".*?)?content="\d+;url=(.*?)"(?:.*?http-equiv="refresh")?|i', '<META HTTP-EQUIV="Refresh" CONTENT="5;URL=http://www.stackoverflow.com">', $res4);
  preg_match('|(?:http-equiv="refresh".*?)?content="\d+;url=(.*?)"(?:.*?http-equiv="refresh")?|i', '<META CONTENT="5;URL=http://www.stackoverflow.com" HTTP-EQUIV="Refresh">', $res5);  


  echo "<pre>";
  var_dump($res4);
  var_dump($res5);
  echo "</pre>";
?>

Output:

array(2) {
  [0]=>
  string(65) "HTTP-EQUIV="Refresh" CONTENT="5;URL=http://www.stackoverflow.com""
  [1]=>
  string(28) "http://www.stackoverflow.com"
}
array(2) {
  [0]=>
  string(65) "CONTENT="5;URL=http://www.stackoverflow.com" HTTP-EQUIV="Refresh""
  [1]=>
  string(28) "http://www.stackoverflow.com"
}

Using the same approach, you could add support for taking the parts into account.
Also, always remember to run these regexes with the i modifier, to enable case-insensitive matching.
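
Putting the points above together, here is a rough sketch of a helper that handles both attribute orders, optional whitespace and either quote style. The function name find_meta_refresh_url is made up for this example, it assumes PHP 7.1+ (for the nullable return type), and it assumes the URL itself contains no ">" or quote characters. Treat it as a starting point, not a full HTML parser:

```php
<?php
// Hypothetical helper (not part of the answer above): returns the
// refresh target URL found in $html, or null if there is none.
function find_meta_refresh_url(string $html): ?string
{
    // Collect every <meta ...> tag first, so the order of the
    // attributes inside the tag does not matter.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return null;
    }

    foreach ($tags[0] as $tag) {
        // Skip tags that do not declare http-equiv="refresh".
        if (!preg_match('/http-equiv\s*=\s*["\']?\s*refresh\b/i', $tag)) {
            continue;
        }
        // Extract the URL from content="<seconds>; url=<target>",
        // tolerating whitespace; \1 matches the same quote that opened.
        if (preg_match('/content\s*=\s*(["\'])\s*\d+\s*;\s*url\s*=\s*(.*?)\s*\1/i', $tag, $m)) {
            return $m[2];
        }
    }

    return null;
}

// Usage with the sample tags from this thread.
$samples = [
    '<META HTTP-EQUIV="Refresh" CONTENT=" 5 ; URL=http://www.stackoverflow.com ">',
    '<meta content="0; url=http://google.com" http-equiv="refresh" />',
    '<meta name="description" content="no refresh here">',
];
foreach ($samples as $sample) {
    var_dump(find_meta_refresh_url($sample));
}
```

Checking http-equiv and content in two separate steps keeps each regex short and avoids the optional groups on both sides of the content attribute.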