I am using the following code:

<?php
$stock = $_GET['s']; // returns stock ticker symbol, e.g. GOOG or YHOO
$first = $stock[0];

$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);

$r_header = '/Prev. Week(.+?)Next Week/';
$r_date = '/\<b\>(.+?)\<\/b\>/';

preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);

echo $date[1];
?>

I've checked the regular expressions here and they appear to be valid. If I print just $url or $data, they come out correctly, and if I print $data and view the source, the markup I'm trying to match with the regex is in there. If you're interested in checking anything, an example of a proper URL would be http://biz.yahoo.com/research/earncal/g/goog.html

I've tried everything I could think of, including both var_dump($header) and var_dump($date), both of which return empty arrays.

I have been able to create other regular expressions that work. For instance, the following correctly returns "Earnings":

$r_header = '/Company (.+?) Calendar/';
preg_match($r_header,$data,$header);
echo $header[1];

I am going nuts trying to figure out why this isn't working. Any help would be awesome. Thanks.

A: 

I think this is because your regex treats the page as if it were plain text. However, it's HTML. For example, your regex should be modified to match markup like:

<a href="...">Prev. Week</a> ...

rather than plain text like: "Prev. Week ...."

mnour
+3  A: 

Your regex doesn't allow for the line breaks in the HTML. Try:

$r_header='/Prev. Week((?s:.*))Next Week/';

The s flag tells '.' (match any character) to match newline characters as well, which it does not do by default.
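A minimal sketch of the difference, using a made-up three-line string in place of the real page (the `$sample` content is an assumption, not the actual Yahoo markup):

```php
<?php
// Hypothetical stand-in for the fetched page: the two markers sit on different lines.
$sample = "Prev. Week\n<b>Oct 16</b>\nNext Week";

// Without the s flag, '.' stops at the newline, so nothing matches.
$without = preg_match('/Prev\. Week(.+?)Next Week/', $sample, $m1);

// With the inline (?s:...) group, '.' also matches newlines inside that group.
$with = preg_match('/Prev\. Week((?s:.+?))Next Week/', $sample, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)
```

The inline `(?s:...)` form switches the behaviour on for one group only; a trailing `s` after the closing delimiter applies it to the whole pattern.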

Boofus McGoofus
+1  A: 
  1. Dot does not match newlines by default. Use /your-regex/s
  2. $r_header should probably be /Prev\. Week(.+?)Next Week/s
  3. FYI: You don't need to escape < and > in a regex.
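The three points can be sketched on a small made-up fragment (the `$fragment` string and the date in it are assumptions for illustration, not the real page):

```php
<?php
// Hypothetical fragment; the real page's markup may differ.
$fragment = "Prev. Week</a>\n<b>October 16</b>\n<a>Next Week";

// An unescaped dot matches any character, so it also accepts "PrevX Week".
$loose  = preg_match('/Prev. Week/', 'PrevX Week');
// '\.' matches only a literal dot.
$strict = preg_match('/Prev\. Week/', 'PrevX Week');

// The trailing /s lets '.' span the line breaks between the two markers.
preg_match('/Prev\. Week(.+?)Next Week/s', $fragment, $header);

// '<' and '>' need no backslash; '/' does, because it is the delimiter here.
preg_match('/<b>(.+?)<\/b>/', $header[1], $date);

echo $date[1]; // October 16
```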
eyelidlessness
+2  A: 

The problem is that the HTML has newlines in it, which you need to account for with the s regex modifier, as below:

<?php
$stock = "goog"; // hard-coded here in place of $_GET['s'], the stock ticker symbol, e.g. GOOG or YHOO
$first = $stock[0];

$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);

$r_header = '/Prev. Week(.+?)Next Week/s';
$r_date = '/\<b\>(.+?)\<\/b\>/s';


preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);

var_dump($header);
?>
Vinko Vrsalovic
+1  A: 

You want to add the s (PCRE_DOTALL) option: by default, . doesn't match a newline, and the page has newlines between the two parts you are looking for.

Side note: although they don't hurt (apart from readability), you don't need a backslash before < and >.
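Putting the two remarks together, here is one possible sketch. The function name `extractEarningsDate` and the sample markup are hypothetical, the nullable return type assumes PHP 7.1 or later, and `~` is used as the delimiter only so that `</b>` needs no backslash at all. It also checks the return value of preg_match, so a failed match shows up as null instead of an empty array:

```php
<?php
// Hypothetical helper: returns the first bold text between the two markers, or null.
function extractEarningsDate(string $html): ?string
{
    // s (PCRE_DOTALL) lets '.' cross the line breaks between the markers.
    if (preg_match('~Prev\. Week(.+?)Next Week~s', $html, $header) !== 1) {
        return null; // markers not found
    }
    // With '~' as the delimiter, none of '<', '>' or '/' needs escaping.
    if (preg_match('~<b>(.+?)</b>~s', $header[1], $date) !== 1) {
        return null; // no bold text between the markers
    }
    return trim($date[1]);
}

// Made-up sample input, not the real page.
$sample = "Prev. Week |\n<b>\nOctober 16\n</b>\n| Next Week";
var_dump(extractEarningsDate($sample));      // string(10) "October 16"
var_dump(extractEarningsDate("no markers")); // NULL
```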

PhiLho