Hi all,

I'm trying to use cURL to grab an external web page and put part of it on my own website. It's basically the "ladder" of a sports team. I contacted them, but they don't have an RSS feed of the ladder, so I'm trying to obtain it by other means. Is it possible to grab everything between <table> and </table> using cURL? I can fetch the page I want with the following code, but I don't need anything except the HTML table.

$ch = curl_init("http://www.sportingpulse.com/rpt_ladder.cgi?results=N&round=15&client=1-3909-47801-81021-6151461&pool=-1");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo $page = curl_exec($ch);

If someone could help me out, that'd be great. Thanks

Leanne

+2  A: 

You'll need to use curl to grab the contents of the page and string processing to extract the table from the returned string.

A simple regex to start would be:

/<table>(.*)<\/table/s

So if you take your example above, you'd do something like:

$page = curl_exec($ch);

if (preg_match("/<table>(.*)<\/table/s", $page, $matches)) {
    echo $matches[1];
}

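This pattern only matches a bare <table> tag with no attributes, and because .* is greedy it captures everything from the first <table> through to the last </table on the page. You'd need to tweak it to match exactly the HTML you want to extract.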
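To illustrate how the pattern behaves, here is a minimal sketch run against made-up markup. The sample strings are assumptions for demonstration only and are not taken from the real ladder page.

```php
<?php
// Hypothetical sample markup: two bare tables, and one table with a class.
$two   = '<table><tr><td>A</td></tr></table><p>gap</p><table><tr><td>B</td></tr></table>';
$class = '<table class="resulttable"><tr><td>C</td></tr></table>';

// Greedy .* runs from the first <table> to the LAST </table.
preg_match('/<table>(.*)<\/table/s', $two, $greedy);

// Lazy .*? stops at the FIRST closing tag instead.
preg_match('/<table>(.*?)<\/table/s', $two, $lazy);

// A literal <table> does not match a tag that carries attributes.
$bareHits = preg_match('/<table>(.*?)<\/table/s', $class, $unused);

// [^>]* accepts any attributes inside the opening tag.
$attrHits = preg_match('/<table[^>]*>(.*?)<\/table>/is', $class, $withAttr);

echo $greedy[1], "\n";   // spans both tables
echo $lazy[1], "\n";     // first table only
echo $bareHits, "\n";    // 0
echo $withAttr[1], "\n"; // inner HTML of the class table
```

If the target table has attributes, the last pattern is probably the closest starting point.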

Rafe
Thanks for that, but it's not displaying anything.
SoulieBaby
Would it make a difference if there's a class attached to the table?
SoulieBaby
@SoulieBaby: yes it would.
Stephen C
Ahh ok, the class on the <table> is "resulttable". Can I still use the above code, modified somehow?
SoulieBaby
+1  A: 

Ok, so I managed to get it working using this (if anyone else wants to know)

$ch = curl_init("http://www.sportingpulse.com/rpt_ladder.cgi?results=N&round=15&client=1-3909-47801-81021-6151461&pool=-1");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);

// [^>]* allows attributes such as class="resulttable"; .+? stops at the first </table>
if (preg_match('#<table[^>]*>(.+?)</table>#is', $page, $matches)) {
    // $matches[1] holds only the table's inner HTML, so re-wrap it
    echo '<table>';
    echo $matches[1];
    echo '</table>';
}
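One thing worth guarding against: curl_exec() returns false when the request fails, and passing that to preg_match() hides the real problem. Below is a rough sketch of the same approach split into two helpers. The names fetchPage() and extractFirstTable() are made up for this example, and the 10 second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: fetch a URL and return its body, or throw on failure.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, adjust as needed

    $page = curl_exec($ch);
    if ($page === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $page;
}

// Hypothetical helper: inner HTML of the first <table>, or null if none is found.
function extractFirstTable(string $html): ?string
{
    if (preg_match('#<table[^>]*>(.+?)</table>#is', $html, $matches)) {
        return $matches[1];
    }
    return null;
}
```

Usage would be along the lines of $inner = extractFirstTable(fetchPage($url)); followed by a null check before echoing anything.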

:)

SoulieBaby
+1  A: 

An alternative to pure regex would be to use DOMDocument and XPath. This parses the entire document into an object tree and makes working with the contents of the table easier.
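As a rough sketch of what that could look like: the function name tableRows() is made up for this example, and it assumes the class name is a plain word such as "resulttable" (no quotes or spaces), since it is placed directly into the XPath query.

```php
<?php
// Hypothetical helper: return the rows of the first table with the given
// class as arrays of trimmed cell text.
function tableRows(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match $class as a whole word inside the class attribute, then take
    // every <tr> under the first matching table.
    $query = sprintf(
        '(//table[contains(concat(" ", normalize-space(@class), " "), " %s ")])[1]//tr',
        $class
    );

    $rows = [];
    foreach ($xpath->query($query) as $tr) {
        $cells = [];
        // Header and data cells that are direct children of this row
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

With the rows as plain arrays you can rebuild the table with your own markup, rather than echoing the remote site's HTML as-is.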

Mark