views: 57
answers: 3

Please HELP! :(

I am looking to develop a PHP script to do the following:

  • Scrape a remote HTML page and extract selected data (e.g. a particular table/div)
  • Save the extracted data into a database (e.g. MySQL)

Can anyone help out?

Thanks, and I appreciate your prompt feedback.

+5  A: 

Use cURL to retrieve the page.

Use Simple HTML DOM Parser to find the data you need.

If needed, use iconv to convert the fetched data to your database's character set.

Then a plain MySQL connection and simple queries to store the data (don't forget to escape it).
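A minimal sketch of those steps, assuming the simplehtmldom library is available locally as simple_html_dom.php; the URL, database credentials and the scraped_rows table/column are just placeholders:

<?php
require 'simple_html_dom.php';

// 1. Fetch the page with cURL
$ch = curl_init('http://example.com/page.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// 2. Convert to the database charset if the source differs (here: Latin-1 -> UTF-8)
$html = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', $html);

// 3. Parse and select the parts you want
$dom = str_get_html($html);

// 4. Store each cell; a prepared statement takes care of escaping
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $db->prepare('INSERT INTO scraped_rows (content) VALUES (?)');
$text = '';
$stmt->bind_param('s', $text);
foreach ($dom->find('table.mytable td') as $cell) {
    $text = $cell->plaintext;
    $stmt->execute();
}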

killer_PL
lol you were faster in replying :d
krike
A: 

What a coincidence, I recently worked on a similar project. My final solution was:

  1. cURL to fetch the contents from the URLs
  2. Simple HTML DOM Parser to get the required part of the HTML using jQuery-like selectors (a short example is below).

I strongly recommend both of them.
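For reference, the jQuery-like selectors look roughly like this (a sketch; $html is a page already fetched with cURL, and the selectors/ids are only examples):

$dom = str_get_html($html);

// Selectors work like CSS/jQuery: tag, #id, .class, and nesting
foreach ($dom->find('div#content table.prices tr td') as $cell) {
    // ->plaintext strips the tags, ->innertext keeps the inner HTML
    echo trim($cell->plaintext) . "\n";
}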

Imran Naqvi
+2  A: 

Here's some code that does the job:

// Fetch the page (requires allow_url_fopen to be enabled)
$file = fopen($url, "r");

$data = '';
while (!feof($file)) {
    // Read the remote file / URL in 1 KB chunks
    $data .= fgets($file, 1024);
}
fclose($file);

$doc = new DOMDocument();

// Suppress warnings from malformed real-world HTML
libxml_use_internal_errors(true);
$doc->loadHTML($data);

// XPath lets you search DOM documents easily
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//table[@class="mytable"]');

This will fetch a node list of any tables with the class 'mytable', which you can iterate over.
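Iterating over that node list might look like this (a sketch; printing cell text is just an example of what you might do with each table):

foreach ($nodelist as $table) {
    // Each item is a DOMElement, so the usual DOM methods are available
    foreach ($table->getElementsByTagName('td') as $cell) {
        echo trim($cell->textContent) . "\n";
    }
}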

Take a look at DOMDocument and DOMXPath.

Kevin Sedgley