views: 192
answers: 2

Anyone have a good solution for scraping the HTML source of a page whose content (in this case, HTML tables) is generated with JavaScript?

An embarrassingly simple, though workable solution using Crowbar:

<?php
function get_html($url) // $url must already be urlencode()d
{
    $context = stream_context_create(array(
        'http' => array('timeout' => 120) // HTTP timeout in seconds
    ));
    // Fetch the page through the local Crowbar service; substr() strips the
    // HTML that Crowbar's web interface wraps around the result, leaving
    // only the rendered HTML of $url.
    $html = substr(file_get_contents('http://127.0.0.1:10000/?url=' . $url . '&delay=3000&view=browser', false, $context), 730, -32);
    return $html;
}
?>

The advantage of using Crowbar is that the tables are actually rendered (and therefore accessible), thanks to its headless Mozilla-based browser. Edit: it turned out that the problem with Crowbar was a conflicting application, not server downtime, which was just a coincidence.
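For illustration, a minimal sketch of how the function above might be called. The target URL is a placeholder, and Crowbar is assumed to be running locally on port 10000:

<?php
// Hypothetical usage: the page address below is only an example.
$url = urlencode('http://www.example.com/page-with-js-tables');
$html = get_html($url);

if ($html === false || $html === '') {
    echo 'Crowbar request failed or returned nothing.';
} else {
    echo $html; // rendered markup, including the JavaScript-generated tables
}
?>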

+2  A: 

Well, Java provides some convenient solutions, such as HtmlUnit, which interprets JavaScript correctly and should therefore make the generated HTML visible.

Riduidel
+1  A: 

This is a more robust version of the OP's example, using cURL with Crowbar:

<?php
function get_html($url) // $url must already be urlencode()d
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, 'http://127.0.0.1:10000/?url=' . $url . '&delay=3000&view=as-is');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
    $html = curl_exec($curl);
    curl_close($curl); // free the handle
    return $html;
}
?>

Was getting frequent "failed to open stream: HTTP request failed!" errors when using file_get_contents() with multiple URLs.

Also, remember to urlencode() the $url (e.g. 'http://www.google.com' becomes 'http%3A%2F%2Fwww.google.com').
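A minimal usage sketch with multiple URLs (the addresses are placeholders; Crowbar is assumed to be running on 127.0.0.1:10000):

<?php
// Hypothetical usage: loop over several pages, encoding each URL first.
$urls = array(
    'http://www.example.com/page1',
    'http://www.example.com/page2',
);

foreach ($urls as $u) {
    $html = get_html(urlencode($u));
    if ($html === false) {
        echo "Request failed for $u\n";
        continue;
    }
    // ... parse the rendered tables in $html ...
}
?>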

phpwns