I have a webpage which requires a login.

I am using cURL to build the HTTP authentication request. It works, but I am not able to grab all of the content from the links: all of the images are missing.

How can I grab the images as well?

<?php

// Target URL (the page that requires a login)
$URL = "http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC";

// Initialize the cURL handle
$ch = curl_init();

// Set the URL and HTTP Basic authentication options
curl_setopt($ch, CURLOPT_URL, $URL);                // Destination URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");    // Pass the user name and password

// Fetch the page
$content = curl_exec($ch);

$result = curl_getinfo($ch);

// Close the cURL resource and free up system resources
curl_close($ch);

echo $content;
print_r($result); // curl_getinfo() returns an array, so echo would only print "Array"

?>

I'm also getting this warning message: Warning: curl_error(): 2 is not a valid cURL handle resource in C:\xampp\htdocs\LiveServices\LoginTest.php on line 24

A: 

The main issue is that I cannot see the HTML, so I cannot be sure what the problem is. Having said that, two things occur to me.

The first thing to check is whether the image URLs are relative. If they appear in the form ../xyz/foo.jpg or foo.jpg, then you will either need to rewrite the image src attributes to full URLs or add a <base> tag to the HTML.

For parsing the HTML, use the Simple HTML DOM library rather than rolling your own parser.
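A minimal sketch of that rewrite, assuming Simple HTML DOM is available and $content already holds the HTML fetched with cURL as in the question (the base URL is taken from the question and may need adjusting):

// Sketch: rewrite relative <img> src attributes to absolute URLs.
require_once("simplehtmldom/simple_html_dom.php");

$base = "http://10.123.22.38";   // scheme + host of the scraped site (assumed)
$html = str_get_html($content);  // $content comes from the cURL fetch above

foreach ($html->find('img') as $img) {
    // Only touch relative paths; leave absolute URLs alone
    if (strpos($img->src, 'http') !== 0) {
        $img->src = $base . '/' . ltrim($img->src, '/');
    }
}

echo $html;                      // page with absolute image URLs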

The second issue may be that the images also require the user to be logged in. If that is the case, you would also have to download all the images and either embed them in the content after base64-encoding them, or store them temporarily on your server.
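A rough sketch of that second approach, reusing the Basic-auth credentials from the question; the image path here is just an example taken from the HTML posted later in this thread:

// Sketch: fetch one auth-protected image and embed it as a data URI.
$imgUrl = "http://10.123.22.38/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png";

$ch = curl_init($imgUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");
$imgData = curl_exec($ch);
curl_close($ch);

// Embed the image directly in the page so the browser needs no second login
$dataUri = "data:image/png;base64," . base64_encode($imgData);
echo '<img src="' . $dataUri . '" alt="Nagvis_CC map"/>';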

Yacoby
Hi Yacoby, I can send you the HTML. Can you please provide me with your email address? Regards, Qing
QLiu
A: 

cURL doesn't get images or any other 'content', it just gets the raw HTML page. Are you saying you are missing <img /> tags that are present on the original page?

cURL also doesn't parse any CSS or JavaScript, so if the content is modified with those, it won't come through. For example, you may be unable to get a background-image of an element unless you do more scraping, that is, get the associated CSS file and parse that.
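For example, once the stylesheet has been fetched with the same cURL settings as the page, its image references can be pulled out with a regular expression. A minimal sketch, assuming $css holds the stylesheet text:

// Sketch: extract image URLs referenced from a stylesheet.
preg_match_all('/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/i', $css, $matches);
$cssImageUrls = $matches[1];   // e.g. ../images/bg.png
print_r($cssImageUrls);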

Tatu Ulmanen
Also, it's not nearly as elegant, but you could shell out to `wget`, which could grab the images and everything else for you (see the sketch after these comments). cURL, from code, is going to be more reliable though.
hometoast
Hi, do you have an example of shelling out to wget in my situation, to grab all the contents?
QLiu
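For reference, a rough sketch of that wget shell-out, assuming wget is installed on the server and callable from PHP (the URL and credentials are the ones from the question; older wget versions take --http-user/--http-password instead of --user/--password):

// Sketch: shell out to wget to mirror the page plus its images, CSS and JS.
// -p (--page-requisites) downloads everything needed to display the page,
// -k (--convert-links) rewrites links so the local copy works offline.
$url = "http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC";
$cmd = "wget --user=guest --password=test -p -k -P ./nagvis_copy " . escapeshellarg($url);
shell_exec($cmd);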
A: 

Here is some of the HTML. The images that I want to get:

<img id="backgroundImage" style="z-index: 0;" src="/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png"/>

<a href="/nagios/cgi-bin/extinfo.cgi?type=2&host=business_processes&service=NLThirdPartyLive" target="_self">

And a lot of JavaScript.

I tried to use the Simple HTML DOM library, but the output is just "Array", nothing useful (see the sketch below).

require("/simplehtmldom/simple_html_dom.php");

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'WhateverBrowser1.45');
curl_setopt($ch, CURLOPT_URL, 'http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis%5FCC');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");    // Pass the user name and password
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$result = curl_exec($ch);

$html = str_get_html($result);
echo $ret = $html->find('table[class=header_table]');

echo $result;

QLiu
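For what it's worth, find() returns an array of element objects, so echoing it directly only prints "Array". A minimal sketch of getting something usable out of it, assuming the cURL fetch above succeeded:

// find() with an index returns a single element (or null if nothing matched)
$html = str_get_html($result);
$header = $html->find('table[class=header_table]', 0);
echo $header ? $header->outertext : 'header_table not found';

// All image sources on the page
foreach ($html->find('img') as $img) {
    echo $img->src . "\n";
}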