How can I use PHP to include an external webpage (sort of like the WordPress theme preview)?

I want (X)HTML Strict compliant code: no iframe and preferably no JavaScript.

The idea is that I am making a sandbox for clients to view webpages in my controlled environment. In addition, the webpages being included should not be visible without the "sandbox" wrapper.

EDIT:

According to some commenters, GoDaddy has cURL. The next part of the question becomes: how do I strip out the headers and footers of the HTML in PHP so that just the contents of the body tag remain? I would rather use PHP string functions than regex.

+3  A: 

Try using cURL:

/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL.  Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

Call that function with your URL. It returns an array rather than printing anything, so echo the 'content' element of the result to output the whole webpage into the PHP page.
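Since get_web_page() returns an array instead of printing anything, a small wrapper can decide what to output. The following is a minimal sketch, not part of the original answer: page_content_or_error() is a hypothetical helper name, and it assumes the array has the 'errno', 'errmsg', and 'content' keys set above plus the 'http_code' field that curl_getinfo() provides.

```php
<?php
/**
 * Turn the array returned by get_web_page() into a string to echo.
 * Hypothetical helper for illustration only.
 */
function page_content_or_error(array $result): string
{
    // curl_errno() reports 0 when the transfer itself succeeded.
    if (($result['errno'] ?? 0) !== 0) {
        return 'Fetch failed: ' . htmlspecialchars((string) ($result['errmsg'] ?? 'unknown error'));
    }

    // 'http_code' is one of the fields curl_getinfo() puts in the array.
    $status = (int) ($result['http_code'] ?? 0);
    if ($status !== 200) {
        return 'Unexpected HTTP status: ' . $status;
    }

    // curl_exec() returns false on failure, so guard against non-strings.
    return is_string($result['content'] ?? null) ? $result['content'] : '';
}

// Usage with the function above (the URL is a placeholder):
// echo page_content_or_error(get_web_page('http://example.com/'));
```

Treating anything other than a 200 status as an error is a deliberately strict choice for the sketch; relax it if the pages you fetch legitimately return other codes.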

However, you may need to rewrite links to assets such as stylesheets and images using some regex, because relative paths in the fetched markup would otherwise resolve against your own domain. (Replace "/image.jpg" with "http://mydomain.com/image.jpg".)
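One possible way to do that rewriting is sketched below. The function name absolutize_root_relative() is made up for this example, and it only handles root-relative src and href attributes (those starting with a single "/"); paths such as "css/site.css" and URLs inside stylesheets are left alone.

```php
<?php
/**
 * Prefix root-relative src/href attribute values with a base URL.
 * Hypothetical helper; $base is the origin of the fetched page.
 */
function absolutize_root_relative(string $html, string $base): string
{
    $base = rtrim($base, '/');

    // Match src="/..." or href="/..." but skip protocol-relative
    // "//host/..." URLs via the negative lookahead.
    $rewritten = preg_replace_callback(
        '/\b(src|href)=(["\'])\/(?!\/)/i',
        function (array $m) use ($base): string {
            // Rebuild the attribute start with the base inserted.
            return $m[1] . '=' . $m[2] . $base . '/';
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error.
    return $rewritten ?? $html;
}

echo absolutize_root_relative('<img src="/image.jpg" />', 'http://mydomain.com');
```

A callback is used instead of a plain replacement string so that any "$" characters in the base URL are not interpreted as backreferences.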

cURL is usually installed on shared hosts.

If you want just the page's body or head, you can use SimpleXML or regular expressions for that. (If the HTML is well-formed, SimpleXML is great for traversing the DOM.)
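Since the question prefers plain string functions over regex, here is a rough sketch of that approach as well. extract_body() is a hypothetical name; it assumes the markup contains one body element and that the opening tag has no ">" inside an attribute value, so treat it as a starting point rather than a robust parser.

```php
<?php
/**
 * Return the markup between <body ...> and </body>, or null if not found.
 * Hypothetical helper using only string functions.
 */
function extract_body(string $html): ?string
{
    // Find the opening tag case-insensitively; it may carry attributes.
    $open = stripos($html, '<body');
    if ($open === false) {
        return null;
    }

    // The content starts right after the ">" that ends the opening tag.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;

    // Use the last closing tag in case "</body>" appears in a script string.
    $end = strripos($html, '</body>');
    if ($end === false || $end < $start) {
        return null;
    }

    return substr($html, $start, $end - $start);
}

echo extract_body('<html><head><title>t</title></head><body class="x"><p>Hi</p></body></html>');
```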

CodeJoust
How do I set up curl? I'm on a shared host.
Moshe
Also, will this import the head tag of the page and whatnot? - I would hope not because then I have to strip it out... more work...
Moshe
Ask your host whether cURL is installed, or whether they can install it.
DaNieL
Host is GoDaddy.
Moshe
cURL should already be installed on GoDaddy.
CodeJoust
Do the HTTP headers include the contents of the "head" tag or not?
Moshe
+2  A: 

PHP's file_get_contents function works across domains, so you're able to retrieve external markup. However, just outputting the result has several problems, including relative links not working and cross-site scripting vulnerabilities.
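For reference, a minimal sketch of the file_get_contents route might look like the following. fetch_markup() is a hypothetical wrapper name, the timeout and user agent values are arbitrary, and fetching http:// URLs this way assumes the host has allow_url_fopen enabled. It does nothing about the link and scripting issues mentioned above.

```php
<?php
/**
 * Fetch a URL (or local path) and return its contents, or null on failure.
 * Hypothetical wrapper; the http options only apply to http(s) URLs.
 */
function fetch_markup(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,   // give up after this many seconds
            'user_agent' => 'sandbox-preview', // arbitrary identifier
        ],
    ]);

    // file_get_contents() emits a warning and returns false on failure;
    // suppress the warning and report failure as null instead.
    $markup = @file_get_contents($url, false, $context);

    return $markup === false ? null : $markup;
}

// Usage (placeholder URL):
// $markup = fetch_markup('http://example.com/');
```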

While you said you don't want to use an iframe, the tag is valid XHTML 1.0 Transitional, and based on your description it is what I would recommend for compatibility and security reasons.

Zurahn
Sorry, I meant XHTML Strict.
Moshe
+1, if any kind of "sandboxing" is required, then an iframe is the most sensible solution.
DisgruntledGoat
Also, all of the code is directly on my domain, so there are fewer security issues. I will be filtering the page URL down to a relative path before calling it.
Moshe
A: 

What you can do is use this function; you would then have to define the domain name.

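function __test($results){
    // Match absolute http:// URLs that end in an image extension
    // (non-greedy, and never spanning whitespace or quotes)
    $pattern = '/http:\/\/[^\s"\']+?\.(?:jpeg|jpg|gif)/i';
    preg_match_all($pattern, $results, $matches); // $matches[0] holds every full match

    $urls = array();
    foreach ($matches[0] as $image)
    {
        // Prefix each matched URL with the domain name
        $urls[] = "http://saxtorinc.com/$image";
    }
    return $urls; // empty array when nothing matched
}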
Saxtor
I'm not sure what exactly this code does. It's pulling images or HTML files?
Moshe