The src of the <img> is known, as well as the URL of the target webpage, e.g. http://test.com/directory/hi.html. How can I implement a general function to retrieve the absolute URL of the image?

A: 

This thread might help you:

http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php

megatr0n
Not really related at all.
musicfreak
+1  A: 

If I understand the question well you can define the APPLICATION_PATH in the application root as below and use it afterwards:

define('APPLICATION_PATH', realpath(dirname(__FILE__)));
...
<img src="<?php echo APPLICATION_PATH; ?>/path/to/image.png" alt="" />
...
Sepehr Lajevardi
That's assuming it's a local image. I think the OP wants a URL.
musicfreak
This is not what I want...
This needs more clarification. Absolute URL? Do you mean the base URL? An example may help.
Sepehr Lajevardi
The OP has the URL of the HTML file, and the **relative** URL of the image to that HTML file. He needs to turn this into an **absolute** URL of the image.
musicfreak
A: 
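function getAbsoluteUrl($url, $file) {
    // Root-relative path (e.g. "/somepic.jpg"): resolve against scheme and host only.
    if (strpos($file, '/') === 0) {
        $info = parse_url($url);
        return $info['scheme'] . '://' . $info['host'] . $file;
    }
    // Otherwise drop the page's file name and append the relative path.
    $parts = explode('/', $url);
    array_pop($parts);

    return implode('/', $parts) . '/' . $file;
}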

Can't guarantee that it works with all URLs, but I can't think of a counterexample.

musicfreak
I don't think it will work with `http://test.com/directory/hi.html` and `/somepic.jpg`.
Ah, you're right. You'd probably have to make a special case for that. There's gotta be a library that takes care of this stuff for you...
musicfreak
Edited to include your example. Let me know if there is another case I forgot.
musicfreak
+1  A: 

I would:

  1. Check whether the img URL starts with a host value; if so, return it, since it is already absolute.
  2. Parse the web page URL to get the base host value.
  3. Check whether the img src is relative to the root or to the current directory, and append it to the base host value accordingly.

parse_url() is going to be the function of choice here. I was feeling brave, so I implemented it for you:

function getAbsoluteImageUrl($pageUrl,$imgSrc)
{
    $imgInfo = parse_url($imgSrc);
    if (! empty($imgInfo['host'])) {
        //img src is already an absolute URL
        return $imgSrc;
    }
    else {
        $urlInfo = parse_url($pageUrl);
        $base = $urlInfo['scheme'].'://'.$urlInfo['host'];
        if (substr($imgSrc,0,1) == '/') {
            //img src is relative from the root URL
            return $base . $imgSrc;
        }
        else {
            //img src is relative from the current directory
            return
                $base
                . substr($urlInfo['path'],0,strrpos($urlInfo['path'],'/'))
                . '/' . $imgSrc;
        }
    }
}

//tests

$host = 'http://test.com/directory/hi.html';
$imgSrc = '/images/lolcat.jpg';
echo getAbsoluteImageUrl($host,$imgSrc);
//echoes  http://test.com/images/lolcat.jpg

$host = 'http://test.com/directory/hi.html';
$imgSrc = 'images/lolcat.jpg';
echo getAbsoluteImageUrl($host,$imgSrc);
//echoes  http://test.com/directory/images/lolcat.jpg

$host = 'http://test.com/directory/hi.html';
$imgSrc = 'http://images.com/lolcat.jpg';
echo getAbsoluteImageUrl($host,$imgSrc);
//echoes  http://images.com/lolcat.jpg
zombat
A: 

Using parse_url, you can do something like this:

  1. Parse the page URL into its parts (scheme, host, path, query, fragment).
  2. Rebuild the img URL, depending on its form:
    • If the img URL begins with '/', it is relative to the site root, so build it as "$scheme://$host$img_url".
    • If the img URL doesn't begin with '/', it is relative to the page's directory, so build it as "$scheme://$host$dir/$img_url", where $dir is the page path with the file name removed. This also works if img_url is "../"- or "./"-style, because those segments are usually resolved by the server.

You may also want to exclude img urls that start with a scheme or domain ('xx://' or 'xx.xx/').
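
A rough sketch of these steps, in case it helps. The function name resolveImageUrl is made up for illustration and is not from any library. It assumes the page URL is absolute (has a scheme and a host), it collapses "." and ".." segments itself rather than leaving them to the server, and it does not treat query strings or fragments in the img src specially.

```php
<?php
// Hypothetical helper (not a library function): resolve $imgSrc against $pageUrl.
// Assumes $pageUrl is an absolute URL with a scheme and a host.
function resolveImageUrl(string $pageUrl, string $imgSrc): string
{
    // An img src that carries its own scheme is already absolute.
    if (is_string(parse_url($imgSrc, PHP_URL_SCHEME))) {
        return $imgSrc;
    }

    $page = parse_url($pageUrl);
    $base = $page['scheme'] . '://' . $page['host']
          . (isset($page['port']) ? ':' . $page['port'] : '');

    // Protocol-relative src ("//host/path"): reuse the page's scheme.
    if (strncmp($imgSrc, '//', 2) === 0) {
        return $page['scheme'] . ':' . $imgSrc;
    }

    if (strncmp($imgSrc, '/', 1) === 0) {
        // Relative to the site root.
        $path = $imgSrc;
    } else {
        // Relative to the page's directory: keep everything up to the last '/'.
        $pagePath = $page['path'] ?? '/';
        $path = substr($pagePath, 0, strrpos($pagePath, '/') + 1) . $imgSrc;
    }

    // Collapse "." and ".." segments; empty segments are dropped as well.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }

    return $base . '/' . implode('/', $out);
}

echo resolveImageUrl('http://test.com/directory/hi.html', '../images/./lolcat.jpg');
// prints http://test.com/images/lolcat.jpg
```

Collapsing the dot segments up front means the result does not depend on how a particular server handles "../" in a request path.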

Tor Valamo