What's the most reliable, generic way to construct a self-referential URL? In other words, I want to generate the http://www.site.com[:port] portion of the URL that the user's browser is hitting. I'm using PHP running under Apache.

A few complications:

  • Relying on $_SERVER["HTTP_HOST"] is dangerous, because that seems to come straight from the HTTP Host header, which someone can forge.

  • There may or may not be virtual hosts.

  • There may be a port specified using Apache's Port directive, but that might not be the port that the user specified, if it's behind a load-balancer or proxy.

  • The port may not actually be part of the URL. For example, 80 and 443 are usually omitted.

  • PHP's $_SERVER["HTTPS"] doesn't always give a reliable value, especially if you're behind a load-balancer or proxy.

  • Apache has a UseCanonicalName directive, which affects the values of the SERVER_NAME and SERVER_PORT environment variables. We can assume this is turned on, if that helps.

A: 

$_SERVER["HTTP_HOST"] is probably the best way, after some validation of course.

Yes, the user specifies it and so it cannot be trusted, but you can easily detect when the user is playing games with it.
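One way to sketch that validation is an allow-list check: accept the Host header only when it matches a hostname the site is known to serve, and fall back to a default otherwise. The helper name `trustedHost`, the allow-list entries, and the fallback value are assumptions for illustration, not something the answer specifies.

```php
<?php
// Hypothetical helper: returns the requested hostname only if it is on an allow-list.
function trustedHost(array $server, array $allowedHosts, string $default): string
{
    // Hostnames are case-insensitive, so normalise before comparing.
    $host = strtolower($server['HTTP_HOST'] ?? '');

    // Strip an optional numeric port such as "www.site.com:8080".
    if (preg_match('/^(.+):(\d+)$/', $host, $m)) {
        $host = $m[1];
    }

    // Exact match only; anything else is treated as a forged or unknown Host header.
    return in_array($host, $allowedHosts, true) ? $host : $default;
}

// Example call with assumed hostnames.
$host = trustedHost($_SERVER, ['www.site.com', 'site.com'], 'www.site.com');
```

The port is deliberately dropped here because it is user-supplied as well; it would have to come from configuration or from SERVER_PORT.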

Allain Lalonde
+1  A: 

As I recall, you want to do something like this:

$protocol = 'http';
$port = '';

// HTTPS is non-empty and not 'off' when the request arrived over SSL/TLS
if (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off') {
    $protocol = 'https';
    if ($_SERVER['SERVER_PORT'] != 443) {
        $port = $_SERVER['SERVER_PORT'];
    }
} else if ($_SERVER['SERVER_PORT'] != 80) {
    $port = $_SERVER['SERVER_PORT'];
}
// Server name is going to be whatever the virtual host name is set to in your configuration
$address = $protocol . '://' . $_SERVER['SERVER_NAME'];
if (!empty($port)) {
    $address .= ':' . $port;
}
// REQUEST_URI already includes the query string, so it must not be appended again
$address .= $_SERVER['REQUEST_URI'];

I haven't tested this code, because I don't have PHP handy at the moment.

R. Bemrose
+2  A: 

I would suggest that the only way to be both sure and secure is to define a constant for the URL in some kind of config file for the site. You could generate the constant from $_SERVER['HTTP_HOST'] as a default and replace it with a hard-coded definition on deployments where security really matters.

define('SITE_URL', 'http://' . $_SERVER['HTTP_HOST'] . '/');

and replace as needed:

define('SITE_URL', 'http://foo.bar.com:8080/');
navitronic
That's definitely my fallback, if I don't find another way. But it would be nice to avoid extra configuration.
JW
I think that if there is a way, one of the popular open source projects like WordPress, Drupal, etc. might have the answer. I know that WordPress works out the URL upon installation and then stores it in its configuration table.
navitronic
A: 

One way to check that $_SERVER['HTTP_HOST'] is valid is to verify it through DNS. I've used this method in one or two cases without a serious speed penalty, and I believe it fails silently if given an IP address.

http://www.php.net/manual/en/function.gethostbyname.php

Pseudocode might be:

// gethostbyname() returns its argument unchanged when the lookup fails
define('SITEHOME', in_array(gethostbyname($_SERVER['HTTP_HOST']), array(/* ... valid IPs */))
    ? $_SERVER['HTTP_HOST']
    : 'default_hostname');
David
+1  A: 

The most reliable way is to provide it yourself.

The site should be coded to be hostname-neutral, but to know about a special configuration file. This file doesn't go into source control with the codebase, because it belongs to the webserver's configuration. It sets things like the hostname and other webserver-specific parameters. You can accommodate load balancers, changing ports, etc., because you're saying that if an HTTP request hits that code, it can assume however much you let it assume.

This trick also helps development, incidentally. :-)
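A minimal sketch of that idea, assuming the per-server settings end up in a plain array: the keys `scheme`, `host`, and `port`, the example values, and the helper name `baseUrl` are all hypothetical. In a real deployment the array would be returned by a file kept outside source control rather than written inline.

```php
<?php
// Hypothetical per-server settings; normally loaded from a file that is not in source control.
$siteConfig = [
    'scheme' => 'https',
    'host'   => 'www.site.com',
    'port'   => 8443,
];

// Build the base URL, leaving the port out when it is the default for the scheme.
function baseUrl(array $config): string
{
    $defaultPorts = ['http' => 80, 'https' => 443];
    $url = $config['scheme'] . '://' . $config['host'];
    if ($config['port'] !== ($defaultPorts[$config['scheme']] ?? null)) {
        $url .= ':' . $config['port'];
    }
    return $url;
}

echo baseUrl($siteConfig), "\n"; // prints "https://www.site.com:8443"
```

Because nothing here reads from the request, a forged Host header or a proxy that rewrites the port has no effect on the result.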

staticsan
A: 

If you want the user to stay on the http://host:port/ they are already using, why generate full URLs at all when you can use relative URLs instead?

Say you are on the page http://xxx:yy/zzz/fff/

You could use either

../graphics/whatever.jpg (to go up one directory from the current one and get http://xxx:yy/zzz/graphics/whatever.jpg)

or /zzz/graphics/whatever.jpg (to go to the site root and work down through the directories as specified).

Both avoid mentioning the host:port part and inherit it from the one currently in use.
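To make the two forms concrete, here is a simplified sketch of how a browser resolves such a reference against the current path. The function name `resolvePath` is hypothetical, the base path is assumed to start with a slash, and query strings and fragments are ignored, so this is an illustration rather than a full URL resolver.

```php
<?php
// Hypothetical helper: resolve a relative or root-relative reference against the current path.
function resolvePath(string $basePath, string $ref): string
{
    if ($ref !== '' && $ref[0] === '/') {
        // Root-relative reference: start again from the site root.
        $path = $ref;
    } else {
        // Relative reference: keep the base up to its last slash, then append.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $ref;
    }

    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            // Go up one directory, but never above the root (the leading empty segment).
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return implode('/', $out);
}

echo resolvePath('/zzz/fff/', '../graphics/whatever.jpg'), "\n";
echo resolvePath('/zzz/fff/', '/zzz/graphics/whatever.jpg'), "\n";
```

Both calls yield /zzz/graphics/whatever.jpg, and the scheme, host, and port never appear in the code.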

Alan Doherty
See the comments on the original question for one example.
JW