I'm looking for a portable way to obtain the (handy) $_SERVER['PATH_INFO'] variable.

After reading for a while, it turns out PATH_INFO originates from CGI/1.1 and may not be present in every configuration.

What is the best (mostly security-wise) way to get that variable, apart from extracting it manually (a security concern)?

A: 

You could try

$_ENV['PATH_INFO']; or
getenv('PATH_INFO');
streetparade
`PATH_INFO` is defined on the server side. It may not always be there, no matter whether it's accessed through `_ENV` or `_SERVER`.
LiraNuna
Needless to say that `_ENV` does not even contain that property.
LiraNuna
A: 
function getPathInfo() {
    if (isset($_SERVER['PATH_INFO'])) {
        return $_SERVER['PATH_INFO'];
    }  
    $scriptname = preg_quote($_SERVER["SCRIPT_NAME"], '/');
    $pathinfo = preg_replace("/^$scriptname/", "", $_SERVER["PHP_SELF"]);
    return $pathinfo;
}

Edit: without SCRIPT_NAME, and assuming you have DOCUMENT_ROOT (or can define/discover it yourself) and assuming you have SCRIPT_FILENAME, then:

function getPathInfo() {
    if (isset($_SERVER['PATH_INFO'])) {
        return $_SERVER['PATH_INFO'];
    }  
    $docroot = preg_quote($_SERVER["DOCUMENT_ROOT"], "/");
    $scriptname = preg_replace("/^$docroot/", "", $_SERVER["SCRIPT_FILENAME"]);
    $scriptname = preg_quote($scriptname, "/");
    $pathinfo = preg_replace("/^$scriptname/", "", $_SERVER["PHP_SELF"]);
    return $pathinfo;
}

Also @Anthony (not enough rep to comment, sorry): str_replace() will match anywhere in the string, so it's not guaranteed to work; you want to match only at the start. Also, your method of going back only one slash (via strrpos) to determine SCRIPT_NAME will only work if the script is directly under the root, which is why you're better off diffing SCRIPT_FILENAME against DOCUMENT_ROOT.
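
To illustrate the str_replace() point, here is a minimal sketch comparing an unanchored replacement with a prefix-only one. The helper name stripPrefix() and the sample paths are made up for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: remove $prefix from $subject only when $subject starts with it.
function stripPrefix($subject, $prefix) {
    if ($prefix !== '' && strncmp($subject, $prefix, strlen($prefix)) === 0) {
        // Cast guards against substr() returning false on older PHP versions.
        return (string) substr($subject, strlen($prefix));
    }
    return $subject;
}

// Made-up sample in which the script name also occurs inside the path info.
$phpSelf    = '/app.php/docs/app.php/intro';
$scriptName = '/app.php';

// str_replace() removes every occurrence, which damages the path info.
$unanchored = str_replace($scriptName, '', $phpSelf);

// The anchored version strips only the leading script name.
$anchored = stripPrefix($phpSelf, $scriptName);

echo $unanchored, "\n"; // /docs/intro
echo $anchored, "\n";   // /docs/app.php/intro
```

The preg_replace("/^.../") calls in the functions above achieve the same anchoring with a regular expression; the plain string comparison simply avoids the quoting and delimiter issues.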

oops
there's a trivial and deadly error in the construction of the regular expression.
just somebody
Did you even check it? This is broken for so many reasons.
LiraNuna
yes i did check it. i left out the delimiter argument when pasting into SO though because i thought it defaulted to /, apparently not. in what other ways is it broken?
oops
If you call it with 'index.php/path/to/somewhere.ext?var=data' you will get a nice error. Also, `SCRIPT_NAME` suffers from the same problem `PATH_INFO` does.
LiraNuna
sorry, but i don't get any error when using that url... it works, but i do have script_name available. what is the error you're getting?
oops
The error was caused by the missing delimiter. It doesn't matter though, as `SCRIPT_NAME` will not be defined if `PATH_INFO` isn't, they are both a part of the CGI interface variables. (See http://hoohoo.ncsa.illinois.edu/docs/cgi/env.html).
LiraNuna
I just saw your comment on Anthony's answer. You don't have REQUEST_URI either? I was going to provide a solution that intersected PHP_SELF and REQUEST_URI, but I guess not. Can you tell us what you're guaranteed to have?
oops
See the link I posted? You can't use what's in there.
LiraNuna
@LiraNuna - you said REQUEST_URI is not available. That is not mentioned on the above link.
Anthony
`REQUEST_URI` is apache-specific (actually it's from mod_rewrite) and will not work on IIS/nginx/lighttpd etc. I want portability, not creating more problems.
LiraNuna
I understand both of you are trying to help, but I really need a *portable* way.
LiraNuna
Well, I can't speak for IIS but in nginx I personally expose script_name, script_filename, request_uri, etc. via fastcgi_param. The same is possible for lighttpd. Anyway, getting back on topic: PATH_INFO is the difference between the name of the script handling the request, and the full http path requested. You'll need to either discover or define those things to calculate it.
oops
A: 

I didn't see the comments or the link before posting. Here is something that might work, based on what the page referenced above gives as CGI-derived variables:

function getPathInfo() {
    if (isset($_SERVER['PATH_INFO'])) {
        return $_SERVER['PATH_INFO'];
    }  

    $script_filename = $_SERVER["SCRIPT_FILENAME"];
    $script_name_start = strrpos($script_filename, "/");
    $script_name = substr($script_filename, $script_name_start);

    //With the above you should have the plain file name of script without path        

    $script_uri = $_SERVER["REQUEST_URI"];
    $script_name_length = strlen($script_name);
    $path_start = $script_name_length + strpos($script_uri, $script_name);

    //You now have the position of where the script name ends in REQUEST_URI

    $pathinfo = substr($script_uri, $path_start);
    return $pathinfo;
}
Anthony
Again, `SCRIPT_NAME` and `REQUEST_URI` will not be available if `PATH_INFO` is not defined.
LiraNuna
Request URL is not mentioned on that page you provided earlier...
Anthony
Can you please provide a list of the global variables that are provided? Do any of them already include the path info just not in an easy to extract form?
Anthony
Okay. I think oops and I are both trying to be really helpful and good sports about your situation, but it is really impolite to continue to point to the same page as though it were the ultimate source, and then act as though we are stupid for offering solutions which do NOT use global variables listed at that page. It would be really helpful if you could do the following from the same directory that your troublesome script lives at:
Anthony
You are just juggling variables now. `SCRIPT_FILENAME` is a part of the CGI spec. It will not be available if PATH_INFO is unavailable. As for `REQUEST_URI`, it's specific to Apache's mod_rewrite.
LiraNuna
create a page called "globaltest.php", put "phpinfo();" in that script. Open "yourserver.org/globaltest.php/stuff". Provide a list of any variables that include either "globaltest.php" or "globaltest.php/stuff". If NO variables make any reference to "/stuff" then I don't see how you expect to extract it in any way, safe or unsafe, from within your script.
Anthony
+3  A: 

Well, I'm (almost) sure that without making use of the $_SERVER superglobal keys, providing an alternative way to figure out PATH_INFO is just impossible. That being said, let's first list all of the $_SERVER keys that we may possibly use:

  • 'PHP_SELF'
  • 'QUERY_STRING'
  • 'SCRIPT_FILENAME'
  • 'PATH_TRANSLATED'
  • 'SCRIPT_NAME'
  • 'REQUEST_URI'
  • 'PATH_INFO'
  • 'ORIG_PATH_INFO'

We obviously need to ignore the last two. Now we should (I don't know this for a fact, I'm just assuming because you said so) filter out all the keys that appear in the link you provided (which BTW is offline ATM). That leaves us with the following keys:

  • 'PHP_SELF'
  • 'SCRIPT_FILENAME'
  • 'REQUEST_URI'

Regarding your comment on Anthony's answer:

You are just juggling variables now. SCRIPT_FILENAME is a part of the CGI spec. It will not be available if PATH_INFO is unavailable. As for REQUEST_URI, it's apache's mod_rewrite specific. – LiraNuna

I'm running LightTPD/1.4.20-1 (Win32) with PHP 5.3.0 as CGI and cgi.fix_pathinfo = 1, and $_SERVER['REQUEST_URI'] is very much available to me. I also remember using that same variable back in the days when no one used mod_rewrite, so my honest, humble guess is that you're plain wrong on this point. Regarding the SCRIPT_FILENAME key, I'm unable to test that one ATM. Still, if we close our eyes really hard and believe that you're right, that leaves us with only one variable:

  • 'PHP_SELF'

I'm not trying to be harsh here (and I still believe that there are more solutions), but if PHP_SELF is the only key you want us to work with (assuming there are no restrictions on PHP_SELF itself), there is only one solution left:

function PATH_INFO()
{
 if (array_key_exists('PATH_INFO', $_SERVER) === true)
 {
  return $_SERVER['PATH_INFO'];
 }

 $whatToUse = basename(__FILE__); // see below

 return substr($_SERVER['PHP_SELF'], strpos($_SERVER['PHP_SELF'], $whatToUse) + strlen($whatToUse));
}

This function should work. However, there may be problems with the __FILE__ constant, since it returns the path of the file in which __FILE__ appears and not the path of the requested PHP script. That is what $whatToUse is there for: you can replace it with a value taken from 'SCRIPT_FILENAME' or, if you really believe in what you are saying, just use '.php'.
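
As a rough sketch of the 'SCRIPT_FILENAME' substitution mentioned above, the variant below takes the server array as a parameter so it can be tried without a web server. The function name pathInfoFromPhpSelf() and the sample values are assumptions made for illustration, and it still depends on PHP_SELF being set.

```php
<?php
// Hypothetical variant of PATH_INFO() that receives the server array explicitly.
function pathInfoFromPhpSelf(array $server) {
    if (isset($server['PATH_INFO'])) {
        return $server['PATH_INFO'];
    }
    if (!isset($server['PHP_SELF'])) {
        return ''; // nothing to work with
    }
    // Prefer the requested script's file name; fall back to this file's name.
    $scriptFile = isset($server['SCRIPT_FILENAME'])
        ? basename($server['SCRIPT_FILENAME'])
        : basename(__FILE__);

    $pos = strpos($server['PHP_SELF'], $scriptFile);
    if ($pos === false) {
        return ''; // script name not found in PHP_SELF
    }
    return (string) substr($server['PHP_SELF'], $pos + strlen($scriptFile));
}

// Made-up sample values:
$sample = array(
    'PHP_SELF'        => '/blog/index.php/2010/01/post',
    'SCRIPT_FILENAME' => '/var/www/blog/index.php',
);
echo pathInfoFromPhpSelf($sample), "\n"; // /2010/01/post
```

Checking the strpos() result for false avoids silently returning a wrong substring when the file name does not appear in PHP_SELF at all.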

You should also read this regarding why not to use PHP_SELF.

If this doesn't work for you, I'm sorry, but I can't think of anything else.

EDIT - Some more reading for you:

Alix Axel
`REQUEST_URI` is from mod_rewrite, that's why it's apache specific.
LiraNuna
Then how do you explain that I've the REQUEST_URI variable on LigHTTPD?
Alix Axel
lighttpd has mod_rewrite equivalent? Both are open source, you know - I bet they share code.
LiraNuna
"You should also read this regarding why not to use `PHP_SELF`." - This says not to use `PHP_SELF` to write stuff on the page. I move information around and verify it.
LiraNuna
The only modules I've on lighttpd are: mod_access, mod_cgi, mod_dirlisting, mod_indexfile, mod_mimetype and mod_staticfile. Either way it **isn't** Apache specific.
Alix Axel
A: 

It depends on the definitions for "portable" and "safe".

Let me see if I understood:

1) You are not interested in the CLI:

  • you mentioned PHP/CGI
  • PATH_INFO is a piece of a URL; so, it only makes sense to discuss PATH_INFO when the script is accessed from a URL (i.e. from an HTTP connection, usually requested by a browser)

2) You want to have PATH_INFO in every OS + HTTP server + PHP combination:

  • OS may be Windows, Linux, etc
  • HTTP server may be Apache 1, Apache 2, NginX, Lighttpd, etc.
  • PHP may be version 4, 5, 6 or any version

Hmmm... PATH_INFO, in the $_SERVER array, is provided by PHP to a running script only under certain conditions, depending on the software mentioned above. It is not always available. The same is true for the entire $_SERVER array!

In short: "$_SERVER depends on the server"... so a portable solution can't rely on $_SERVER... (just to give one example: we have a tutorial on setting up PHP/CGI $_SERVER variables on the NginX HTTP server at kbeezie.com/view/php-self-path-nginx/)

3) Despite what was mentioned above, it is worth mentioning that if we somehow have the full requested URL available as a string, it is possible to obtain the PATH_INFO from it safely by applying regular expressions and other PHP string functions (also validating the input string as a valid URI).

So, provided that we have the URL string... then YES, WE HAVE a portable and safe way to determine PATH_INFO from it.


Now, we have two clear and focused implementation issues:

  1. How to obtain the URL?
  2. How to obtain the PATH_INFO from the URL?

Among several possibilities, here is a possible approach:

How to obtain the URL?

1) With your deep and comprehensive knowledge about each HTTP server + OS + PHP version combination, check and try each possibility to obtain the URL from the $_SERVER array (verify 'PHP_SELF', 'QUERY_STRING', 'SCRIPT_FILENAME', 'PATH_TRANSLATED', 'SCRIPT_NAME', 'REQUEST_URI', 'PATH_INFO', 'ORIG_PATH_INFO', 'HTTP_HOST', 'DOCUMENT_ROOT' or whatever)

2) If the previous step failed, make the PHP script return JavaScript code that sends the "document.URL" information back. (The portability issue is transferred to the client side.)

How to obtain the PATH_INFO from the URL?

This code linked here does this.
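
Since the linked code is not reproduced here, the following is only a minimal sketch of the idea and not the linked implementation. It assumes the caller already has the full URL and knows the URL path of the script; the function name pathInfoFromUrl() and both parameters are hypothetical.

```php
<?php
// Hypothetical helper: $url is the full requested URL, $scriptName is the
// URL path of the script (for example "/index.php"). Returns null when the
// path info cannot be determined.
function pathInfoFromUrl($url, $scriptName) {
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }
    // Accept the script name only at the very start of the path.
    if ($scriptName === '' || strncmp($path, $scriptName, strlen($scriptName)) !== 0) {
        return null;
    }
    $rest = (string) substr($path, strlen($scriptName));
    // The remainder must be empty or begin with "/", so "/index.phpx" is rejected.
    if ($rest !== '' && $rest[0] !== '/') {
        return null;
    }
    // Decode percent-escapes, as CGI servers do for PATH_INFO.
    return rawurldecode($rest);
}

$info = pathInfoFromUrl('http://example.com/index.php/path/to/somewhere.ext?var=data', '/index.php');
echo $info, "\n"; // /path/to/somewhere.ext
```

The query string is dropped by parse_url(), so only the path part is compared. The result is still user-controlled input and should be validated before use.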

This is my humble opinion and approach to the problem.

What do you think?

J. Bruni
Not all `$_SERVER` variables are safe to use. PHP_SELF for example, is based on the URL. See http://markjaquith.wordpress.com/2009/09/21/php-server-vars-not-safe-in-forms-or-links/
vdboor