Hello,

My Web host has refused to help me with this, so I'm coming to the wise folks here for some help "black-box debugging". Here's an edited version of what I sent to them:

I have two domains (among others) at DreamHost:

1) thefigtrees.net

2) shouldivoteformccain.com

I noticed today that when I host a CGI script on #1, the HTTP GET query string passed to it in the QUERY_STRING environment variable has already been URL-decoded by the time the script runs. This is a problem because a standard CGI library (such as Perl's CGI.pm) will split the string on ampersands and then decode each piece itself. That causes two potential problems:

1) The string is decoded twice, so if a value such as "%2525" is submitted to the script, it ends up being treated as "%" (decoded twice) rather than "%25" (decoded once).

2) (More common) If a submitted value contains an ampersand, it is (properly) sent as %26, but the QUERY_STRING environment variable already has it decoded to "&", so the CGI library improperly splits the query string at that ampersand. This is a big problem!
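The two failure modes can be reproduced without a server. The following is only a sketch in Python rather than Perl: it uses the standard `urllib.parse` module to stand in for CGI.pm's split-then-decode behaviour, and it simulates the host's premature decoding with an explicit `unquote` call (an assumption about what the server is doing, not something confirmed by the host).

```python
from urllib.parse import parse_qs, quote, unquote

# What a browser sends for x = "y&z": the ampersand is percent-encoded.
raw_query = "x=" + quote("y&z", safe="")      # "x=y%26z"

# Well-behaved host: QUERY_STRING arrives undecoded, so the parser
# splits on "&" first and decodes each piece afterwards.
good = parse_qs(raw_query)                    # {'x': ['y&z']}

# Misbehaving host (simulated): the string is decoded before the
# script sees it, so the parser splits at the now-literal "&".
predecoded = unquote(raw_query)               # "x=y&z"
bad = parse_qs(predecoded, keep_blank_values=True)  # {'x': ['y'], 'z': ['']}

# Double decoding: "%2525" should come out as "%25", not "%".
single = parse_qs("x=%2525")                  # {'x': ['%25']}
double = parse_qs(unquote("x=%2525"))         # {'x': ['%']}
```

In the `bad` case the value is silently truncated and a spurious parameter `z` appears, which matches the behaviour seen on domain #1.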

The script at http://thefigtrees.net/test.cgi demonstrates this. It echoes back the environment variables it is called with. Navigating in a browser to:

http://thefigtrees.net/lee/test.cgi?x=y%26z

You can see that REQUEST_URI properly contains x=y%26z (still percent-encoded), but QUERY_STRING has already been decoded to x=y&z. If I repeat the test on domain #2 ( http://www.shouldivoteformccain.com/test.cgi?x=y%26z ), QUERY_STRING remains undecoded, so CGI.pm splits and decodes it correctly.

I tried disabling my .htaccess files on both domains to make sure they were not the problem, and saw no difference.

Could anyone speculate on potential causes of this, since my Web host seems unwilling to help me?

thanks, Lee

A: 

Curious. Nothing I can see from here gives a clue as to why this would happen... I can only confirm that it is an environment bug, and I suspect configuration differences between the two domains, such as rewrite rules.

Per CGI 1.1, this decoding should only happen to SCRIPT_NAME and PATH_INFO, not QUERY_STRING. It's pointless and annoying that it happens at all, but that's the spec. Using REQUEST_URI instead of those variables where available (i.e. Apache) is a common workaround for places where you want to put out-of-bounds and Unicode characters in path parts, so it might be reasonable to do the same for query strings until some sort of resolution is available from the host.
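A minimal sketch of that workaround, in Python for illustration: take the query part from REQUEST_URI when the server provides it, and fall back to QUERY_STRING otherwise. The function name `query_params_from_request_uri` is hypothetical, and REQUEST_URI is a server extension (Apache sets it) rather than a standard CGI variable, so the fallback matters.

```python
import os
from urllib.parse import parse_qs, urlsplit


def query_params_from_request_uri(environ=os.environ):
    """Parse GET parameters from the still-encoded REQUEST_URI.

    Falls back to QUERY_STRING when REQUEST_URI is not set,
    since REQUEST_URI is not part of the CGI specification.
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is not None:
        # Everything after the first "?" (and before any "#").
        raw_query = urlsplit(request_uri).query
    else:
        raw_query = environ.get("QUERY_STRING", "")
    # Split on "&" first, then decode each name and value once.
    return parse_qs(raw_query, keep_blank_values=True)
```

With the environment from the question (REQUEST_URI ending in `?x=y%26z`, QUERY_STRING already decoded to `x=y&z`), this returns `{'x': ['y&z']}`. It only helps for GET parameters, and only while the server keeps REQUEST_URI undecoded.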

VPSs are cheap these days...

bobince
A: 

I have the same behavior in Apache.

I believe mod_rewrite will automatically decode the URL if it is installed; however, I have seen the auto-decode behavior even without it. I haven't tracked down the other culprit.

A common workaround is to double-encode the input parameter (taking advantage of URL decoding being safe when called on a string that contains no percent escapes).
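As a rough illustration of double encoding (a Python sketch, with the host's premature decoding simulated by an explicit `unquote`, which is an assumption):

```python
from urllib.parse import parse_qs, quote, unquote

value = "y&z"
once = quote(value, safe="")     # "y%26z"
twice = quote(once, safe="")     # "y%2526z"

# Host that decodes QUERY_STRING early (simulated): the extra layer
# of encoding absorbs the premature decode, and the value survives.
on_decoding_host = parse_qs(unquote("x=" + twice))   # {'x': ['y&z']}

# Host that leaves QUERY_STRING alone: one layer of encoding remains,
# so the script has to decode the value once more itself.
on_normal_host = parse_qs("x=" + twice)              # {'x': ['y%26z']}
recovered = unquote(on_normal_host["x"][0])          # "y&z"
```

The catch is that the script cannot tell which kind of host it is on from the value alone, and the extra decode is only harmless when the original value contains no literal percent sequences.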

sirdodger