I've been working on a script for debugging mod_rewrite, and while testing its regex handling I've had some strange results. I'm wondering whether this is normal behavior for the mod_rewrite regex engine or whether something in my setup is causing it.

Requested URL: http://myurl.com/path/to/something

.htaccess has: RewriteRule to where

Using my debugging system, the following is what happens when that RewriteRule is used:

path/to/something -> where/to/something

Shouldn't it be path/where/something?

Here's the full .htaccess file:

RewriteEngine On

RewriteBase /ModRewriteTester

RewriteRule .* - [E=ORIG:$0]

RewriteRule to where

RewriteRule .* - [E=MODD:$0]

RewriteRule .* index.php

Then I've got a PHP script that reads the environment variables $_SERVER['REDIRECT_ORIG'] and $_SERVER['REDIRECT_MODD']; that's where I'm getting the paths shown above.

If anyone knows a better way to explicitly show how mod_rewrite's regex engine works, I'm open to it. The initial question still stands, though.

A: 

Your rule:

RewriteRule to where

...will match any URL that contains `to` and replace the entire URL (not just the matched part) with `where`, which represents what would be a request to /where. In certain circumstances mod_rewrite will also re-append what Apache believes to be PATH_INFO, which could create a situation like the following:

path/to/somewhere  -> PATH_INFO = /to/somewhere
path/to/somewhere  -> /where
(append PATH_INFO) -> /where/to/somewhere
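
The sequence above can be imitated with a few lines of Python. This is only a toy model of the behavior described here, not Apache's actual implementation: the function `rewrite` and its parameters `path_info` and `discard_path_info` are hypothetical names, and the re-append step is assumed to happen exactly as sketched in the three lines above.

```python
import re

def rewrite(url, pattern, substitution, path_info="", discard_path_info=False):
    """Toy model of a single RewriteRule (hypothetical helper, not an Apache API)."""
    # The pattern only needs to match somewhere in the URL ...
    if re.search(pattern, url) is None:
        return url  # rule does not apply, URL is left alone
    # ... and the substitution then replaces the WHOLE URL, not just the match.
    result = substitution
    # Assumed step: leftover PATH_INFO is appended unless it is discarded,
    # which is what the DPI flag is meant to do.
    if path_info and not discard_path_info:
        result += path_info
    return result

# PATH_INFO re-appended: reproduces the result from the question
print(rewrite("path/to/something", "to", "where", path_info="/to/something"))
# -> where/to/something

# PATH_INFO discarded
print(rewrite("path/to/something", "to", "where",
              path_info="/to/something", discard_path_info=True))
# -> where
```

The point of the model is that the regex itself behaves normally; the surprising output comes from the whole-URL substitution combined with the appended PATH_INFO.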

To check if this is the case in your scenario, you can add the DPI flag to the RewriteRule to discard the PATH_INFO if it exists. This would look like this:

RewriteRule to where [DPI]

In this case, you would end up with just the URL /where. If you wanted to replace `to` with `where` while retaining the rest of the URL, you would need to capture the parts around it and put them back in the substitution, with a rule more like this:

RewriteRule (.*?/)?to(/.*)? $1where$2
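
To see what that pattern captures, here is a minimal Python sketch that applies the same regex and rebuilds the URL from the two groups. It assumes Python's `re` module behaves like mod_rewrite's PCRE for this particular pattern, and it ignores PATH_INFO entirely; the function name `rewrite_with_captures` is made up for illustration.

```python
import re

# Same pattern as the rule above; $1 and $2 correspond to groups 1 and 2.
PATTERN = re.compile(r"(.*?/)?to(/.*)?")

def rewrite_with_captures(url):
    """Rebuild the URL as $1 + "where" + $2 (hypothetical helper)."""
    match = PATTERN.search(url)
    if match is None:
        return url  # no "to" anywhere, so the rule would not apply
    prefix = match.group(1) or ""  # an unmatched optional group expands to ""
    suffix = match.group(2) or ""
    return f"{prefix}where{suffix}"

print(rewrite_with_captures("path/to/something"))  # -> path/where/something
print(rewrite_with_captures("photos/1"))           # -> where
```

The second call shows a side effect of the pattern being unanchored: a `to` inside a longer segment also matches, and because both groups are then empty the whole URL collapses to `where`. Anchoring the pattern with `^` and `$` would be one way to avoid that, if it matters for your URLs.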

As far as debugging your rule set goes, if you have access to the Apache configuration you're much better off using the RewriteLog directive with a sufficiently high RewriteLogLevel (on Apache 2.4 and later these were replaced by the LogLevel directive, e.g. LogLevel alert rewrite:trace3). If you don't have access to the configuration, you're pretty much limited to doing something similar to what you're trying to do now.

Tim Stone
`RewriteRule to where [DPI]` yields an "Internal Server Error" and `RewriteRule (.*?/)?to(/.*)? $1where$2` yields the following: `path/to/thing -> path/where/thing/to/thing`
lococobra
@lococobra: Ah, I believe the `DPI` flag was only added in Apache 2.2.12, so if you're running an earlier version it will cause an internal server error because the flag isn't recognized.
Tim Stone
Yes, I'm using MAMP Pro which comes with Apache 2.0.63. Any explanation as to why even the more specific regex pattern is causing such bizarre results? I'm quite familiar with normal PCRE, and I can't think of any explanation.
lococobra
@lococobra: Oh, I didn't see the second part of your comment when I replied earlier, whoops. Essentially you have the same scenario, where part of the URL is taken as PATH_INFO and then re-appended to the result of your substitution. The regexes are working correctly, but they're only used to match the input. The substitution replaces the entire URL, not just the matched pattern. You can try setting `AcceptPathInfo Off` to see if it makes any difference, but I'll have to double-check why you get it in the first place in this scenario.
Tim Stone
I've tried changing the path completely to something consisting of nothing but random characters, as well as adding `AcceptPathInfo Off` as you suggested. Neither altered the results in any way. I get what you're saying about replacing the entire URL, but that still doesn't explain why any part of the URL is getting added back on.
lococobra
Oddly enough, I tried removing all of the testing stuff that I was using and then trying your `RewriteRule (.*?/)?to(/.*)? $1where$2` ... the result was a "Not Found" page which lists the correct "requested URL". So I guess the only solution is that DPI flag?
lococobra