I use a RewriteRule to replace all spaces (aka %20) in my URLs with underscores:

RewriteRule (.*)[\ ](.*) $1_$2 [N]

The [N] flag restarts the rewrite process until no space is left (hopefully). All is well when there is a file waiting on the other side, i.e. request:

/This is an example.html

and file:

This_is_an_example.html
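
The loop that [N] is expected to perform can be sketched outside Apache. This is a minimal Python model, not mod_rewrite itself: `replace_spaces` and `max_rounds` are names made up for the illustration, and Python's `re` module stands in for Apache's regex engine. Because the first `(.*)` is greedy, each round replaces the last remaining space.

```python
import re

# Same pattern as in the RewriteRule: unanchored, first group greedy.
SPACE_RULE = re.compile(r"(.*)[ ](.*)")

def replace_spaces(path, max_rounds=100):
    """Apply the substitution $1_$2 repeatedly, like a rule flagged [N].

    Returns the intermediate results, one per round.
    max_rounds is a safety cap for this sketch only.
    """
    steps = []
    for _ in range(max_rounds):
        match = SPACE_RULE.search(path)
        if match is None:
            break  # no space left: the rule no longer matches
        path = match.group(1) + "_" + match.group(2)
        steps.append(path)
    return steps

for step in replace_spaces("This is an example.html"):
    print(step)
```

In this model the loop ends after one round per space, which is the behaviour I expected from Apache as well.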

But when there is no matching file, Apache exits with a 500 Internal Server Error. The error log reports a segfault, and the rewrite log shows that the rewrite engine goes mad trying to redirect to

/This is an_example.html/This is an_example.html/This is an_example.html/...

and so on until the segfault (note that the last space was converted, but none of the others).

Does anyone have an idea why the RewriteRule fails so miserably, while it works for existing files?

Update: The segfault only occurs if there is an additional "/" in the requested URI, like

/virtual-directory/This is an example.html

Update 2: There is no RewriteCond in this statement. In my opinion it should work without one. The question is why it doesn't. The .htaccess:

RewriteEngine On
RewriteBase /test/

RewriteRule (.*)[\ ](.*) $1_$2 [N]

Update 3: The .htaccess file in question:

AddType application/xhtml+xml .xhtml
AddEncoding x-gzip .gz

ExpiresActive On
ExpiresByType application/xhtml+xml "access plus 1 year"

ErrorDocument 404 /test/index.php?error=404

RewriteEngine On

RewriteBase /test/

# replace spaces with underscores
RewriteCond %{REQUEST_FILENAME} .*[\ ].*
RewriteRule (.*)[\ ](.*) $1_$2 [N]

# the index.php
RewriteRule ^index.php - [L]

# language tag
RewriteRule ^([a-z]{2})$ index.php?lang=$1&url= [L,QSA]
RewriteRule ^$ en [R=301,L]

# add .xhtml
RewriteCond %{REQUEST_FILENAME}.xhtml -s [OR]
RewriteCond %{REQUEST_FILENAME}.xhtml.gz -s
RewriteRule ^(.+) $1.xhtml

# add .gz
RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
RewriteCond %{REQUEST_FILENAME}.gz -s
RewriteRule ^(.+) $1.gz

# send correct mime type
#RewriteCond %{REQUEST_FILENAME} -s
RewriteRule \.xhtml\.gz$  - [L,T=application/xhtml+xml]

The RewriteLog output is too large to paste: a single request fills the file with 13MB of data. Here is the digest:

(IP) - - [date] [host] (3) [perdir /htdocs/test/] add path info postfix: /htdocs/test/de -> /htdocs/test/de/This is a test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] strip per-dir prefix: /htdocs/test/de/This is a test -> de/This is a test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] applying pattern '(.*)[\ ](.*)' to uri 'de/This is a test'
(IP) - - [date] [host] (2) [perdir /htdocs/test/] rewrite 'de/This is a test' -> 'de/This is a_test'
(IP) - - [date] [host] (3) [perdir /htdocs/test/] add per-dir prefix: de/This is a_test -> /htdocs/test/de/This is a_test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] add path info postfix: /htdocs/test/de/This is a_test -> /htdocs/test/de/This is a_test/This is a test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] strip per-dir prefix: /htdocs/test/de/This is a_test/This is a test -> de/This is a_test/This is a test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] applying pattern '(.*)[\ ](.*)' to uri 'de/This is a_test/This is a test'
(IP) - - [date] [host] (2) [perdir /htdocs/test/] rewrite 'de/This is a_test/This is a test' -> 'de/This is a_test/This is a_test'
(IP) - - [date] [host] (3) [perdir /htdocs/test/] add per-dir prefix: de/This is a_test/This is a_test -> /htdocs/test/de/This is a_test/This is a_test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] add path info postfix: /htdocs/test/de/This is a_test/This is a_test -> /htdocs/test/de/This is a_test/This is a_test/This is a test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] strip per-dir prefix: /htdocs/test/de/This is a_test/This is a_test/This is a test -> de/This is a_test/This is a_test/This is a test
(IP) - - [date] [host] (3) [perdir /htdocs/test/] applying pattern '(.*)[\ ](.*)' to uri 'de/This is a_test/This is a_test/This is a test'
(IP) - - [date] [host] (2) [perdir /htdocs/test/] rewrite 'de/This is a_test/This is a_test/This is a test' -> 'de/This is a_test/This is a_test/This is a_test'
(IP) - - [date] [host] (3) [perdir /htdocs/test/] add per-dir prefix: de/This is a_test/This is a_test/This is a_test -> /htdocs/test/de/This is a_test/This is a_test/This is a_test
... and so on ad infinitum

The bad guy is line 6: "add path info postfix".
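
The digest can be replayed with a small Python model of the four logged steps. It is only a sketch of what the log shows, not of mod_rewrite's internals; `simulate_rounds` and its parameters are hypothetical names, and the paths are the ones from the log.

```python
import re

# Same pattern as in the RewriteRule.
SPACE_RULE = re.compile(r"(.*)[ ](.*)")

def simulate_rounds(filename, path_info, prefix, rounds):
    """Replay the logged per-directory steps for a fixed number of rounds."""
    results = []
    for _ in range(rounds):
        uri = filename + path_info      # "add path info postfix"
        local = uri[len(prefix):]       # "strip per-dir prefix"
        match = SPACE_RULE.search(local)
        if match is None:
            break
        local = match.group(1) + "_" + match.group(2)  # "rewrite"
        filename = prefix + local       # "add per-dir prefix"
        results.append(local)
    return results

for uri in simulate_rounds("/htdocs/test/de", "/This is a test",
                           "/htdocs/test/", rounds=3):
    print(uri)
```

Every round removes one space but the postfix brings in three new ones, so in this model the rule can never run out of matches.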

A: 

Try an absolute replacement:

RewriteRule ^([^\ ]*)\ (.*) /$1_$2 [N]
Gumbo
No, that doesn't work, same result. I circumvent it now with a condition: `RewriteCond %{REQUEST_FILENAME} .*[\ ].*`
Boldewyn
@Boldewyn: It works for me this way.
Gumbo
+2  A: 

UPDATE

looking at your update 3 (helpful, thanks), 2 more ideas come to my mind:

  1. there is an apache issue: *mod_rewrite adds path info postfix after a substitution occured*, that looks similar to yours. this was fixed in 2.2.12 with the introduction of the discardpathinfo|DPI RewriteRule flag. try adding this flag to your RewriteRule:

    RewriteRule (.*)[\ ](.*) $1_$2 [N,DPI]
    

    and see if that solves the issue. needs apache 2.2.12.

  2. do you happen to have mod_dir enabled and DirectorySlash On? try disabling it.
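
to illustrate the first idea, here is a rough python model of the loop from your log, with and without discarding the path info. it is a simplification under my own assumptions, not mod_rewrite's actual code: `rewrite_rounds`, `discard_path_info` and `max_rounds` are hypothetical names, and the flag is modelled as simply dropping the path info after the first substitution.

```python
import re

# Same pattern as in the RewriteRule.
SPACE_RULE = re.compile(r"(.*)[ ](.*)")

def rewrite_rounds(filename, path_info, prefix, discard_path_info, max_rounds=10):
    """Model the [N] loop in per-directory context.

    discard_path_info=True is a simplified stand-in for the DPI flag:
    the path info is dropped once a substitution has happened.
    max_rounds is a safety cap for this sketch only.
    """
    results = []
    for _ in range(max_rounds):
        uri = filename + path_info      # path info postfix is appended
        local = uri[len(prefix):]       # per-dir prefix is stripped
        match = SPACE_RULE.search(local)
        if match is None:
            break                       # no space left, loop ends
        local = match.group(1) + "_" + match.group(2)
        filename = prefix + local       # per-dir prefix is added back
        if discard_path_info:
            path_info = ""              # assumed effect of DPI
        results.append(local)
    return results

args = ("/htdocs/test/de", "/This is a test", "/htdocs/test/")
print(rewrite_rounds(*args, discard_path_info=True))
print(len(rewrite_rounds(*args, discard_path_info=False)))
```

in this model the run with the path info discarded stops once all spaces are replaced, while the other one only stops at the cap.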

UPDATE END

your setup:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} .*[\ ].*
RewriteRule (.*)[\ ](.*) $1_$2 [N]

works for me (http://localhost/test/This is an example.html, rewrite rules in VirtualHost config, apache 2.2.14). here is my RewriteLog (with RewriteLogLevel 2):

init rewrite engine with requested uri /test/This is an example.html
rewrite '/test/This is an example.html' -> '/test/This is an_example.html'
rewrite '/test/This is an_example.html' -> '/test/This is_an_example.html'
rewrite '/test/This is_an_example.html' -> '/test/This_is_an_example.html'
local path result: /test/This_is_an_example.html
prefixed with document_root to D:/var/www/test/This_is_an_example.html
go-ahead with D:/var/www/test/This_is_an_example.html [OK]

when D:/var/www/test/This_is_an_example.html does not exist, i get a 404 - but no segfault.

could you post your RewriteLog? where do you put your rewrite rules? into an .htaccess? do you have any directives (AliasMatch etc.) that also work on your url?

ax
It works for me too when I add the RewriteCond. The question, however, is why I need the RewriteCond at all. If you look at my RewriteRule in the question, it should loop round and round until no space is left, without any need for a RewriteCond. My question is why it doesn't.
Boldewyn
it also works here without the `RewriteCond` (same `RewriteLog`). so again: what does your `RewriteLog` look like, are your rewrite rules in an `.htaccess`, and/or are there other directives matching the url in question?
ax
The rewrite rules are in an .htaccess file. Please note my first update in the question: the segfault only occurs if there is a folder-like component before the file name. I'll paste a bit of the rewrite log into the question above.
Boldewyn
thanks for the log. see my update for another idea.
ax
Wow, thanks for digging up the bug. It really looks like my problem. Unfortunately I still have 2.2.11, but I'll try and set up 2.2.12 to verify this, before the bounty expires.
Boldewyn
Yep, that bug was the problem. Now I'm gonna print out this question, frame it and put it on the wall. In the end, how many mod_rewrite questions on SO end with: "It's not your fault, it's a bug in mod_rewrite"? ;-)
Boldewyn
A: 

By and large, the Apache developers will take a SIGSEGV in httpd quite seriously if presented with a repeatable test case. Even more so if you attach a debugger and capture a backtrace.

bmargulies
the segfault could probably be prevented with LimitInternalRecursion (Apache 2.0.47+) or RewriteOptions MaxRedirects (2.0.45-46)
ax