
So I have asked for help over at my weblog, scoured the internet, and pored over the examples you all have provided on here before, and I still cannot find an answer that works.

Simply put, I am trying to take all traffic referred to my site from Site A, and redirect it all to Page B within my domain. I have gotten the redirect to work perfectly, but I cannot get it to break out of an infinite loop. Any assistance would be greatly appreciated.

Code follows (though it has been "anonymized" from the specific pages I was using):

<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !filename\.html$
RewriteCond %{HTTP_REFERER} ^http://online\.webpage\.com.* [NC]
RewriteRule (.*) http://www.wallsofthecity.net/year/mo/filename.html [L]
</IfModule>

Like I said, the RewriteRule works beautifully, but the first RewriteCond does not appear to catch the case where the visitor is already on the target page, so it just keeps redirecting folks, ad nauseam. I have been using this site, http://rexswain.com/httpview.html, to check my code, and while it is useful, it has not given me any good answers.

Thanks for whatever help you can provide.

UPDATE:

So here is the .htaccess file in its entirety, since that may make things easier:

<IfModule mod_rewrite.c>
RewriteEngine On

# Force an external redirect to this page for referrals from that site
# This page *must* exist to prevent a loop (which it does, I checked :P)
RewriteCond %{HTTP_REFERER} ^http://mikeb302000\.blogspot\.com.* [NC]
RewriteCond %{REQUEST_URI} !=/2010/04/cruisin-for-a-bruisin.html
RewriteRule . /2010/04/cruisin-for-a-bruisin.html [R,L]

# This scenario performs no rewrite, so it should actually just be handled by
# the RewriteConds below (they won't match), but I didn't test that
RewriteCond %{REQUEST_URI} ^/(stats|failed_auth\.html).*$ [NC]
RewriteRule . - [L]

Redirect permanent /index.xml http://www.wallsofthecity.net/feed/
Redirect permanent /rss.xml http://www.wallsofthecity.net/feed/
Redirect permanent /atom.xml http://www.wallsofthecity.net/feed/atom/
Redirect permanent /12_tribes http://www.wallsofthecity.net/category/12-tribes
Redirect permanent /as_i_say_not_do http://www.wallsofthecity.net/category/as-i-say-not-do
Redirect permanent /bigotry_exposed http://www.wallsofthecity.net/category/bigotry-exposed
Redirect permanent /commercial_appeal http://www.wallsofthecity.net/category/commercial-appeal
Redirect permanent /cowardice_on_parade http://www.wallsofthecity.net/category/cowardice-on-parade
Redirect permanent /crosscountry_jaunt http://www.wallsofthecity.net/category/crosscountry-jaunt
Redirect permanent /digital_real_estate http://www.wallsofthecity.net/category/digital-real-estate
Redirect permanent /fools_and_jesters http://www.wallsofthecity.net/category/fools-and-jesters
Redirect permanent /for_hire http://www.wallsofthecity.net/category/for-hire
Redirect permanent /me_myself_and_i http://www.wallsofthecity.net/category/me-myself-and-i
Redirect permanent /musings_of_a_madman http://www.wallsofthecity.net/category/musings-of-a-madman
Redirect permanent /one-line_review http://www.wallsofthecity.net/category/one-line-review
Redirect permanent /patron_polity_of_perforation http://www.wallsofthecity.net/category/patron-polity-of-perforation
Redirect permanent /peoples_republic_of_kalifornistan http://www.wallsofthecity.net/category/peoples-republic-of-kalifornistan
Redirect permanent /sensor_ping http://www.wallsofthecity.net/category/sensor-ping
Redirect permanent /serenity http://www.wallsofthecity.net/category/serenity
Redirect permanent /simon_jester http://www.wallsofthecity.net/category/simon-jester
Redirect permanent /the_funnies http://www.wallsofthecity.net/category/the-funnies
Redirect permanent /the_mat http://www.wallsofthecity.net/category/the-mat
Redirect permanent /things_that_go_boom http://www.wallsofthecity.net/category/things-that-go-boom
Redirect permanent /toysgizmosgadgets http://www.wallsofthecity.net/category/toysgizmosgadgets
Redirect permanent /urk http://www.wallsofthecity.net/category/urk
Redirect permanent /window_on_the_world http://www.wallsofthecity.net/category/window-on-the-world

</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

Note: it still does not appear to be working. Test away :).

A: 

The first thing I'd try is changing the rewrite condition as follows:

RewriteCond %{REQUEST_FILENAME} !^.*filename\.html$

I think Apache is not supposed to force the match to start at the beginning of the string, but if it does (for whatever reason) and if that's the cause of your problem, this change will fix it.

As an alternative to that, just drop the first rewrite condition entirely and change your rewriting rule to

RewriteRule !year/mo/filename.html http://www.wallsofthecity.net/year/mo/filename.html
David Zaslavsky
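A sketch of the same guard written against `%{REQUEST_URI}` instead of `%{REQUEST_FILENAME}`, as the question's update later does. In `.htaccess` context `%{REQUEST_FILENAME}` is the full filesystem path the request maps to, while `%{REQUEST_URI}` is the URL path with its leading slash, so the condition can be anchored at both ends. `online.webpage.com` and `year/mo/filename.html` are the question's anonymized placeholders, and the explicit `[R=302,L]` is an assumed addition so the external redirect does not depend on how mod_rewrite treats a same-host absolute URL.

```apache
# Placeholders from the question: online.webpage.com, year/mo/filename.html
# Skip the redirect once the visitor is already on the target page
RewriteCond %{REQUEST_URI} !^/year/mo/filename\.html$
# Only redirect visitors referred from the other site
RewriteCond %{HTTP_REFERER} ^http://online\.webpage\.com [NC]
# Explicit temporary external redirect (assumed; the original rule had only [L])
RewriteRule .* http://www.wallsofthecity.net/year/mo/filename.html [R=302,L]
```

Because mod_rewrite patterns are unanchored unless `^` or `$` is used, `!filename\.html$` and `!^.*filename\.html$` should match the same strings; anchoring against `%{REQUEST_URI}` mainly makes the intent explicit.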
Thanks for the try, David, but neither of those seems to work. If it provides any insight to you or anyone else, the tool above shows the headers returning "Location: http://www.wallsofthecity.net/year/mo/filename.html" on every single repeat redirect, and the browser follows that location... even when it is already at *that page*. Which confuses me :).
@linoge: I figured that would be what's happening, although it seems slightly odd since Apache should be stripping off the `http://www.wallsofthecity.net` part. Anyway, I have to wonder if there are other `RewriteRule`s that are interfering with these.
David Zaslavsky
Oh wait - I updated my `RewriteRule` to remove the leading slash in the pattern, since you don't include that in `.htaccess` files. Try the updated version and see if it works. (If not, what version of Apache are you using?)
David Zaslavsky
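A sketch of David's second suggestion with the pattern anchored and the referer check kept. In a `.htaccess` file the directory prefix is stripped before the `RewriteRule` pattern is tested, so at the document root the pattern sees `year/mo/filename.html` with no leading slash; the same rule in the main server config or a `<VirtualHost>` would need `^/year/...` instead. The placeholders are the question's, and `[R=302,L]` is an assumed addition.

```apache
# .htaccess at the document root: the pattern is tested against "year/mo/filename.html"
RewriteCond %{HTTP_REFERER} ^http://online\.webpage\.com [NC]
# "!" negates the pattern: redirect any path that is NOT already the target
RewriteRule !^year/mo/filename\.html$ /year/mo/filename.html [R=302,L]
```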
There are rewrite rules after this sequence, but none before, and should the [L] tag not stop the .htaccess file from proceeding any further? Took the leading slash off, no joy. Provided the full address, no joy. And Dreamhost, in their infinite wisdom, appear to have set ServerTokens to "Prod", so I am not entirely sure what version of Apache I am running :).
`[L]` stops any further rules from applying on the current request, but if Apache creates an external redirect (which is what it sounds like you're doing) or a subrequest, the whole process starts again from the beginning, so the `[L]` flag may not be the whole story. Something must be going on elsewhere in the configuration, because that altered `RewriteRule` I gave you is practically the mod_rewrite equivalent of `while (a) a=false;` (I mean, it shouldn't be able to cause a loop by itself, and it should be obvious that it can't). It might be helpful if we had the actual URL to run tests.
David Zaslavsky
Oh, and to find the Apache version, check on Dreamhost's support website, or try running `httpd -v` at the command line. (You might have to try some variations like `/usr/sbin/httpd -v` or `/usr/sbin/apache2 -v` etc.)
David Zaslavsky
Ah... Did not know that the [L] flag did not end the entire exercise - there were actually three other mod_rewrite instances in .htaccess, and that might have been the problem. ... Except I condensed all those into one, and it still does not work. The actual URLs and specific information from the entire .htaccess file are above, though I still cannot coax an Apache version out of Dreamhost. Their support page indicates they have "a lot" of versions installed, useless though that may be.
*sigh*...well, I'll keep thinking about it and comment here if I have more ideas. These sorts of problems are quite difficult to debug "remotely" - usually what does it in the end is a fair amount of trial and error, coupled with extensive reference to the mod_rewrite documentation.
David Zaslavsky
That is what I was afraid of - I spent the last few days changing around individual characters to see if any of that helped, saving, checking, changing something else, saving, checking... Thanks for what help you have provided - I was hoping I had done something stupid and obvious, honestly :).
OK, well... I managed to connect to your website and reproduce the problem. It didn't suggest any magic solution, but I'd advise you to check whatever logs you have access to and see if there's any relevant information in there. Maybe ask the Dreamhost people if there's any way they can enable rewrite logging. Also, you could try just removing the `RewriteBase` line and see if that changes anything.
David Zaslavsky
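If the host is willing, rewrite logging shows each condition and rule as it is evaluated. A sketch for Apache 2.2, which Dreamhost later confirms it runs; these directives are only valid in the main server config or a `<VirtualHost>`, not in `.htaccess`, so the host would have to add them. The log path is a placeholder.

```apache
# Apache 2.2: server config or <VirtualHost> only (not .htaccess)
# Placeholder path; level 3 is usually enough to follow each rule (0 = off, 9 = most verbose)
RewriteLog "/path/to/logs/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 replaced RewriteLog with per-module log levels written to the error log:
# LogLevel alert rewrite:trace3
```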
Here are the most recent log entries for the redirect:

69.36.187.33 - - [29/Jun/2010:15:38:32 -0700] "GET / HTTP/1.1" 302 533 "http://mikeb302000.blogspot.com" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0;)"
69.36.187.33 - - [29/Jun/2010:15:38:32 -0700] "GET /2010/04/cruisin-for-a-bruisin.html HTTP/1.1" 200 533 "http://mikeb302000.blogspot.com" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0;)"

No error logs were generated for the redirects, and removing RewriteBase did not seem to do anything. I will ping DH and see what they say.
That's weird... the log excerpt you posted shows that the request for `/2010/04/cruisin-for-a-bruisin.html` returned a status code of 200 (OK), not 302 (temporary redirect), which suggests that it worked, but then again I just tried my test again and the problem still exists. Maybe the Dreamhost people can figure it out, since they have access to all the server configs and can see if there's something elsewhere that might be interfering with your rewriting rules.
David Zaslavsky
So Dreamhost was not all that useful, but they did confirm that they are running Apache 2.2. Beyond that, they did not offer any useful suggestions, other than "debugging code is not our job" :).
A: 

Edit: Let's try this instead (the previous version was removed to limit post length):

External Redirect:

RewriteEngine On

# Force an external redirect to this page for referrals from that site
# This page *must* exist to prevent a loop (which it does, I checked :P)
RewriteCond %{HTTP_REFERER} ^http://mikeb302000\.blogspot\.com.* [NC]
RewriteCond %{REQUEST_URI} !=/2010/04/cruisin-for-a-bruisin.html
RewriteRule . /2010/04/cruisin-for-a-bruisin.html [R,L]

# This scenario performs no rewrite, so it should actually just be handled by
# the RewriteConds below (they won't match), but I didn't test that
RewriteCond %{REQUEST_URI} ^/(stats|failed_auth\.html).*$ [NC]
RewriteRule . - [L]

# With an external redirect, the first RewriteCond catches our referrer redirect
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

# These are always evaluated after mod_rewrite stuff, but use the original
# REQUEST_URI unless we explicitly passed it through to handlers later in
# the chain (via the PT flag)
Redirect permanent ... (trimmed)

Internal Redirect:

RewriteEngine On

# Force an external redirect to this page for referrals from that site
# This page *must* exist to prevent a loop (which it does, I checked :P)
RewriteCond %{HTTP_REFERER} ^http://mikeb302000\.blogspot\.com.* [NC]
RewriteCond %{REQUEST_URI} !=/2010/04/cruisin-for-a-bruisin.html
RewriteRule . /2010/04/cruisin-for-a-bruisin.html [L]

# This scenario performs no rewrite, so it should actually just be handled by
# the RewriteConds below (they won't match), but I didn't test that
RewriteCond %{REQUEST_URI} ^/(stats|failed_auth\.html).*$ [NC]
RewriteRule . - [L]

# With an internal redirect, we have to do an extra check to prevent this rewrite
# with our referrer redirect
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond $0 !-f
RewriteRule .* /index.php [L]

# These are always evaluated after mod_rewrite stuff, but use the original
# REQUEST_URI unless we explicitly passed it through to handlers later in
# the chain (via the PT flag)
Redirect permanent ... (trimmed)

Edit (again):

Let's try some diagnostics too. Can you put this above your other rules in the .htaccess file?

RewriteCond %{QUERY_STRING} diagnostic
RewriteRule . /?ref=%{HTTP_REFERER}&uri=%{REQUEST_URI}&matchable=$0 [R,L]
Tim Stone
*sigh* That does not seem to work either. Would it help/matter that this webpage is hosted at Dreamhost? Is there any kind of global function/variable I should pay attention to / look up?
Hmm... As far as I know there's nothing particularly quirky about Dreamhost, but I've never used them personally, so I can't say for certain. Just to be sure, where is your `.htaccess` file located, and is this the only place where there are `RewriteRule`s? There's got to be an explanation somewhere, heh.
Tim Stone
The .htaccess file is located in the root directory for the webpage. After the above code, there is a whole long string of "Redirect permanent" commands in a separate "RewriteEngine On" declaration (to make up for changing from MovableType to WordPress and breaking links), and then this code:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Should the first [L] tag not have stopped the .htaccess file before then, though?
Ah, I see. Does `path/to/filename.html` actually point to a real file, or are you intending for it to be handled by WordPress? The `L` flag stops any more rewrite rules from being applied before `mod_rewrite` sends the new URL to Apache as an internal redirect. Because your internal redirect points into the same directory hierarchy, the `.htaccess` file is evaluated again when Apache handles that "new" request.
Tim Stone
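One commonly used guard against this re-evaluation, offered as an untested sketch rather than a confirmed fix for this site: when Apache re-runs `.htaccess` for an internally redirected request it sets the `REDIRECT_STATUS` environment variable, which is normally empty on the original request. If the target page were ever routed through WordPress's `RewriteRule . /index.php [L]`, the re-run would see `/index.php` as the URI, the referer check would still match, and the redirect would fire again; requiring an empty `REDIRECT_STATUS` limits the referral rule to the first pass. Apache 2.4's `[END]` flag does this directly, but it is not available in 2.2. The `.*` pattern and `[R=302,L]` are assumed changes from the question's rule.

```apache
RewriteEngine On
RewriteBase /

# Only act on the original request, not on internal re-runs of this file
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_REFERER} ^http://mikeb302000\.blogspot\.com [NC]
RewriteCond %{REQUEST_URI} !=/2010/04/cruisin-for-a-bruisin.html
# ".*" (rather than ".") also matches the empty path of a request for "/"
RewriteRule .* /2010/04/cruisin-for-a-bruisin.html [R=302,L]

# WordPress front controller, unchanged from the question
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```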
It points to an honest-to-God file, though it does not appear to be doing me a whole lot of good. Now that I know that [L] only stops within each mod_rewrite call, I compressed the three separate mod_rewrite functions... but even that does not seem to be an effective solution.
Alright, this also works on my test server (and I noticed a definite reason why it would have caused an endless loop before, so let's hope this works better this time...)
Tim Stone
Ok, now I am at something of a loss... I just tested both of those, and while the former does not do anything whatsoever (no redirect, no infinite loop, nothing), the latter eventually ends in a 500 error. And I have *no idea why*. This is turning into more of a pain than it is probably worth... sorry :(.
Seriously? I'm beginning to think that Dreamhost may be the portal to hell... But this is interesting, so I'm game to keep at it as long as you are. I saw you have access to the access logs; can you get at the error logs as well? I'm curious how it managed not to match the rules in the first one, since that seems to be what happened, so if you could find the entries from when you tested it, that might help.
Tim Stone
I've also added in a thing that might help diagnose what's going on. It'll let me see what the variables we're checking are assigned, but shouldn't disrupt normal traffic in any way.
Tim Stone
Well, if you are up for it, I am :). Without the above debugging code, here is the error report for the second clip of code:

[Wed Jun 30 17:48:52 2010] [error] [client 69.36.187.33] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace., referer: http://mikeb302000.blogspot.com

With that debugging code installed, what should I be looking for?
Can you replace your `.htaccess` in your question with what it is right now with the debugging code and everything? I went ahead and crafted a link to run the diagnostic stuff as if I were coming from the site you want to redirect, and there seems to be something a little off about what's going on when it processes everything. After you copy it to your question, you can remove the debugging part, we won't need it anymore, and then I'll figure out how the hell this could be happening.
Tim Stone
Copy-pasting done, see above :).
So I noticed that if you go to http://www.wallsofthecity.net/2010/04/ (with the trailing slash, which SO hides; that is an existing directory), it redirects you to http://www.wallsofthecity.net/2010/04, which gets handled by WordPress. But there's definitely nothing in your `.htaccess` file that does that (it doesn't happen on my test server with the same file), so it must be happening somewhere else. Is there anywhere in the DH control panel that handles stuff like this? At this point I'm thinking something external to the file must be at play.
Tim Stone
Huh. Doing a little research on my end, that appears to be the product of a certain caching plugin I had installed on WP. Disabling that does not seem to solve the problem. Likewise, disabling *another* plugin that stops things redirect-bouncing does not seem to have solved it either. Updated copy-paste above, but the above code simply serves up the homepage (when so requested), regardless of referer. I guess we are narrowing it down... :)
And before you ask, yup, I went ahead and disabled all the relevant plugins that could be contributing, and it still did not work.