I'm working on a webpage that takes a URL as a parameter, and would like it to be easily indexed by search engines. One requirement is that each URL appears as a directory.

My script is in the format:

myscript?url=<a url>&page=1

I'd like redirects to look something like:

lookup/<a url>/page:1/

The URL is predictably giving me trouble... I just want to tell mod_rewrite to select anything after "lookup/" and before "/page:". Of course, nothing is ever as simple as it could be.

Here's the rewrite as it is now:

RewriteEngine on
RewriteRule ^/lookup/(.+)/page:([0-9]+)(/?)$ /myscript?url=$1&page=$2 [L]

This works great, except it fails when URLs are properly encoded. Take the example of "www.google.com/finance". Here's what happens when I enter these URLs into my browser's address bar:

#this works
lookup/www.google.com/finance/page:1/

#this doesn't work.  url is cut off before the ?
lookup/www.google.com/finance?foo=bar/page:1/

#doesn't match rewrite at all!
lookup/www.google.com%2Ffinance/page:1/

I'm at a loss as to how to do this... Shouldn't (.+) select anything? Do I need to tell mod_rewrite to ignore query parameters somehow?

Thanks

A: 

Try this:

RewriteCond %{THE_REQUEST} ^GET\ /lookup/([^\s]+)/page:([0-9]+)/[?\s]
RewriteRule ^/lookup/ /myscript?url=%1&page=%2 [L]
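
To see what the condition above would capture, here is a minimal sketch that runs the same pattern against sample request lines. It uses Python's `re` module as a stand-in for the regex engine in mod_rewrite, which is an assumption; the helper name `capture` and the sample request lines are made up for illustration. Because `%{THE_REQUEST}` holds the raw request line, the `?` and everything after it is still visible to the pattern.

```python
import re

# The RewriteCond pattern from the answer, unchanged. Python's `re` is only a
# stand-in here; Apache itself is not involved in this sketch.
REQUEST_LINE_PATTERN = re.compile(r"^GET\ /lookup/([^\s]+)/page:([0-9]+)/[?\s]")


def capture(request_line):
    """Return the (url, page) pair that %1 and %2 would hold, or None."""
    match = REQUEST_LINE_PATTERN.search(request_line)
    return match.groups() if match else None


# Hypothetical raw request lines, as a browser might send them.
print(capture("GET /lookup/www.google.com/finance?foo=bar/page:1/ HTTP/1.1"))
print(capture("GET /lookup/www.google.com%2Ffinance/page:1/ HTTP/1.1"))
print(capture("POST /lookup/www.google.com/finance/page:1/ HTTP/1.1"))
```

The pattern itself does not exclude `%2F`, and it only matches `GET` requests, so other methods would fall through to whatever rules follow.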

But you should really consider encoding that embedded URL properly instead of just guessing where it might end. For example, /lookup/www.google.com/finance?foo=bar/page:1/ should be at least /lookup/www.google.com/finance%3Ffoo=bar/page:1/, so that the ? is part of the URI path rather than the start of the query string.
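
As a rough illustration of that encoding step, the sketch below builds the link on the generating side with Python's standard `urllib.parse`. The helper name `encode_embedded_url` is hypothetical, and keeping `/` and `=` unencoded is a choice made only to reproduce the example above; whichever language produces the links would need an equivalent step.

```python
from urllib.parse import quote, urlsplit


def encode_embedded_url(url):
    """Percent-encode the embedded URL, leaving '/' and '=' as they are."""
    return quote(url, safe="/=")


raw = "/lookup/www.google.com/finance?foo=bar/page:1/"
encoded = "/lookup/" + encode_embedded_url("www.google.com/finance?foo=bar") + "/page:1/"

# Unencoded: the first '?' ends the path, so the rest becomes the query string.
print(urlsplit(raw).path)
print(urlsplit(raw).query)

# Encoded: the whole thing stays in the path and the query string is empty.
print(encoded)
print(urlsplit(encoded).query)
```

The first pair of prints shows why the original rule saw the URL cut off before the `?`: only the part before it counts as the path.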

Gumbo
I'll give this a shot. Thanks for the prompt answer!
No dice... Apache doesn't like the %2F. When I take it out, it works.
Just to be clear: /lookup/www.google.com/finance%3Ffoo=bar/page:1/ works, but /lookup/www.google.com/finance%2Ffoo=bar/page:1/ does not.