Hello.

What I'm trying to do: have pretty URLs in the format 'http://domain.tld/one/two/three', that get handled by a PHP script (index.php) by looking at the REQUEST_URI server variable.
In my example, the REQUEST_URI would be '/one/two/three'. (Btw., is this a good idea in general?)

I'm using Apache's mod_rewrite to achieve that.
Here's the RewriteRule I use in my .htaccess:

RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]

This works really well thus far; it forwards every REQUEST_URI that consists of a-z, A-Z or a '/' to /index.php, where it is processed.
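For reference, here is that rule in context with each part explained. This is a sketch that assumes the file is an .htaccess in the document root and that the server allows mod_rewrite directives there (AllowOverride including FileInfo). The key point is that the pattern is only ever tested against the URL path.

```apache
# Assumed: .htaccess in the document root, mod_rewrite loaded.
RewriteEngine On

# In .htaccess context the directory prefix (here the leading "/") is
# stripped before matching, so ^/? accepts the path with or without it.
# The pattern sees only the path: the query string is not part of it
# (it can be tested with %{QUERY_STRING} in a RewriteCond), and the
# "#fragment" is never sent to the server at all.
# [NC] is redundant here because the class already covers both cases.
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
```

So a request for /one/two/three?x=1 still matches on its path alone, which is why the '?' appears to be "allowed".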

Only drawback: '?' (question marks) and '#' (hash signs) still seem to be allowed in the REQUEST_URI, and maybe even more characters that I've yet to find.
Is it possible to restrict those via my .htaccess and an adequate addition to the RewriteRule?

Thanks!

A: 

The fragment identifier, e.g. #some-anchor, is controlled by the browser, not the server. JavaScript would be needed to redirect and remove it, although I am not sure why you would want to do so.

[SNIPPED after clarification] To rewrite only when the query string is empty:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
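If the goal is an explicit 404 rather than just skipping the rewrite, one option is a separate rule in front of it. A sketch, assuming Apache 2.2 or later, where a non-3xx code in the R flag makes mod_rewrite stop and return that status:

```apache
RewriteEngine On

# Pretty paths that arrive with a non-empty query string get a 404.
# With a status outside 300-399, mod_rewrite drops the substitution ("-")
# and stops processing as if [L] had been given.
RewriteCond %{QUERY_STRING} .
RewriteRule ^/?[a-zA-Z/]+/?$ - [R=404]

# Everything else that looks like a pretty path goes to index.php.
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
```

A bare trailing '?' with nothing after it leaves %{QUERY_STRING} empty, so this rule does not catch that case.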
Cez
I see, let's forget about the # then; it's not a big issue anyway, but the '?' kind of bothers me. Is there a way to not rewrite REQUEST_URIs that contain a '?'? Someone (who has since deleted their answer) suggested checking for an empty QUERY_STRING, which sounded like a good idea but didn't work; maybe I did it wrong? I tried your suggestion, but that didn't do anything. Thanks.
tshabala
@tshabala The ? on the end of index.php removes the query string. I've updated my answer with information
Cez
@tshabala I've added to my answer as I spotted what you meant
Cez
I think we're getting there, although that's not quite what I was going for. Your rewrite rules (and condition) remove the QUERY_STRING from the REQUEST_URI, right? What I want to do is ONLY rewrite the REQUEST_URI to index.php when it doesn't contain a '?' (i.e. no QUERY_STRING), and return a 404 error page when the REQUEST_URI does contain a '?'.
tshabala
@tshabala I've tidied my answer and amended the rewrite. The only way to get rid of the ? when present on its own is to redirect, which you can't do with the rewrite as the query string will be empty, as Gumbo explained. You would therefore need to redirect using PHP
Cez
A: 

The $_SERVER['REQUEST_URI'] variable contains the original request URI as received by the server, before you perform the rewrite. It's therefore impossible (as far as I know this early in the morning) to remove the query string portion from that variable via mod_rewrite, but you naturally have the option of stripping it when you process $_SERVER['REQUEST_URI'] in your script.

If you want to only perform your RewriteRule when the query string is not specified, the following should work:

RewriteCond %{QUERY_STRING} !^.+$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]

Note that this might be problematic, though: if someone links to your site with a query string accidentally appended to the URL, the rewrite never happens and your script doesn't handle the request. Visitors will then get a 404 response (or whatever the case may be), which might not be as user-friendly as silently ignoring the trailing information.
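If you prefer to silently ignore the trailing information, a redirect to the same path without the query string is one way to do it. A minimal sketch, assuming the rules sit in an .htaccess in the document root:

```apache
RewriteEngine On

# Redirect pretty paths that carry a query string to the same path without it.
# The trailing "?" in the substitution discards the query string
# (on Apache 2.4 the [QSD] flag does the same job).
RewriteCond %{QUERY_STRING} .
RewriteRule ^/?([a-zA-Z/]+)$ /$1? [R=301,L]

# Clean requests are then handed to the front controller.
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
```

Browsers tend to cache a 301, so a 302 is safer while testing. This also discards any tracking or campaign parameters appended to your URLs.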

Tim Stone
A: 

If I understand correctly, you want to forbid the use of ? and # on your site?

You shouldn't do that, because:

  • the hash (#) is used in Google's AJAX-crawling URL specification;
  • the question mark (?) is used, for example, by Google AdWords and Analytics and by affiliate programs.

So if you force Apache to reject URL requests containing a question mark, people who click on your AdWords ad will only see a 404 error page.

There is nothing wrong with letting people use both of them. What matters is protecting your site against XSS attacks.

By the way, there is another very important character: the percent sign (%), which is used to encode special characters (such as Polish or German national letters).

Dobiatowski
A: 

In mod_rewrite and PHP, the variable REQUEST_URI refers to two different parts of the URI. In mod_rewrite, %{REQUEST_URI} contains the current URI path; in PHP, $_SERVER['REQUEST_URI'] contains the URI path and query. In neither case is the URI fragment included, as that part of the URI is not transmitted to the server; it is only used by the client.

So, when /one/two/three?foo#bar is requested, mod_rewrite’s %{REQUEST_URI} contains /one/two/three and PHP’s $_SERVER['REQUEST_URI'] contains /one/two/three?foo.
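Because %{REQUEST_URI} never includes the query part, and a lone trailing '?' leaves %{QUERY_STRING} empty, neither variable can tell /one/two/three? apart from /one/two/three. %{THE_REQUEST}, the raw HTTP request line, still contains the '?'. A sketch of a 404 based on it, assuming an .htaccess in the document root and an Apache version that accepts non-3xx codes in the R flag:

```apache
RewriteEngine On

# %{THE_REQUEST} is the raw request line, e.g. "GET /one/two/three? HTTP/1.1".
# It is not changed by internal rewrites, and a literal "?" can only appear
# in its request target, so any match means the client sent a query part,
# even an empty one. (The "#fragment" is still never included.)
RewriteCond %{THE_REQUEST} \?
RewriteRule ^/?[a-zA-Z/]+/?$ - [R=404]

# Requests without any "?" go to the front controller as before.
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
```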

Gumbo