About the system

I have URLs of this format in my project:

http://project_name/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0

Here the keyword/class pair means "search with the keyword class".

The following is my htaccess file:

##AddHandler application/x-httpd-php5 .php

Options Includes +ExecCGI
Options +FollowSymLinks

<IfModule mod_rewrite.c>
RewriteEngine on

############To remove index.php from URL

RewriteCond $1 !^(index\.php|resources|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L,QSA]
#################################################end of find a class 


</IfModule>

I have a common index.php file which executes for every module in the project. There is only a rewrite rule to remove the index.php from URL (as you can see above).

I am not using any htaccess rewrite rules to populate the $_GET array. I have a URL parser function in PHP that does that instead. For the example URL I gave, the parser returns:

Array ( [a] => browse_by_exam [type] => tutor_search [keyword] => class [new_search] => 1 [search_exam] => 0 [search_subject] => 0 )
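A minimal sketch of a parser that would produce this array, assuming the first path segment is the action (stored under "a") and the remaining segments alternate between key and value. The function name parse_path_segments is hypothetical and is not the project's actual code; decoding is deliberately left out here.

```php
<?php
// Hypothetical parser: name and behaviour are assumptions chosen only
// to reproduce the array shown above.
function parse_path_segments(string $path): array
{
    // Split on "/" and drop empty pieces caused by leading or trailing
    // slashes. The 'strlen' callback keeps the string "0", which a bare
    // array_filter() would discard.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    if ($segments === []) {
        return [];
    }

    // The first segment is the action; the rest are key/value pairs.
    $params = ['a' => array_shift($segments)];
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $params[$segments[$i]] = $segments[$i + 1];
    }
    return $params;
}

print_r(parse_path_segments(
    '/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0'
));
```

Because the split happens on every literal "/", a search term that itself contains a slash shifts all following key/value pairs by one, which is why the slash needs special treatment.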

I am using urlencode() when preparing the search URL and urldecode() when reading it.

Problem

I am facing problems with some characters in the URL

Character(s)            Response
%                       400 - Bad Request - Your browser sent a request that this server could not understand.
/                       404 - Not Found
\ # +                   Page does not break, but urldecode() removes these characters.

I want to allow all these characters. What could be the problem, and how do I allow them? Please help. Thanks, Sandeepan

Updates

Now only the / character is breaking the URL (404 error, as before). So I removed the htaccess rewrite rule that hides index.php and tried with the complete URL instead. For the search term class/new, I tried the following two URLs:

http://project_name/index.php?browse_by_exam/type/tutor_search/keyword/class%2Fnew/new_search/1/search_exam/0/search_subject/0

http://project_name/index.php/browse_by_exam/type/tutor_search/keyword/class%2Fnew/new_search/1/search_exam/0/search_subject/0

The first one works but the second one does not. Notice the index.php?browse_by_exam in the first one.

But I can't use the first URL convention. I have to make / work with index.php hidden. Please help.

Thanks again, Sandeepan

Edit (Solved)

Considering Bobince's answer to my other question

http://stackoverflow.com/questions/3235219/urlencoded-forward-slash-is-breaking-url

I feel it is best to have URLs like this:

http://project_name/browse_by_exam?type/tutor_search/keyword/class%2Fnew/new_search/1/search_exam/0/search_subject/0

That way I avoid the poor readability of the &param1=value1&param2=value2 convention, and I can also allow forward slashes, because everything after the ? is part of the query string rather than the path.
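A minimal sketch of how the query-string form could be parsed, assuming the action comes from the path and the key/value pairs come from the raw query string (in a real request, $_SERVER['QUERY_STRING'], which PHP leaves undecoded). The function name parse_query_path is hypothetical. The point it illustrates is splitting on "/" first and decoding each piece afterwards, so that %2F stays inside its value.

```php
<?php
// Hypothetical helper: not the project's actual parser.
function parse_query_path(string $action, string $queryString): array
{
    $params = ['a' => $action];

    // Split on literal "/" BEFORE decoding, so an encoded slash (%2F)
    // is not treated as a separator.
    $pieces = explode('/', $queryString);
    for ($i = 0; $i + 1 < count($pieces); $i += 2) {
        $params[rawurldecode($pieces[$i])] = rawurldecode($pieces[$i + 1]);
    }
    return $params;
}

// In a real request the second argument would be $_SERVER['QUERY_STRING'].
print_r(parse_query_path(
    'browse_by_exam',
    'type/tutor_search/keyword/class%2Fnew/new_search/1/search_exam/0/search_subject/0'
));
```

With this input the keyword entry comes out as class/new while the other pairs stay aligned.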

I want to avoid AllowEncodedSlashes because, as Bobince said, "some tools or spiders might get confused by it. Although %2F to mean / in a path part is correct as per the standard, most of the web avoids it."

+2  A: 

Some of the issues sound like they are related to you trying to use PATH_INFO (your RewriteRule sticks everything behind index.php as if it were a path). Would it be possible to just use the $_SERVER['REQUEST_URI'] variable as the input to your URL parser function instead? It contains the same information, and I feel it would be less problematic.

Attempting to create a PATH_INFO solution doesn't seem to work very well in a per-directory (.htaccess) context. You can set AcceptPathInfo On, but once mod_rewrite attempts to rewrite the URL internally, it seems like Apache doesn't want to parse out the trailing part of the URL, which results in the 404 error.

If you use $_SERVER['REQUEST_URI'] instead, then you can just rewrite to index.php without the trailing information, like so:

RewriteCond $1 !^(index\.php|resources|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L,QSA]

As far as the 400 error goes, your % should be encoded as %25 by urlencode(), but it sounds like for whatever reason there might be an issue. I'd check to make sure that your search URLs are indeed being properly encoded in the output sent to the browser, as this may be related to issues with the other remaining characters as well (but I'm not sure).
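One way to check the encoding is to build a URL by hand and decode it again. The sketch below is only an illustration: build_search_path is a hypothetical helper, and rawurlencode() is swapped in for urlencode() because it encodes a space as %20 rather than +, which suits path segments. It also shows one possible cause of characters going missing: if a value has already been decoded once and urldecode() is applied again, every literal + turns into a space.

```php
<?php
// Hypothetical helper: builds a search path from key/value pairs,
// percent-encoding every segment with rawurlencode().
function build_search_path(string $action, array $params): string
{
    $segments = [rawurlencode($action)];
    foreach ($params as $key => $value) {
        $segments[] = rawurlencode((string) $key);
        $segments[] = rawurlencode((string) $value);
    }
    return '/' . implode('/', $segments);
}

$path = build_search_path('browse_by_exam', ['keyword' => 'c++ 100%']);
echo $path, "\n"; // /browse_by_exam/keyword/c%2B%2B%20100%25

// Decoding once restores the term; decoding a second time with
// urldecode() turns each "+" into a space.
$once  = rawurldecode('c%2B%2B%20100%25');
$twice = urldecode($once);
var_dump($once === 'c++ 100%');   // bool(true)
var_dump($twice === 'c   100%');  // bool(true)
```

If the browser's address bar shows a raw % or # in the search term, the link was not encoded when the page was generated.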

Edit: If you used the rewrite above, you'd have URLs like

http://project_name/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0

and they would be internally redirected to index.php. Then, you could get the part

/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0

from $_SERVER['REQUEST_URI'] in that script (it would contain this value), which you could then parse as you're doing now. I'm not sure why it has to be rewritten to follow index.php, since you can get this information even if it isn't, and it looks exactly the same to the user in their browser. If the part that uses $_SERVER['PATH_INFO'] can't be changed, you could even do this at the beginning of the script:

$_SERVER['PATH_INFO'] = $_SERVER['REQUEST_URI'];
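As a rough sketch of reading the segments straight from the request URI: the helper name segments_from_request_uri is hypothetical, and the example passes a literal string where a real script would pass $_SERVER['REQUEST_URI']. Note that REQUEST_URI still carries the query string and is not percent-decoded, so the sketch strips the former and decodes each segment only after splitting.

```php
<?php
// Hypothetical helper: not part of the original project.
function segments_from_request_uri(string $requestUri): array
{
    // REQUEST_URI still contains the query string, so cut it off first.
    $path = explode('?', $requestUri, 2)[0];

    $segments = [];
    foreach (explode('/', $path) as $piece) {
        if ($piece !== '') {
            // Decode after splitting so an encoded character cannot be
            // mistaken for a separator.
            $segments[] = rawurldecode($piece);
        }
    }
    return $segments;
}

// In a real request this would be $_SERVER['REQUEST_URI'].
print_r(segments_from_request_uri('/browse_by_exam/keyword/100%25/new_search/1?debug=1'));
```

This does not help with %2F in the path: with AllowEncodedSlashes at its default of Off, Apache answers such a request with a 404 before the PHP script runs.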

If you really can't do it like this, I'm not sure that there is a solution (there was an explanation in your other question on why this is problematic), but I'll look to see if it's at all possible and get back to you.

Tim Stone
Thanks a lot Tim, this immediately solved the % character issue. And I further corrected my code so that all the other characters are not getting removed. But the / character is still breaking my URL like before (404 error). Please check the Updates section and see if you can help.
sandeepan
I made some updates to my answer; it doesn't fix the problem that you describe, but I don't get why it has to be done the way you describe. I'll see if there's a way to make it work like you want, though, but I'm not sure if it's possible (unless you move your rewrite rules to `httpd.conf` or something, where it seems to work out alright).
Tim Stone
I want to keep the index.php hidden, as it has always been in our project. I guess that hides information about the language the code is written in (PHP in my case). Check my edited question. Really appreciate your helping attitude. Thanks, Sandeepan
sandeepan
Oh, I understand. Using the approach that I suggested though, you can still keep the `index.php` out of the URL. The changing to `index.php` happens on the server, and the user who sees the URL in the browser should not know that this happens at all, so if you modify how this is done it still keeps your URLs the same (you just get that URL data from a different place in your PHP script). I can try to explain more clearly if you want, or if you're happy with your current solution, I am glad to see that you're able to get it to work.
Tim Stone
Please explain if you believe something better can be done... we should always strive to get the best solutions. Cheers, Sandeepan
sandeepan