Please consider the following very rudimentary "controllers" (functions in this case, for simplicity):

function Index() {
    var_dump(__FUNCTION__); // show the "Index" page
}

function Send($n) {
    var_dump(__FUNCTION__, func_get_args()); // placeholder controller
}

function Receive($n) {
    var_dump(__FUNCTION__, func_get_args()); // placeholder controller
}

function Not_Found() {
    var_dump(__FUNCTION__); // show a "404 - Not Found" page
}

And the following regex-based Route() function:

function Route($route, $function = null)
{
    $result = rtrim(preg_replace('~/+~', '/', substr($_SERVER['PHP_SELF'], strlen($_SERVER['SCRIPT_NAME']))), '/');

    if (preg_match('~' . rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route), '/') . '$~i', $result, $matches) > 0)
    {
        exit(call_user_func_array($function, array_slice($matches, 1)));
    }

    return false;
}

Now I want to map the following URLs (trailing slashes are ignored) to the corresponding "controllers":

/index.php -> Index()
/index.php/send/:NUM -> Send()
/index.php/receive/:NUM -> Receive()
/index.php/NON_EXISTENT -> Not_Found()

This is where things get tricky: I have two problems I can't solve. I figure I'm not the first person to run into them, so someone out there should have a solution.


Catching 404's (Solved!)

I can't find a way to distinguish between requests to the root (index.php) and requests to paths that shouldn't exist (such as index.php/notHere). I end up serving the default index.php route for URLs that should instead get a 404 - Not Found error page. How can I solve this?

EDIT - The solution just flashed in my mind:

Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route('/:any', 'Not_Found'); // use :any here, see the problem below
Route('/', 'Index');

Ordering of the Routes

If I set up the routes in a "logical" order, like this:

Route('/', 'Index');
Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route(':any', 'Not_Found');

All URL requests are caught by the Index() controller, since the empty regex (remember: trailing slashes are ignored) matches everything. However, if I define the routes in a "hacky" order, like this:

Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route('/:any', 'Not_Found');
Route('/', 'Index');

Everything seems to work like it should. Is there an elegant way of solving this problem?

The routes may not always be hard-coded (pulled from a DB or something), and I need to make sure that it won't be ignoring any routes due to the order they were defined. Any help is appreciated!
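To make the cause of the ordering problem concrete, here is a minimal sketch that builds the same pattern Route() builds. The helper name route_to_regex() and its $anchored flag are hypothetical, added only for illustration; anchoring the pattern with ^ is one possible mitigation, not something the original Route() does.

```php
<?php
// Hypothetical helper: builds the pattern the same way Route() does.
// With $anchored = true it also pins the pattern to the start of the path.
function route_to_regex($route, $anchored = false)
{
    $body = rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route), '/');

    return '~' . ($anchored ? '^' : '') . $body . '$~i';
}

// The '/' route loses its only slash, leaving a pattern that is just "end of string".
echo route_to_regex('/'), "\n";                        // ~$~i

// Unanchored, that pattern matches any request path at all.
echo preg_match(route_to_regex('/'), '/send/42'), "\n";       // 1

// Anchored, it only matches the empty path (the root).
echo preg_match(route_to_regex('/', true), '/send/42'), "\n"; // 0
echo preg_match(route_to_regex('/', true), ''), "\n";         // 1
```

Under this assumption the Index route no longer swallows other URLs, whatever position it has in the list, but a catch-all such as /:any would still have to be tried after the more specific routes.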

+1  A: 

Okay, I know there's more than one way to skin a cat, but why in the world would you do it this way? It seems like a RoR-style approach to something that could easily be handled with mod_rewrite.

That being said, I rewrote your Route function and was able to accomplish your goal. Keep in mind I added another conditional to catch the Index directly as you were stripping out all the /'s and that's why it was matching the Index when you wanted it to match the 404. I also consolidated the 4 Route() calls to use a foreach().

function Route()
{
        $result = rtrim(preg_replace('~/+~', '/', substr($_SERVER['PHP_SELF'], strlen($_SERVER['SCRIPT_NAME']))), '/');
        $matches = array();

        $routes = array(
                'Send'      => '/send/(:num)',
                'Receive'   => '/receive/(:num)',
                'Index'     => '/',
                'Not_Found' => null
        );

        foreach ($routes as $function => $route)
        {
                if (($route == '/' && $result == '')
                        || (preg_match('~' . rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route)) . '$~i', $result, $matches) > 0))
                {
                        exit(call_user_func_array($function, array_slice($matches, 1)));
                }
        }

        return false;
}

Route();

Cheers!

joshtronic
Like I said in my post, this is only an example. In the real world there are many more routes and almost all of them use class controllers; `mod_rewrite` can't do that. I appreciate the effort, but I've solved the missing 404 error with another approach (see my OP). Thanks anyway! Do you happen to have a solution for the order in which routes are defined?
Alix Axel
@joshtronic: How did you design your gravatar?
Alix Axel
Face Your Manga: http://faceyourmanga.com/ I liked it so much, I incorporated it into my website's design :)
joshtronic
@joshtronic: Thanks, I'm using it now as my avatar. There are still some folks here that have a similar, but pixel-looking, avatar. I'd like to know how they do that.
Alix Axel
right on! link me to what you're talking about, I'm interested in seeing it... it's about time for me to update my avatar :)
joshtronic
+1  A: 

This is a common problem with MVC webapps, and it is often solved before it becomes a problem at all.

The easiest and most general way is to use exceptions. Throw a PageNotFound exception if you don't have content for the given parameters. At the top level of your application, catch all exceptions, as in this simplified example:

index.php:

try {
    $controller->method($arg);
} catch (PageNotFound $e) {
    show404Page($e->getMessage());
} catch (Exception $e) {
    logFatalError($e->getMessage());
    show500Page();
}

controller.php:

function method($arg) {
    $obj = findByID($arg);
    if (false === $obj) {
         throw new PageNotFound($arg);
    } else {
         ...
    }
}
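The two snippets above leave PageNotFound and findByID() undefined. A self-contained sketch of the same idea follows; the exception class body, the in-memory findByID() lookup and the dispatch() wrapper are all assumptions made for illustration, not part of any framework.

```php
<?php
// Assumed for illustration: a dedicated exception type for missing pages.
class PageNotFound extends Exception {}

// Hypothetical lookup standing in for findByID(); returns false when nothing matches.
function findByID($id)
{
    $pages = array(1 => 'Home', 2 => 'About');

    return isset($pages[$id]) ? $pages[$id] : false;
}

// Controller: throws instead of rendering an error page itself.
function method($arg)
{
    $obj = findByID($arg);
    if (false === $obj) {
        throw new PageNotFound("No page with ID $arg");
    }

    return $obj;
}

// Top-level dispatch: each exception type is mapped to a response.
function dispatch($arg)
{
    try {
        return '200: ' . method($arg);
    } catch (PageNotFound $e) {
        return '404: ' . $e->getMessage();
    } catch (Exception $e) {
        return '500: internal error';
    }
}

echo dispatch(2), "\n";  // 200: About
echo dispatch(99), "\n"; // 404: No page with ID 99
```

The more specific catch block has to come first, since PageNotFound is also an Exception.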

The ordering problem can be solved by sorting the regexes so that the most specific regex is matched first and the least specific is matched last. To do this, count the path separators (i.e. slashes) in the regex, excluding the path separator at the beginning. You'll get this:

 Regex           Separators
 --------------------------
 /send/(:num)    1
 /send/8/(:num)  2
 /               0

Sort them in descending order and process them. The processing order is:

  1. /send/8/(:num)
  2. /send/(:num)
  3. /
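The sorting step can be sketched as follows. The function names count_separators() and sort_routes() are hypothetical, and the sketch assumes routes are plain strings collected in an array before any of them is dispatched.

```php
<?php
// Count path separators, ignoring a slash at the very start or end of the route.
function count_separators($route)
{
    return substr_count(trim($route, '/'), '/');
}

// Return the routes ordered from most separators to fewest.
function sort_routes(array $routes)
{
    usort($routes, function ($a, $b) {
        return count_separators($b) - count_separators($a);
    });

    return $routes;
}

print_r(sort_routes(array('/send/(:num)', '/send/8/(:num)', '/')));
// Order: /send/8/(:num), /send/(:num), /
```

Routes with the same separator count (for example /:any and /) tie under this scheme, so a real implementation would likely need an extra tie-breaker, such as ranking wildcard segments below literal ones.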
jmz
That is great in a non-regex context, but I'm talking about regex routes. Anyway, the 404 problem is solved, can you suggest something to solve the ordering issue?
Alix Axel
My solution has nothing to do with regular expressions. It's application logic; it doesn't care whether or not you use regexes to handle the requests. The idea is to catch exceptions in the topmost application script and react to them. Your ordering problem is only a problem because you don't keep track of rule scope. /send/8/foo is more specific than /send/(:num), which is more specific than /send, which is more specific than /.
jmz
@jmz: I've successfully implemented the logic you mentioned before, that's not my problem. =) I've also found a solution to handle the 404 errors with the usage of the `:any` regex pattern as you can see in my edited question. Regarding the specificity and ordering of the routes, do you have any ingenious solution?
Alix Axel
Your answer has not helped much with my specific problem, but since it's the best one so far I'm going to award you the bounty, even though I won't accept/upvote it just yet.
Alix Axel
@Alix: In my previous comment I suggested you analyze the scope of each regex, e.g. by counting how many slashes it has, not counting a slash at the very end of the regex. Order the regexes by this scope, descending, and process in this order. The first matching regex will do the job.
jmz
@jmz: That makes perfect sense, but how can I order the routes *automatically* AND still use `exit()` when a route is matched? I don't see any solution, maybe it ain't possible. See also this answer (http://stackoverflow.com/questions/3023818/any-procedural-non-oo-php-framework/3245377#3245377), below **"Request routing features are cool, ..."** to understand why I want/need to use `exit()`.
Alix Axel
OK, I read your post and checked the phunction framework. So your design choice for `Route()` makes it impossible to order the routes in the framework. What you seriously need is documentation which -- judging by the open position -- you already know. And speaking of documentation, the phunction framework would benefit from comments. It would be easier to read, use, and understand.
jmz
A: 

OK first of all something like:

foo.com/index.php/more/info/to/follow 

is perfectly valid and, as per the standard, should load index.php with $_SERVER['PATH_INFO'] set to /more/info/to/follow. This is the CGI/1.1 standard. If you do NOT want the server to perform PATH_INFO expansion, turn it off in your server settings. Under Apache this is done using:

AcceptPathInfo Off

If you set it to Off under Apache 2, it will send out a 404.

I am not sure what the IIS flag is but I think you can find it.
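When PATH_INFO expansion is left on, the extra path can be read in PHP. Here is a minimal sketch, assuming a hypothetical request_path() helper that takes a $_SERVER-like array and falls back to the PHP_SELF / SCRIPT_NAME difference used by the Route() function in the question when PATH_INFO is not set:

```php
<?php
// Hypothetical helper: returns the part of the URL after the script name.
// $server is expected to look like $_SERVER; passing it in keeps the function testable.
function request_path(array $server)
{
    if (isset($server['PATH_INFO'])) {
        return $server['PATH_INFO'];
    }

    // Fallback: strip the script name from PHP_SELF, as Route() does.
    // The cast covers older PHP versions where substr() could return false.
    return (string) substr($server['PHP_SELF'], strlen($server['SCRIPT_NAME']));
}

echo request_path(array('PATH_INFO' => '/more/info/to/follow')), "\n";
echo request_path(array('PHP_SELF' => '/index.php/send/5', 'SCRIPT_NAME' => '/index.php')), "\n";
```

In a real request the call would be request_path($_SERVER); which of the two variables is populated depends on the server and SAPI configuration.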

Elf King
Have you at least read the whole question? I don't understand how this qualifies as an answer to any of my questions... Oh, and by the way, you can't always rely on `PATH_INFO`, see http://stackoverflow.com/questions/1884041/php-cgi-portable-and-safe-way-to-get-path-info/1916146#1916146.
Alix Axel