I really don't know what my problem is lately, but Regex seems to be giving me the most trouble.

Very simple thing I need to do, but can't seem to get it:

I have a URI that is either /xmlfeed or /xmlfeed/what/is/this

I want to match /xmlfeed in either case.

I've tried many variations of the following:

preg_match('/(\/.*?)\/?/', $_SERVER['REQUEST_URI'], $match);

I would read this as: match a forward slash, then match any character until you come to an optional forward slash.

+1  A: 

In PHP: '/(\/.*?)\/?/' is a string containing a regular expression.

First you have to decode the string: /(/.*?)\/?/

So you have a forward slash that starts the regular expression, an opening parenthesis, and a forward slash that ends the matching part of the expression … and I'm pretty sure that it will then error since you haven't closed the parenthesis.

So, to get this working:

  • Remember to escape characters with special meanings in strings and regular expressions
  • Don't confuse the forward slash / with the backslash \

You want to match everything after and including the first slash, but before any (optional) second slash (so we don't want the ? that makes it non-greedy):

/(\/[^\/]*)/

Which, expressed as a PHP string is:

'/(\\/[^\\/]*)/'
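
As a quick check of the string-versus-regex layering described above, here is a minimal sketch that runs the pattern against the two URI shapes from the question. The variable names and sample inputs are illustrative assumptions, not part of the original code.

```php
<?php
// In a single-quoted PHP string, "\\" collapses to a single backslash,
// so the regex engine receives the pattern /(\/[^\/]*)/
$pattern = '/(\\/[^\\/]*)/';

// Sample inputs (assumed): the two URI shapes mentioned in the question.
$uris = ['/xmlfeed', '/xmlfeed/what/is/this'];

$results = [];
foreach ($uris as $uri) {
    // Group 1 captures the first slash plus everything up to the next slash.
    if (preg_match($pattern, $uri, $match) === 1) {
        $results[$uri] = $match[1];
    }
}

print_r($results);
```

Both inputs should yield /xmlfeed in group 1, because the negated character class cannot run past the second slash.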
David Dorward
hrm, sorry about that, my line of code did have a backslash to escape the forward slash, but seems that SO parsed it out.
Senica Gonzalez
A: 

I know this avoids the regex, and therefore avoids the question, but how about splitting the URI (at slashes) into an array?

Then you can deal with the elements of the array, and ignore the bits of the uri you don't want.

pavium
+2  A: 

Why do you need a regex if it confuses you?

$string = "/xmlfeed/what/is/this";
$s = explode("/",$string,3);
print "/".$s[1]."\n";

output

$ php test.php
/xmlfeed
ghostdog74
Part of it is just the learning.
Senica Gonzalez
I agree. Regex is a very time-consuming operation. If you can avoid it, then you should.
denica001
+2  A: 

Why not:

preg_match('#/[^/]+#', $_SERVER['REQUEST_URI'], $match);

?

$match[0] will give you what you need.
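
To see how this behaves on both URI shapes from the question, here is a small sketch. The helper name first_segment and the sample inputs are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the first path segment (with its leading
// slash), or null when the URI has no non-empty first segment.
function first_segment(string $uri): ?string
{
    // '#' is the delimiter, so the slashes need no escaping.
    if (preg_match('#/[^/]+#', $uri, $match) === 1) {
        return $match[0];
    }
    return null;
}

var_dump(first_segment('/xmlfeed'));
var_dump(first_segment('/xmlfeed/what/is/this'));
var_dump(first_segment('/'));
```

The last call returns null because + requires at least one non-slash character after the slash, so it is worth checking preg_match's return value before reading $match[0].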

K Prime
Works. I missed that plus sign at first. :)
Senica Gonzalez
A: 

Using the suggestions posted, I ended up trying this:

echo $_SERVER['REQUEST_URI'];
preg_match("/(\/.*)[^\/]/", $_SERVER['REQUEST_URI'], $match);
$url = "http://".$_SERVER['SERVER_NAME'].$match[0];
foreach($match as $k=>$v){
  echo "<h1>$k - $v</h1>";
}

I also tried it without the .* and without the parentheses.

Without the .* AND () it returns the / with the next character ONLY.

As written, it just returns the entire URI every time.

So, when run with the code above, the output is:

/tea-time-blog/post/20
0 - /tea-time-blog/post/20
1 - /tea-time-blog/post/2

This code is being eval()'d, by the way. I don't think that should make any difference in the way PHP handles the regular expression.

Senica Gonzalez
Err. So this **doesn't** solve your problem? Please edit your question instead of "answering" it. (And eval() does make a difference, since you can end up with *another* layer of string escaping).
David Dorward
"I also tried it without the .* and without the parentheses." So you ended up with `/[^/]`, right? It should have been `/[^/]+` or `/[^/]*`; you still need a quantifier, just not a *reluctant* quantifier. And @David's right: this information should have been added to your question, not posted as an answer.
Alan Moore
+1  A: 

Your problem is the reluctant quantifier. After the initial slash is matched, .*? consumes the minimum number of characters it's allowed to, which is zero. Then /? takes over; it doesn't see a slash in the next position (which is immediately after the first slash), but that's okay because it's optional. The result: the regex always matches a single slash, and group #1, which wraps the slash and the .*?, captures nothing more than that slash.

Obviously, you can't just replace the reluctant quantifier with a greedy one. But if you replace the .* with something that can't match a slash, you don't have to worry about greediness. That's what K Prime's regex, '#/[^/]+#' does. Notice as well how it uses # as the regex delimiter and avoids the necessity of escaping slashes within the regex.
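
The difference between these quantifier choices can be seen side by side in a rough sketch like the one below. The sample URI and variable names are assumed for illustration, and # is used as the delimiter throughout, so the first pattern is the question's regex with the slash escapes removed.

```php
<?php
$uri = '/xmlfeed/what/is/this'; // sample input (assumed)

// Reluctant: .*? takes zero characters and the optional slash matches nothing.
preg_match('#(/.*?)/?#', $uri, $reluctant);

// Greedy: .* runs to the end of the string.
preg_match('#(/.*)/?#', $uri, $greedy);

// Negated class without a quantifier: the slash plus exactly one character.
preg_match('#/[^/]#', $uri, $single);

// Negated class with +: stops just before the next slash.
preg_match('#/[^/]+#', $uri, $segment);

$overall = [
    'reluctant' => $reluctant[0],
    'greedy'    => $greedy[0],
    'single'    => $single[0],
    'segment'   => $segment[0],
];
print_r($overall);
```

Only the last pattern gives /xmlfeed as the overall match; the others give /, the whole URI, and /x respectively.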

Alan Moore