What's the best/most efficient way to extract text set between parentheses? Say I wanted to get the string "text" from the string "ignore everything except this (text)" in the most efficient manner possible.

So far, the best I've come up with is this:

$fullString = "ignore everything except this (text)";
$start = strpos('(', $fullString);
$end = strlen($fullString) - strpos(')', $fullString);

$shortString = substr($fullString, $start, $end);

Is there a better way to do this? I know in general using regex tends to be less efficient, but unless I can reduce the number of function calls, perhaps this would be the best approach? Thoughts?

+10  A: 

I'd just do a regex and get it over with. Unless you are doing enough iterations that it becomes a huge performance issue, it's just easier to code (and to understand when you look back on it).

$text = 'ignore everything except this (text)';
preg_match('#\((.*?)\)#', $text, $match);
print $match[1];
Owen
Isn't *? redundant?
Dimitry Z
No, it isn't: . only matches a single character.
Edward Z. Yang
Not necessarily: the ? makes the match lazy. Without it, given a string like 'ignore (everything) except this (text)', the match would end up being 'everything) except this (text'.
Owen
Good to know. Should avoid all those squared nots. E.g. /src="([^"]*)"/ now replaced with /src="(.*?)"/ :D
Dimitry Z
It's good that you can "understand when you look back on it". Failing that, you've got some Stack Overflow comments to clarify it.
Mnebuerquo
the /src="([^"]*)"/ is more efficient than /src="(.*?)"/
Tanj
Tanj, what makes you say that?
Edward Z. Yang
Ya, the square nots are. The reason is that the lazy ? makes the engine stop after every character to check whether the rest of the pattern matches yet, which gets expensive. The square nots just match "forward" in that sense. I prefer the ? notation though, so if performance isn't an issue I get lazy :)
Owen
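To make the greedy/lazy/negated-class discussion above concrete, here is a minimal sketch that runs all three patterns against the two-group string from Owen's comment. The variable names ($greedy, $lazy, $negated) are only illustrative and do not come from the original posts.

```php
<?php
// Input with two parenthesized groups, taken from the comment above.
$text = 'ignore (everything) except this (text)';

// Greedy: .* runs on to the LAST ')' in the string.
preg_match('#\((.*)\)#', $text, $greedy);

// Lazy: .*? stops at the first ')' after the opening '('.
preg_match('#\((.*?)\)#', $text, $lazy);

// Negated character class ("square not"): [^)]* can never cross a ')'.
preg_match('#\(([^)]*)\)#', $text, $negated);

echo $greedy[1], "\n";  // everything) except this (text
echo $lazy[1], "\n";    // everything
echo $negated[1], "\n"; // everything
```

The lazy pattern and the negated class capture the same text here; they differ only in how the engine gets there.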
+1  A: 

So, actually, the code you posted doesn't work: substr()'s parameters are $string, $start and $length, and strpos()'s parameters are $haystack, $needle. Slightly modified:

$str = "ignore everything except this (text)";
$start  = strpos($str, '(');
$end    = strpos($str, ')', $start + 1);
$length = $end - $start;
$result = substr($str, $start + 1, $length - 1);

Some subtleties: I used $start + 1 as the offset parameter to help PHP out while doing the strpos() search for the second parenthesis; we also start the substr() at $start + 1 and reduce $length by one to exclude the parentheses from the match.

Also, there's no error checking in this code: you'll want to make sure neither $start nor $end === false before performing the substr().
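As a rough sketch of what that error checking might look like, the same logic can be wrapped in a function that bails out early. The function name extractParenthesized() and the choice to return null on failure are assumptions made for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text
// between the first '(' and the next ')', or null when either is missing.
function extractParenthesized(string $str): ?string
{
    $start = strpos($str, '(');
    if ($start === false) {
        return null; // no opening parenthesis
    }

    $end = strpos($str, ')', $start + 1);
    if ($end === false) {
        return null; // no closing parenthesis after the opening one
    }

    // Skip the '(' itself and stop just before the ')'.
    return substr($str, $start + 1, $end - $start - 1);
}

var_dump(extractParenthesized('ignore everything except this (text)')); // string(4) "text"
var_dump(extractParenthesized('no parentheses here'));                  // NULL
```

The strict === false comparison matters because strpos() returns 0 when the needle is at the very start of the string, and 0 == false is true under loose comparison.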

As for using strpos/substr versus regex: performance-wise, this code will beat a regular expression hands down. It's a little wordier though. I eat and breathe strpos/substr, so I don't mind this too much, but someone else may prefer the compactness of a regex.

Edward Z. Yang
A: 

http://php.net/manual/en/function.explode.php#71808

I posted that two years ago. It's been a while.

orlandu63
So, using explode does work, but since you have to do it twice and you get all the extra overhead of allocating arrays you're not really interested in, I wouldn't really recommend it.
Edward Z. Yang
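For reference, the "do it twice" explode approach might look roughly like the sketch below. This is an assumption about the general shape of the technique, not necessarily the code in the linked php.net note, and it does no error checking (a string without a '(' would leave $afterOpen[1] undefined).

```php
<?php
$str = 'ignore everything except this (text)';

// First split: index 1 holds everything after the first '('.
$afterOpen = explode('(', $str, 2);            // ['ignore everything except this ', 'text)']

// Second split: index 0 holds everything before the next ')'.
$beforeClose = explode(')', $afterOpen[1], 2); // ['text', '']

echo $beforeClose[0], "\n"; // text
```

Each explode() call builds a throwaway array just to pick out one element, which is the extra allocation the comment above refers to.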
+1  A: 

Use a regular expression:

if( preg_match( '!\(([^\)]+)\)!', $text, $match ) )
    $text = $match[1];
Rob