I am a bit of a newbie to regular expressions and don't fully understand the differences between the various flavors. I have a basic regex that works when I try it on a UNIX system (in vi and grep) but not when I try to use it with PHP's ereg functions. I suspect something about PHP's ereg functions is keeping it from working:

<?php
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$string = ereg_replace("<em\b[^>]*>(.*?)</em>","\\1",$string);
echo $string;
?>

I would like this to output "Feugiat hendrerit sit iriuredolor aliquam." without the em tags. However, it just returns an empty string.

+4  A: 

You may need to escape the backslash:

$string = ereg_replace("<em\\b[^>]*>(.*?)</em>","\\1",$string);

This is because \b in a PHP string means something different from a \b in a regular expression. Using \\ in the PHP string passes through a single backslash to ereg_replace(). This is the same reason you need to use double backslash in the replacement string "\\1".

Depending on your application, you may also want to consider the possibility that your input $string does not contain any <em> tags. In that case, the above statements would result in an empty string, which is probably not what you intend.

Greg Hewgill
Alternately, you could use single-quoted strings so that there's no interpretation of the contents of the string.
Sean McSomething
This doesn't work for me. See my new 'answer below'
bryan kennedy
Your answer is incorrect. \\1 is not a backreference in the preg_replace replacement text. $1 is.
Jan Goyvaerts
Single-quoted strings still process backslash escapes (\\ and \'); they just don't interpolate $variables.
FryGuy
+1  A: 

If removing <em> tags is your intention, I would recommend the following:

<?php
  $string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
  $string = ereg_replace("</?em\\b[^>]*>", "", $string);
  echo $string;
?>

Greg Hewgill is right about the escaping of backslashes in a PHP string. You need to do it to get a literal backslash into your regex pattern string.
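
For comparison, here is a minimal sketch of the same tag-only pattern run through preg_replace instead of ereg_replace. It assumes a PHP build with the PCRE functions available; the sample string with the extra <embed> tag is made up for illustration.

```php
<?php
// Sample input (hypothetical): one <em> pair plus an <embed> tag that must survive.
$string = 'Feugiat <em class="x">hendrerit</em> sit <embed src="a.swf"> aliquam.';

// </?em matches an opening or closing em tag; \b stops "<em" from matching "<embed".
$stripped = preg_replace('!</?em\b[^>]*>!', '', $string);

echo $stripped, "\n";
```

Because only the tags themselves are matched, no capture group or backreference is needed in the replacement.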

Tomalak
+2  A: 

If all you're using the regular expression for is to remove the html tags, perhaps php's strip_tags() function would be more appropriate.

php.net manual entry
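
A short sketch of what strip_tags() does with this kind of input. The second argument is a list of tags to keep, so it can protect specific tags but cannot single out one tag for removal; the sample string with <strong> is an assumption added for illustration.

```php
<?php
// Sample input (hypothetical): an <em> pair and a <strong> pair.
$string = 'Feugiat <em>hendrerit</em> sit <strong>iriuredolor</strong> aliquam.';

// Remove every tag.
$plain = strip_tags($string);

// Keep <strong> and remove everything else, including <em>.
$keepStrong = strip_tags($string, '<strong>');

echo $plain, "\n";
echo $keepStrong, "\n";
```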

Eric
Yeah, totally, but I was more just bringing this up as an example, b/c I didn't understand how PHP uses RegEx. And I can imagine a situation where I only want to remove one specific tag.
bryan kennedy
+1  A: 

I have never understood ereg_ and always use preg. If you add the backslash like Greg suggests and change to preg_ it will compile.

$string = preg_replace('%<em\\b[^>]*>(.*?)</em>%','\\1',$string);

Edit: I agree with others here that this particular approach might not be ideal for the problem. But still, preg_ is most often the way to go when using regexes in PHP.

PEZ
Your answer is incorrect. \\1 is not a backreference in the preg_replace replacement text. $1 is.
Jan Goyvaerts
Yeah, but somehow it still works. Maybe they support both?
PEZ
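
On the "maybe they support both" point: the PHP manual documents both the \n and $n forms for preg_replace replacement text, with $n described as the preferred one. A quick check, offered as a sketch rather than a definitive test:

```php
<?php
$string  = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$pattern = '!<em\b[^>]*>(.*?)</em>!';

// '$1' and '\\1' (a single backslash followed by 1) both refer to group 1.
$withDollar    = preg_replace($pattern, '$1', $string);
$withBackslash = preg_replace($pattern, '\\1', $string);

echo $withDollar, "\n";
var_dump($withDollar === $withBackslash);
```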
+2  A: 

ereg_replace does not support the word boundary assertion (\b) or non-greedy modifier (*?). PEZ is right, you should probably be using preg.

preg_replace('!<em\\b[^>]*>(.*?)</em>!', '$1', $string)

The extra backslash is not strictly necessary because PHP does not replace \b, but it is a good idea to always escape backslashes in a string literal.
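
To see that PHP leaves \b alone in a string literal, the sketch below compares the escaped and unescaped spellings. The variable names are made up for illustration.

```php
<?php
// \b is not one of PHP's string escape sequences, so the backslash is kept.
$unescaped = "<em\b";
$escaped   = "<em\\b";
$single    = '<em\b';

var_dump($unescaped === $escaped); // same five characters: < e m \ b
var_dump($single === $escaped);
var_dump(strlen($unescaped));

// By contrast, \n is an escape sequence in double quotes and collapses to one character.
var_dump(strlen("\n"));
```

All three spellings hand the same pattern text to the regex engine, which is why the extra backslash is a matter of habit rather than necessity here.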

mcrumley
Why did you use the ! instead of the % to encapsulate the preg? Is there an advantage to this or is it just preference? I have the same question about your use of $1 vs \\1
bryan kennedy
They're both personal preference, but most regex flavors use the $1 form for capture-group references in the replacement string, so you should favor that form. You can also use ${1} to avoid collisions with other digits in the replacement string.
Alan Moore
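
A small sketch of the digit collision that ${1} avoids. The input 'item7' and the goal of appending a literal 0 are made up for illustration.

```php
<?php
$input = 'item7';

// Braces end the group number, so the 0 that follows is a literal character.
$braced = preg_replace('/(\d)/', '${1}0', $input);

// Without braces, "$10" is read as a reference to group 10, not group 1 plus "0".
$unbraced = preg_replace('/(\d)/', '$10', $input);

echo $braced, "\n";
echo $unbraced, "\n";
```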
+2  A: 

It's probably a good idea to avoid ereg for future compatibility. It looks like it's been deprecated in PHP 6, according to this.

The ereg extension, which supports Portable Operating System Interface (POSIX) regular expressions, is being removed from core PHP support.

charlesbridge
+1  A: 

Greg Hewgill's answer seemed right at first but when I try and use this code from that answer:

<?php 
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$string = ereg_replace("<em\\b[^>]*>(.*?)</em>","\\1",$string);
echo $string;
?>

I get this error:

Warning: ereg_replace() [function.ereg-replace]: REG_BADRPT in test.php on line 3

So that ereg pattern doesn't seem to work just yet.

bryan kennedy
Perhaps you could try preg_replace instead, as one of the other comments noted that ereg_replace will be deprecated.
Greg Hewgill
+1  A: 

PHP's ereg functions use a very limited regex flavor called POSIX ERE. My flavor comparison indicates all that this flavor lacks compared with modern flavors.

In your case, the word boundary \b is not supported. A strict POSIX implementation will flag \b as an error.

Your solution is to use the preg functions instead:

preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);

Compared with other answers you've received: Don't escape the backslash in \b, and use $1 for the replacement. preg_replace uses a different replacement text syntax than ereg_replace.
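
As a sketch of how this behaves on a few inputs, the snippet below wraps the call in a helper. The name unwrap_em() and the extra sample strings are hypothetical, added only for illustration.

```php
<?php
// Hypothetical helper: removes <em ...>...</em> wrappers and keeps the inner text.
function unwrap_em($html)
{
    return preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $html);
}

$original  = unwrap_em('Feugiat <em>hendrerit</em> sit iriuredolor aliquam.');
$withAttr  = unwrap_em('Feugiat <em class="x">hendrerit</em> sit.');
$embedOnly = unwrap_em('<embed src="a.swf"> stays');
$noTags    = unwrap_em('No tags here.');

echo $original, "\n", $withAttr, "\n", $embedOnly, "\n", $noTags, "\n";
```

Input without a match comes back unchanged, and the word boundary keeps <embed> from being treated as an <em> tag.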

Jan Goyvaerts
+1  A: 

ereg doesn't handle the \b word boundary, as far as I know, while preg does. Also, I think double-quoting the regex may cause problems with the backslashes.

Calyth