<hr>I want to remove this text.<embed src="stuffinhere.html"/>

I tried using regex but nothing works.

Thanks in advance.

P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);

A: 
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);

echo $text;

If you want to hard-code the src in the embed tag:

$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);

echo $text;
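
One caveat with a hard-coded tag: characters such as "." are regex metacharacters, so the pattern above matches slightly more than the literal tag. A minimal sketch of one way around that, assuming you keep the tag in a variable ($tag and $clean are names made up for this example), is to run it through preg_quote() first:

```php
<?php
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$tag  = '<embed src="stuffinhere.html"/>';   // hypothetical variable holding the literal tag

// preg_quote() escapes regex metacharacters in $tag (such as "."); the second
// argument additionally escapes the "#" delimiter should it appear in the tag.
$pattern = '#(<hr>).*?(' . preg_quote($tag, '#') . ')#';
$clean   = preg_replace($pattern, '$1$2', $text);

echo $clean, "\n";   // <hr><embed src="stuffinhere.html"/>

// Why it matters: an unescaped "." also matches other characters.
var_dump(preg_match('#stuffinhere.html#', 'stuffinhereXhtml'));                             // int(1)
var_dump(preg_match('#' . preg_quote('stuffinhere.html', '#') . '#', 'stuffinhereXhtml'));  // int(0)
```

For this exact input the result is the same as above; the quoting only matters when the tag could also match unintended text.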
NAVEED
A: 

You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
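
For reference, here is a rough sketch of what the parser route could look like with PHP's bundled DOM extension. It assumes the <hr> and the <embed> are siblings in the parsed tree, and removeBetweenHrAndEmbed() is a made-up helper name, not an existing function. The serialized output may differ slightly from the input (for example in how the embed tag is closed), so treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Sketch: drop everything that sits between an <hr> and the next sibling <embed>.
// removeBetweenHrAndEmbed() is a hypothetical name, not an existing API.
function removeBetweenHrAndEmbed(string $html): string
{
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    // Wrap the fragment so there is a single known container to read back.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list first, because the loop modifies the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('hr')) as $hr) {
        $between = [];
        $node = $hr->nextSibling;
        while ($node !== null && strtolower($node->nodeName) !== 'embed') {
            $between[] = $node;
            $node = $node->nextSibling;
        }
        if ($node === null) {
            continue;                   // no <embed> after this <hr>: leave it alone
        }
        foreach ($between as $n) {
            $n->parentNode->removeChild($n);
        }
    }

    // Serialize the wrapper's children, i.e. the cleaned fragment.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeBetweenHrAndEmbed('<hr>I want to remove this text.<embed src="stuffinhere.html"/>');
```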

The rest of this answer is for when you've decided that the HTML parser is too slow, doesn't handle ill-formed (i.e. standard in the wild) HTML, or is a pain in the ass to integrate into a system you don't control. I ran the following small PHP script from the command line:

$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);

//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"

and it did remove the text, so I'd check your source documents and any other PHP code around your regex. You're probably not feeding preg_replace the string you think you are. My best guess is that your source document has irregular case, or that there are line breaks between the <hr> and the <embed> (by default, "." does not match a newline). Try the following regular expression instead.

$str = '<hr>I want to remove 
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);

//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"

The "i" modifier says "make this search case insensitive". The "s" modifier says "the [.] character should also match newlines", which it does not do by default. But use a proper parser if you can. Seriously.
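
To see that both modifiers are needed for input like this, here is a small sketch that runs the same pattern with each combination. The sample string and the $results array are just for illustration:

```php
<?php
// Sample input (made up for this sketch): upper-case tag plus line breaks.
$str = "<hr>I want to remove\nthis text.\n<EMBED src=\"stuffinhere.html\"/>";
$pattern = '(<hr>).*?(<embed)';

// Try the same pattern with each combination of modifiers.
$results = [];
foreach (['', 'i', 's', 'si'] as $mods) {
    $results[$mods] = preg_replace('#' . $pattern . '#' . $mods, '$1$2', $str);
}

var_dump($results[''] === $str);   // bool(true): no match, string returned unchanged
var_dump($results['i'] === $str);  // bool(true): case is fine now, but "." stops at "\n"
var_dump($results['s'] === $str);  // bool(true): newlines are fine now, but "<embed" != "<EMBED"
var_dump($results['si']);          // string(35) "<hr><EMBED src="stuffinhere.html"/>"
```

When preg_replace finds no match it hands back the subject unchanged, which is why "nothing works" can simply mean "nothing matched".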

Alan Storm
A: 

I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...


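$start = '<hr>';
$end = '<embed src="stuff...';
$str = ' html here... ';

// Returns the text between the first $t1 at or after $offset and the next $t2
// (case-insensitive), or false if either marker is missing.
function between($t1, $t2, $page, $offset = 0) {
    $p1 = stripos($page, $t1, $offset);
    if ($p1 === false) {
        return false;
    }
    $from = $p1 + strlen($t1);
    $p2 = stripos($page, $t2, $from);
    if ($p2 === false) {
        return false;
    }
    return substr($page, $from, $p2 - $from);
}

$offset = 0;
while (($found = between($start, $end, $str, $offset)) !== false) {
    // Position of the $start marker that between() just matched.
    $p1 = stripos($str, $start, $offset);
    // Cut out the text between the markers, leaving the markers untouched.
    $str = substr_replace($str, '', $p1 + strlen($start), strlen($found));
    // Resume after this pair so an already-cleaned pair is not matched again.
    $offset = $p1 + strlen($start) + strlen($end);
}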

// do something with $str here...
vlad b.