How would I replace all span tags (and whatever is inside them) that have the class pagenum pncolor with an empty line? str_replace wouldn't work because the name is different for each of them, so I assume I'd use preg_replace, but I'm not sure how that works.

<span class='pagenum pncolor'><a id='page_001' name='page_001'></a>001</span>
<p>Some text</p>

<span class='pagenum pncolor'><a id='page_130' name='page_130'></a>130</span>
<p>Some text</p>
<p>Some text</p>
<p>Some text</p>

<span class='pagenum pncolor'><a id='page_120' name='page_120'></a>120</span>
<p>Some text</p>

<span class='pagenum pncolor'><a id='page_100' name='page_100'></a>100</span>
<p>Some text</p>
+2  A: 

Use this regexp: #<span class='pagenum pncolor'>.*?</span>#si
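
For illustration, here is a minimal sketch of how that pattern could be passed to preg_replace. The variable names ($html, $pattern, $clean) and the shortened sample input are assumptions made for this example, not part of the original answer.

```php
<?php
// Sample input modeled on the markup in the question (shortened).
$html = "<span class='pagenum pncolor'><a id='page_001' name='page_001'></a>001</span>\n"
      . "<p>Some text</p>\n"
      . "<span class='pagenum pncolor'><a id='page_130' name='page_130'></a>130</span>\n"
      . "<p>Some text</p>\n";

// .*? is lazy, so each match stops at the first </span> it reaches.
// The s modifier lets . match newlines; i makes the match case-insensitive.
$pattern = "#<span class='pagenum pncolor'>.*?</span>#si";

// preg_replace returns the new string; it does not modify $html in place.
$clean = preg_replace($pattern, '', $html);

echo $clean;
```

Each span is replaced by an empty string, so the line it occupied is left empty, which appears to be what the question asks for.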

Crozin
A: 

assuming that $text = {THE_HTML_STRING_YOU_POSTED_IN_YOUR_QUESTION};

you can try:

$text = preg_replace("/<span class='pagenum pncolor'>(.*)<\/span>/", '', $text);
andreas
That would replace everything between the first `<span>` to the last `</span>`. Put a `?` after your `*` to fix that.
Chad Birch
You are right! Making the quantifier lazy will solve the problem. Please check http://www.regular-expressions.info/repeat.html, which states that laziness will result in more CPU cycles due to backtracking.
andreas
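
The greedy versus lazy difference discussed in these comments can be sketched as follows. The single-line test string $line is an assumption made up for this example. Note that without the s modifier . does not match a newline, so the greedy pattern only overreaches when more than one span sits on the same line.

```php
<?php
// Hypothetical input: two page-number spans on ONE line, with content between them.
$line = "<span class='pagenum pncolor'>001</span><p>Keep me</p><span class='pagenum pncolor'>002</span>";

// Greedy: (.*) runs to the LAST </span> on the line, swallowing the <p> as well.
$greedy = preg_replace("/<span class='pagenum pncolor'>(.*)<\/span>/", '', $line);

// Lazy: (.*?) stops at the FIRST </span>, so each span is removed on its own.
$lazy = preg_replace("/<span class='pagenum pncolor'>(.*?)<\/span>/", '', $line);

var_dump($greedy); // empty string: everything was removed
var_dump($lazy);   // "<p>Keep me</p>"
```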
+1  A: 

I'm going to mention the obligatory: You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.

However, I'm guilty of using regexes in situations like this also... And if I were to do so, I'd use @andreas's answer.
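
For anyone who would rather avoid regex here, a rough sketch using PHP's DOM extension might look like the following. This is a swapped-in alternative, not something proposed in the thread; the wrapper div with id 'wrap' and all variable names are assumptions for illustration, and the exact whitespace of the output may vary.

```php
<?php
// Sample input modeled on the markup in the question (shortened).
$html = "<span class='pagenum pncolor'><a id='page_001' name='page_001'></a>001</span>\n"
      . "<p>Some text</p>\n";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a hypothetical container so it can be serialized on its own later.
$doc->loadHTML("<div id='wrap'>" . $html . "</div>");
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Copy the matches into an array first, then remove each span from its parent.
// This exact-match test on @class assumes the attribute is always written this way.
$spans = iterator_to_array($xpath->query("//span[@class='pagenum pncolor']"));
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

// Serialize only the children of the wrapper, not the html/body added by the parser.
$wrap = $xpath->query("//div[@id='wrap']")->item(0);
$out = '';
foreach ($wrap->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

echo $out;
```

This is more code than the one-line regex, but it does not depend on how the span's contents are laid out or nested.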

Josh
+1 for entertaining reading
thetaiko