
WordPress spits out posts in this format:

<h2>Some header</h>
<p>First paragraph of the post</p>
<p>Second paragraph of the post</p>
etc.

To get my cool styling on the first paragraph (it's one of those things that looks good only sparingly) I need to hook into the get_posts function to filter its output with a preg_replace.

The goal is to get the above code to look like:

<h2>Some header</h>
<p class="first">First paragraph of the post</p>
<p>Second paragraph of the post</p>

I have this so far, but it's not even working (the error is: "preg_replace() [function.preg-replace]: Unknown modifier ']'")

$output=preg_replace('<p[^>]*>', '<p class="first">', $content);

I can't use CSS3 pseudo-selectors because I need to support IE6, and I can't apply the :first-line pseudo-element (one that IE6 does support) to the parent container, because it would hit the H2 instead of the first P.

A: 

Only with JavaScript.

Denis Bobrovnikov
+1  A: 

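The problem is that the first character of the pattern in a preg_* function is taken as the delimiter. Your pattern starts with <, which PHP accepts as a bracket-style delimiter paired with >, so the pattern ends at the first > (the one inside [^>]) and the leftover ]*> is parsed as modifiers, hence "Unknown modifier ']'". What you'd need is something like: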

$output = preg_replace('~<p\b([^>]*)>~', '<p class="first" \1>', $content, 1);

This also puts back any extra attributes the <p> may have.

Overall, though, it's cleaner to do this with CSS selectors and a JavaScript fallback for IE.

EDIT: Added a replacement limit and a word boundary (\b).
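A quick way to see both the failure and the fix side by side. This is a minimal sketch: the sample $content is assumed (with the header closed as </h2>), and the broken call is silenced with @ so it just returns null instead of printing the warning.

```php
<?php
$content = "<h2>Some header</h2>\n<p>First paragraph of the post</p>\n<p>Second paragraph of the post</p>";

// Original pattern: '<' is read as an opening delimiter that closes at the
// first '>', so "]*>" becomes the modifier list and compilation fails.
// preg_replace() then emits a warning (suppressed here) and returns null.
$broken = @preg_replace('<p[^>]*>', '<p class="first">', $content);
var_dump($broken);

// Fixed pattern: '~' is the delimiter, so '<' and '>' are ordinary characters.
// \b keeps tags such as <pre> from matching, and the limit of 1 rewrites
// only the first <p>. Note the extra space left when <p> has no attributes.
$output = preg_replace('~<p\b([^>]*)>~', '<p class="first" \1>', $content, 1);
echo $output, "\n";
```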

Max Shawabkeh
+6  A: 

You may find it easier and more reliable to use an HTML parser such as this one. HTML is notoriously difficult to parse reliably (technically, impossible) with regular expressions, and the parser will give you a very simple means to find the nodes you're interested in. The first page of the doc has a tab labelled "How to modify HTML elements".
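As an illustration of the parser approach, here is a sketch using PHP's bundled DOM extension (DOMDocument). This is a swapped-in parser, not the library linked in the answer, and the function name add_first_class is hypothetical. It assumes the post body is an HTML fragment and does not handle character encoding, so non-ASCII text may need extra care.

```php
<?php
// Hypothetical helper: add class="first" to the first <p> in an HTML fragment
// using DOMDocument instead of a regular expression.
function add_first_class(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it has a single root; the flags stop
    // libxml from adding implied <html>/<body> elements and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = $doc->getElementsByTagName('p');
    if ($paragraphs->length > 0) {
        $first = $paragraphs->item(0);
        // Append to an existing class rather than overwriting it.
        $existing = trim($first->getAttribute('class'));
        $first->setAttribute('class', $existing === '' ? 'first' : $existing . ' first');
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Unlike the regex answers, this handles an existing class attribute and ignores tags such as <pre>, at the cost of a full parse of every post.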

Brian Agnew
+1  A: 

Two good options:

  1. Do it in JavaScript. With jQuery, for example, it's one line: $("h2").next().addClass("first")
  2. Use an HTML parser. Regexps really are not the right tool for this job. That said, loading a whole HTML parser just for this is overkill, so you're better off using JavaScript.

The wrong way

Still, to answer the question, here is the best way I can think of to do it with a regexp, though I don't recommend it.

preg_replace('#(</h2>\s*<p[^>]*)>#im', '$1 class="first">', '<h2>Some header</h2> <p>First paragraph of the post</p> <p>Second paragraph of the post</p> ');

What we do is:

  • using preg_replace so we can use a full regexp to do the replacement;
  • using the "i" flag so the regexp ignores case (the "m" flag only changes how ^ and $ behave, so it is not needed here; \s already matches line breaks);
  • using </h2>\s* to match the closing "h2" tag and any spaces/line breaks after it;
  • using <p[^>]* to match the "p" tag and its current attributes;
  • using parentheses to capture that;
  • using "$1" to put the captured part back in place of the matched string;
  • adding the class and closing the ">".
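To check the steps above, this sketch runs the pattern on input with line breaks between the tags (the sample $html is assumed, with the header closed as </h2>). Since \s matches the newline on its own, the result is the same with or without the "m" flag.

```php
<?php
$html = "<h2>Some header</h2>\n<p>First paragraph of the post</p>\n<p>Second paragraph of the post</p>";

// Captures "</h2>", the whitespace, and "<p" plus any attributes, then
// re-emits the capture followed by the new class and the closing ">".
$result   = preg_replace('#(</h2>\s*<p[^>]*)>#im', '$1 class="first">', $html);
$withoutM = preg_replace('#(</h2>\s*<p[^>]*)>#i', '$1 class="first">', $html);

echo $result, "\n";
```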

The first drawback I can think of is that it doesn't handle the case where a class already exists.
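One way around that drawback, sketched here as a hypothetical extension rather than part of the answer, is preg_replace_callback: if the matched <p> already has a double-quoted class attribute, append "first" to it; otherwise add a new attribute. The function name add_first_class_after_h2 is made up for illustration, and single-quoted or unquoted class values are not handled.

```php
<?php
// Hypothetical helper: tag the first <p> after </h2> with class "first",
// merging into an existing class="..." instead of adding a duplicate
// attribute (browsers ignore a repeated attribute, so "first" would be lost).
function add_first_class_after_h2(string $html): string
{
    return preg_replace_callback(
        '#(</h2>\s*<p\b)([^>]*)>#i',
        function (array $m): string {
            $attrs = $m[2];
            // Append to an existing double-quoted class attribute, if any.
            $merged = preg_replace('#(\s)class="([^"]*)"#i', '${1}class="${2} first"', $attrs, 1, $count);
            if ($count === 0) {
                // No class attribute yet: add one.
                $merged = $attrs . ' class="first"';
            }
            return $m[1] . $merged . '>';
        },
        $html,
        1 // only the first match
    );
}
```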

Oh, and by the way, you have <h2>...</h> instead of <h2>...</h2>. I don't know if it's a typo, but I assumed it was. Adjust the regexp accordingly if it's not.

e-satis
Whoops! Yeah, the <h2>...</h> is a typo. I don't have to worry about malformed HTML since the blog engine generates it.
Rod Boev
+1  A: 

In this particular case, a regexp solution would be fairly easy:

echo preg_replace('~</h2>\s*<p~', "$0 class='first'", $html);
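Here $0 stands for the whole match ("</h2>", the whitespace, and "<p"), so the class is inserted right after "<p" and any existing attributes follow it. Inside the double-quoted PHP string, "$0" is not interpolated because variable names cannot start with a digit. The sketch below (sample input assumed) also shows one catch: a <pre> right after the header starts with "<p" too, and adding \b to the pattern avoids that.

```php
<?php
$html = "<h2>Some header</h2>\n<p>First paragraph of the post</p>\n<p>Second paragraph of the post</p>";
$result = preg_replace('~</h2>\s*<p~', "$0 class='first'", $html);
echo $result, "\n";

// Edge case: "<pre" also begins with "<p", so the plain pattern corrupts it,
// while the \b variant leaves it alone.
$pre      = "<h2>Code</h2>\n<pre>x</pre>";
$broken   = preg_replace('~</h2>\s*<p~', "$0 class='first'", $pre);
$guarded  = preg_replace('~</h2>\s*<p\b~', "$0 class='first'", $pre);
```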
stereofrog
A: 

Reading through the answers, some of them will work, but they all have drawbacks: they either rely on an external parsing library, or they may match tags other than the P tag, or they also match its attributes.

I ended up using this solution with the str_replace_once function from here:

str_replace_once('<p>', '<p class="first">', $content);
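PHP has no built-in str_replace_once, and the version from the linked page is not reproduced here. The following is a hypothetical implementation with the same call shape, replacing only the first occurrence via strpos and substr_replace. Because it matches the literal string '<p>', a first paragraph that already carries attributes is skipped.

```php
<?php
// Hypothetical str_replace_once(): replace only the first occurrence of
// $search in $subject; return $subject unchanged if there is none.
function str_replace_once(string $search, string $replace, string $subject): string
{
    if ($search === '') {
        return $subject; // an empty needle has nothing meaningful to replace
    }
    $pos = strpos($subject, $search);
    if ($pos === false) {
        return $subject;
    }
    return substr_replace($subject, $replace, $pos, strlen($search));
}
```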

Simple enough, and it works just as intended. Here's the full WordPress snippet that filters the first paragraph whenever the_content() is called:

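// Add class="first" to the first <p> whenever the_content() is called
add_filter('the_content', 'first_p_style');
function first_p_style($content) {
    // str_replace_once() is not a PHP built-in; it is the helper mentioned above
    $output = str_replace_once('<p>', '<p class="first">', $content);
    return $output;
}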

Thanks for all the answers!

Rod Boev