I use nicEdit to write rich text in my CMS. The problem is that it generates strings like this:

hello first line<br><br />this is a second line<br />this is a 3rd line

Since this is for a news site, I would much prefer the final HTML to look like this:

<p>hello first line</p><p>this is a second line<br />this is a 3rd line</p>

So my current plan is this:

  1. Trim any <br /> from the start and end of $data.
  2. Replace every run of two or more <br /> with </p><p> (a single <br /> is allowed).
  3. Finally, add <p> at the start and </p> at the end.

I only have steps 1 and 3 so far. Can someone give me a hand with step 2?

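function replace_br($data) {
    // step 1: strip <p>, </p> and <br> tags from both ends
    // (trim() takes a list of characters, not a substring, so it would also eat letters such as "p" or "b")
    $data = preg_replace('#^(?:\s*<(?:/?p|br\s*/?)>)+\s*|\s*(?:<(?:/?p|br\s*/?)>\s*)+$#i', '', $data);
    // step 2 ???
    // preg_replace() ?
    // step 3
    $data = '<p>'.$data.'</p>';
    return $data;
}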

Thanks!

PS: It would be even better to handle longer runs too. Example: in "hello<br /><br /><br /><br /><br />too much space", those 5 line breaks should also be converted to just one "</p><p>".

Final solution (special thanks to kemp!):

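function sanitize_content($data) {
    // Keep only the tags the editor is allowed to produce.
    $data = strip_tags($data,'<p><br><img><a><strong><u><em><blockquote><ol><ul><li><span>');
    // Strip <p>, </p> and <br> tags from both ends. trim() is not suitable here:
    // its second argument is a list of characters, not a substring.
    $data = preg_replace('#^(?:\s*<(?:/?p|br\s*/?)>)+\s*|\s*(?:<(?:/?p|br\s*/?)>\s*)+$#i','',$data);
    // Collapse every run of two or more breaks into a paragraph boundary.
    $data = preg_replace('#(?:<br\s*/?>\s*?){2,}#','</p><p>',$data);
    $data = '<p>'.$data.'</p>';
    return $data;
}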
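A caveat when trimming tags with trim(): its second argument is a set of characters, not a substring, so it can eat letters from the text itself. The sketch below shows the effect and one possible regex-based alternative. The helper name strip_edge_tags is hypothetical, and it assumes bare <p>, </p> and <br> tags without attributes.

```php
<?php
// trim() strips any of the listed characters, in any order, from both ends.
$a = trim('number<br />', '<br />'); // "numbe": the final "r" of the text is lost
$b = trim('pop', '<p>');             // "o": both "p" letters are lost

// Hypothetical helper: remove only whole <p>, </p> and <br> tags at the edges.
function strip_edge_tags(string $html): string {
    $tag = '<(?:/?p|br\s*/?)>';
    $pattern = '#^(?:\s*' . $tag . ')+\s*|\s*(?:' . $tag . '\s*)+$#i';
    return preg_replace($pattern, '', $html);
}

$c = strip_edge_tags('<br />number<br /><br>'); // "number"
$d = strip_edge_tags('<p>pop</p>');             // "pop"
```

Anchoring the pattern with ^ and $ keeps tags in the middle of the text untouched.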
A: 

This approach will solve your problem:

  1. Split the string on <br> or <br />; you'll get an array of strings.
  2. Start a new output string with <p>.
  3. Loop over the array from step 1, starting at the beginning, and remove every empty entry until you reach one that is not empty (then break).
  4. Same as 3, but from the end of the array toward the beginning.
  5. Loop over the array again, keeping an integer A (initially 0) that counts the empty entries seen since the last non-empty one, i.e. how many extra breaks there were in a row.
    1. If the entry is empty, increase A and continue the loop.
    2. If the entry is not empty and is not the first entry:
      1. If A is 0 (a single break), append a <br>.
      2. If A is 1 or more (two or more breaks in a row), append a </p><p>.
    3. Append the content of the current entry (which is not empty).
    4. Set A to 0.
  6. Append </p>.
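The steps above could be sketched roughly as follows. This is only an illustration: the function name breaks_to_paragraphs is made up, each piece is trimmed so that whitespace between breaks counts as empty, and the counter tracks empty entries between two non-empty ones (zero empties means a single break, one or more means a paragraph boundary).

```php
<?php
// Hypothetical helper; a sketch of the loop-based steps, not a full HTML parser.
function breaks_to_paragraphs(string $html): string {
    // Step 1: split on every <br>, <br/> or <br />.
    $parts = array_map('trim', preg_split('#<br\s*/?>#i', $html));

    // Steps 3 and 4: drop empty entries at both ends.
    while ($parts && $parts[0] === '') { array_shift($parts); }
    while ($parts && end($parts) === '') { array_pop($parts); }

    $out = '<p>';   // step 2
    $empties = 0;   // the counter "A"
    $first = true;
    foreach ($parts as $part) {
        if ($part === '') { $empties++; continue; }
        if (!$first) {
            // Single break stays a <br />, longer runs become a new paragraph.
            $out .= ($empties === 0) ? '<br />' : '</p><p>';
        }
        $out .= $part;
        $empties = 0;
        $first = false;
    }
    return $out . '</p>'; // step 6
}

$result = breaks_to_paragraphs('hello first line<br><br />this is a second line<br />this is a 3rd line');
// <p>hello first line</p><p>this is a second line<br />this is a 3rd line</p>
```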

A different approach: using Regular Expressions

(<br ?/?>){2,}

Will match 2 or more <br>. (See php.net on preg_split on how to do this.)

Now apply the same cleanup as in steps 3 and 4 above: loop over the array twice, once from the beginning up (0..length-1) and once from the end down (length-1..0). If the entry is empty, remove it from the array. If the entry is not empty, quit the loop.

To do this:

// Split on runs of two or more breaks. "#" is used as the delimiter because
// the pattern itself contains a "/".
$array = preg_split('#(?:<br ?/?>\s*){2,}#i', $string);

// Remove empty entries from the beginning of the array.
while ($array && trim(reset($array)) === '') {
    array_shift($array);
}

// Remove empty entries from the end of the array.
while ($array && trim(end($array)) === '') {
    array_pop($array);
}

$newString = '<p>' . implode('</p><p>', $array) . '</p>';
Pindatjuh
Actually it would be even better if there was a way to find a string with 2 or more <br />. I'm thinking of preg_replace, but I still have no idea how to continue.
andufo
The first approach also handles those. The second approach is easier to implement, but the question is whether you like to use regex on HTML (some people don't like that approach).
Pindatjuh
Thanks for the pattern, but I think something is wrong. I'm using: $data = preg_replace('(<br ?/?>){2,}','aaa',$data); and it returns null. Why? (I'm using "aaa" to make it more visible once applied.)
andufo
Because you use `preg_replace`; you may of course use that, but it will not work in the situation I sketched. I've also added some code.
Pindatjuh
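On the null result mentioned in the comments above: the most likely cause is that the pattern was passed without delimiters. PHP's preg_* functions expect the pattern wrapped in a delimiter pair, so in '(<br ?/?>){2,}' the parentheses are taken as the delimiters and the trailing {2,} as (invalid) modifiers. A small sketch, using a made-up sample string:

```php
<?php
$data = 'a<br><br />b';

// No explicit delimiters: "(" and ")" are treated as the delimiter pair and
// "{2,}" as modifiers, so the call fails with a warning and returns null.
$broken = @preg_replace('(<br ?/?>){2,}', 'aaa', $data);

// With explicit "#" delimiters the same pattern works.
$fixed = preg_replace('#(<br ?/?>){2,}#', 'aaa', $data);
// $fixed is "aaab"
```

Using "#" rather than "/" as the delimiter avoids having to escape the slash inside <br />.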
A: 

I think this should work for step #2 unless I am not understanding your scenario completely:

$string = str_replace( '<br><br>', '</p><p>', $string );
$string = str_replace( '<br /><br />', '</p><p>', $string );
$string = str_replace( '<br><br />', '</p><p>', $string );
$string = str_replace( '<br /><br>', '</p><p>', $string );
OneNerd
Thanks for the idea, but it is too basic. I need a more advanced approach. Check out the final solution at the top.
andufo
+2  A: 

This will work even if the two <br>s are on different lines (i.e. there is a newline or any whitespace between them):

function replace_br($data) {
    $data = preg_replace('#(?:<br\s*/?>\s*?){2,}#', '</p><p>', $data);
    return "<p>$data</p>";
}
kemp
You are the man! hehe, preg_replace is the most efficient way to do this. Thanks! I have to learn regexp better hehe.
andufo
What about `<br >`?
Gumbo
<br > is also taken care of by the " ?" in the regexp
andufo
@kemp ... I'm facing one more little detail with this solution. Sometimes there is a <br /> next to a <p> or </p>. How can I wipe them too in the same regexp?
andufo
Edited to account for `<br >`
kemp
@andufo: I'd rather not put too many things in a single regexp. Splitting a problem into two steps often makes things **a lot** easier.
kemp
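As an illustration of the two-step idea from the last comment, one possible sketch is below. The function name replace_br_two_step is hypothetical, and it assumes bare <p> and </p> tags without attributes; it is not taken from any answer in this thread.

```php
<?php
// Hypothetical helper: one small regex per concern instead of one big pattern.
function replace_br_two_step(string $data): string {
    // Step 1: collapse runs of two or more breaks into a paragraph boundary.
    $data = preg_replace('#(?:<br\s*/?>\s*){2,}#i', '</p><p>', $data);
    // Step 2a: drop breaks that directly follow a <p> or </p> tag.
    $data = preg_replace('#(</?p>)\s*(?:<br\s*/?>\s*)+#i', '$1', $data);
    // Step 2b: drop breaks that directly precede a <p> or </p> tag.
    $data = preg_replace('#(?:<br\s*/?>\s*)+(</?p>)#i', '$1', $data);
    return $data;
}

$clean = replace_br_two_step('<p>one<br /></p><p><br />two</p>');
// <p>one</p><p>two</p>
```

Keeping the passes separate makes each pattern easy to test on its own.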