Before saving content to the database, I need to:


  1. remove all tags
  2. collapse every run of more than one whitespace character into one
  3. collapse every run of more than one newline into one

To do that, I do the following:

  1. $content = preg_replace('/<[^>]+>/', "", $content);
  2. $content = preg_replace('/\n/', "NewLine", $content); (this keeps the newlines from being lost when the extra whitespace is collapsed)

    $content = preg_replace('/(\&nbsp\;){1,}/', " ", $content);

    $content = preg_replace('/[\s]{2,}/', " ", $content);

  3. Finally, I need to collapse every run of more than one "NewLine" marker.

After the first two steps, I get text in this format:

NewLineWordOfText
NewLine
NewLine
NewLine NewLine WordOfText &quot;WordOfText WordOfText&quot; WordOfText NewLine&quot;WordOfText
...

How can I delete the extra newlines from such content, so that only one remains in each run?

Thanks

+3  A: 

First of all, HTML is not a regular language, so parsing it with regular expressions is a bad idea. PHP already has a function that removes tags for you: strip_tags

To squeeze spaces while preserving newlines:

$content = preg_replace('/[^\n\S]{2,}/', " ", $content);
$content = preg_replace('/\n{2,}/', "\n", $content);

The first line squeezes every run of two or more whitespace characters other than \n into one space ([^\n\S] matches any character that is neither \n nor a non-whitespace character, i.e. any whitespace except a newline). The second squeezes multiple consecutive newlines into a single newline.
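
Putting the pieces together, a minimal sketch might look like the following. The function name clean_content and the sample input are made up for illustration, the type declarations assume PHP 7 or later, and the str_replace step for &nbsp; is carried over from the question rather than being part of the two lines above.

```php
<?php
// Hypothetical helper; the name is not from the original post.
function clean_content(string $content): string
{
    // Remove tags with the built-in function instead of a regex.
    $content = strip_tags($content);
    // Treat non-breaking-space entities as ordinary spaces.
    $content = str_replace('&nbsp;', ' ', $content);
    // Collapse runs of two or more whitespace characters other than \n.
    $content = preg_replace('/[^\n\S]{2,}/', ' ', $content);
    // Collapse runs of two or more newlines into one.
    $content = preg_replace('/\n{2,}/', "\n", $content);
    return $content;
}

$raw = "<p>Hello&nbsp;&nbsp;   world</p>\n\n\n<b>Second</b>\t\tline";
echo clean_content($raw); // prints "Hello world", a newline, then "Second line"
```

Note that newlines separated by other whitespace (for example "\n \n") are not adjacent, so the last step will not merge them; if that matters for your data, you would need to trim each line first.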

Daniel Vandersluis
+1. That question should be required reading for users writing their first question tagged "html."
Matt Ball
+1 for strip_tags. HTML isn't regular, and trying to parse it with regexes is a pain.
Platinum Azure
A: 

Why don't you use nl2br(), then preg_replace every <br /><br /> with a single <br />, and then turn all the <br />s back into \n?
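
A rough sketch of that idea, with one caveat: nl2br() inserts <br /> before each newline and keeps the newline itself, so the pattern has to allow for a line break after each <br />. The sample string and the single combined pattern below are assumptions for illustration, not a tested recipe for every input.

```php
<?php
// Made-up sample input with a run of three newlines.
$content = "one\n\n\ntwo\nthree";

// nl2br() inserts "<br />" before every newline and keeps the newline.
$content = nl2br($content);

// Replace each run of "<br />" (each optionally followed by a line break)
// with a single "\n", which collapses the run and restores plain newlines.
$content = preg_replace('/(?:<br \/>\R?)+/', "\n", $content);

echo $content; // prints "one", "two" and "three" on separate lines
```

This only deals with the newlines; the tag stripping and whitespace squeezing would still have to happen separately, and running strip_tags afterwards is safe because no <br /> is left at that point.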

Thomas Clayson