
I am using TinyMCE as the text editor on my site, and I want to reformat the text before saving it to my database (changing the &rsquo; entities into ' and then into &#39;). I cannot find a simple way of doing this with TinyMCE, and htmlentities() encodes everything, including the < and > of the HTML tags. Any ideas?

+7  A: 

You can use strip_tags($str, $allowed_tags), like this:

$txt = strip_tags($txt, '<p><a><br>');
Tomasz Tybulewicz
+1  A: 

Directly from the PHP manual: strip_tags()

The $allowable_tags variable allows you to define a string of allowed tags. You can use this optional second parameter to specify tags which should not be stripped.
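To show what the allowed-tags parameter does, here is a rough JavaScript analogue. The `stripTags` helper is hypothetical and simplified, not PHP's implementation. It also shows why this approach does not answer the question: entities such as &rsquo; pass through unchanged.

```javascript
// Rough, hypothetical analogue of PHP's strip_tags($str, '<p><a><br>').
// Simplified and regex-based: unlike PHP it does not remove HTML comments
// or PHP tags, and a '>' inside an attribute value will confuse it.
function stripTags(html, allowedTags) {
  const keep = new Set(allowedTags.map((t) => t.toLowerCase()));
  // Match an opening or closing tag and capture its element name.
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, name) =>
    // Keep whitelisted tags exactly as written; drop all others.
    keep.has(name.toLowerCase()) ? tag : ""
  );
}

// Text and entities between tags are left alone.
console.log(stripTags("<div><p>It&rsquo;s <b>here</b><br/></p></div>", ["p", "a", "br"]));
// prints <p>It&rsquo;s here<br/></p>
```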

Veynom
+1  A: 

TinyMCE lets you specify a whitelist of allowed elements; any tag not on the list is removed:

tinyMCE.init({
  // ... other init options
  valid_elements: 'p,a[href],br'
});

In our own project we combine this whitelist with an internal converter which turns the HTML into a BB-like format for the database, then back to HTML again when it needs to be printed to a page.


Update: Now that the question has been edited to be clearer, I can see that what I typed above doesn't solve the problem. What the questioner wants is a way to convert character entities while leaving HTML tags unaffected.

In our own project, the internal converter we use does this job. When converting from HTML into our internal representation, encoded characters are converted into the characters themselves; when converting back into HTML, characters outside the ASCII range are encoded. This is done in a character-by-character, parser-like style. However, this approach is probably too complicated for your needs.
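A minimal sketch of that parser-like approach, assuming the internal form is plain text (tags already handled separately, e.g. by the BB-like conversion). The function names and the small entity table are hypothetical, not the converter described above.

```javascript
// Hypothetical, deliberately small entity table; a real converter would
// need the full list of HTML named entities.
const NAMED_ENTITIES = {
  amp: 38, lt: 60, gt: 62, quot: 34, apos: 39, nbsp: 160,
  lsquo: 8216, rsquo: 8217, ldquo: 8220, rdquo: 8221,
};

// HTML text -> plain characters: walk the string and replace each
// recognised entity (named, decimal or hex) with the character it stands for.
function decodeText(html) {
  let out = "";
  let i = 0;
  while (i < html.length) {
    if (html[i] === "&") {
      const end = html.indexOf(";", i);
      if (end !== -1) {
        const body = html.slice(i + 1, end);
        let cp;
        if (/^#[0-9]+$/.test(body)) cp = parseInt(body.slice(1), 10);
        else if (/^#[xX][0-9a-fA-F]+$/.test(body)) cp = parseInt(body.slice(2), 16);
        else if (Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)) cp = NAMED_ENTITIES[body];
        if (cp !== undefined && cp <= 0x10ffff) {
          out += String.fromCodePoint(cp);
          i = end + 1;
          continue;
        }
      }
    }
    // Not an entity we recognise (e.g. a bare "&"): copy it through.
    out += html[i];
    i++;
  }
  return out;
}

// Plain characters -> HTML text: escape markup characters and encode
// everything above ASCII as a numeric entity.
function encodeText(text) {
  let out = "";
  for (const ch of text) { // for...of yields whole code points, not UTF-16 halves
    const cp = ch.codePointAt(0);
    if (ch === "&") out += "&amp;";
    else if (ch === "<") out += "&lt;";
    else if (ch === ">") out += "&gt;";
    else if (cp > 127) out += `&#${cp};`;
    else out += ch;
  }
  return out;
}

const plain = decodeText("It&rsquo;s caf&#233; &amp; more");
console.log(encodeText(plain)); // It&#8217;s caf&#233; &amp; more
```

Keeping decoding and encoding as two separate passes means the stored text never mixes entity styles: whatever the editor sends, the output is always numeric.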

The shortcut many people use is a series of regular expressions, but you may find it difficult to order them so that literal ampersands (&) and semicolons (;) are preserved while character entities such as &nbsp; are translated. You'll also find that covering every possible character entity would take dozens of regexes.
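One way to soften the ordering problem is a single pattern plus a lookup table: tags are matched and skipped whole, bare ampersands never match, and entities missing from the table are left alone. This is a hypothetical sketch; the table is only a small subset, so the coverage problem remains.

```javascript
// Hypothetical lookup table: named entity -> Unicode code point.
const NAMED_TO_CODE = {
  nbsp: 160, lsquo: 8216, rsquo: 8217, ldquo: 8220, rdquo: 8221,
  mdash: 8212, hellip: 8230,
};

// Group 1 matches a whole tag, group 2 the name of a named entity.
// A matched tag is consumed whole, so entities inside attribute values
// are not rewritten.
const TAG_OR_ENTITY = /(<[^>]*>)|&([A-Za-z][A-Za-z0-9]*);/g;

function namedToNumeric(html) {
  return html.replace(TAG_OR_ENTITY, (match, tag, name) => {
    if (tag !== undefined) return tag; // leave tags untouched
    if (Object.prototype.hasOwnProperty.call(NAMED_TO_CODE, name)) {
      return `&#${NAMED_TO_CODE[name]};`;
    }
    return match; // entities not in the table, such as &amp;, stay as they were
  });
}

console.log(namedToNumeric('<p class="note">It&rsquo;s &amp; more&hellip;</p>'));
// prints <p class="note">It&#8217;s &amp; more&#8230;</p>
```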

Uh, so I don't actually have an answer.

Marcus Downing
+1  A: 

That depends on the tags you want to preserve. I assume you want to use all of TinyMCE's features, so the text can include nested tags such as table constructs. In that case there is no simple way of doing it (one way would be to use PHP's Document Object Model extension to parse the HTML document).

But TinyMCE has several configuration options for entity encoding. I would suggest checking the entity_encoding, entities and encoding options in the TinyMCE manual.

Joe Scylla
+1  A: 

Both TinyMCE and FCK have tons of configuration options. The documentation can be a pain to search, but it is worth the effort.

TinyMCE lets you set the entity encoding with the 'entity_encoding' option when you create the editor. It might look something like this:

tinyMCE.init({
    entity_encoding: 'numeric'
});

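This would change an entity like &rsquo; into its numeric form, &#8217;. Note that this is not &#39;, which is the straight apostrophe, a different character.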
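Numeric encoding keeps the character's identity, so it will not turn a right single quotation mark (U+2019) into the straight apostrophe (U+0027, &#39;) the question asks for. If that substitution is really what you want, it needs a separate replace step, for example in JavaScript before the content is submitted, or an equivalent replacement on the server. The helper below is a hypothetical sketch.

```javascript
// Hypothetical helper: turns the right single quotation mark (U+2019),
// whether written as a literal character or as &rsquo;, &#8217; or
// &#x2019;, into the straight apostrophe encoded as &#39;.
// All other characters and entities are left untouched.
function straightenApostrophes(html) {
  return html.replace(/\u2019|&rsquo;|&#8217;|&#[xX]2019;/g, "&#39;");
}

console.log(straightenApostrophes("<p>It&rsquo;s Tom\u2019s</p>"));
// prints <p>It&#39;s Tom&#39;s</p>
```

This is a lossy change (the typographic quote is gone), which is why it is kept apart from entity encoding rather than folded into it.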

John ODonnell