We run some large directories where users often copy and paste content from Word documents etc. into our TinyMCE HTML editor.

The problem is that text like the following often comes along with the paste, stays hidden in the editor, and then shows up on our webpages:

<!-- /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-parent:""; margin:0in; margin-bottom:.0001pt; mso-pagination:widow-orphan; mso-layout-grid-align:none; punctuation-wrap:simple; text-autospace:none; font-size:10.0pt; font-family:"Times New Roman"; mso-fareast-font-family:"Times New Roman";} a:link, span.MsoHyperlink {color:blue; text-decoration:underline; text-underline:single;} a:visited, span.MsoHyperlinkFollowed {color:purple; text-decoration:underline; text-underline:single;} p {mso-margin-top-alt:auto; margin-right:0in; mso-margin-bottom-alt:auto; margin-left:0in; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman"; mso-fareast-font-family:"Times New Roman";} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in; mso-header-margin:.5in; mso-footer-margin:.5in; mso-paper-source:0;} div.Section1 {page:Section1;} -->

Is there a TinyMCE plugin, or some other cross-browser HTML editor, that automatically strips this out?

Alternatively, is there a PHP regex command or something similar that could strip out these comment declarations?

A:

The PHP regex command is what I'd use, personally.

// "s" lets . match line breaks too, since Word's comment blocks usually span several lines
$str = preg_replace('/<!--.*?-->/s', '', $str);
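If you would rather clean the markup in the browser as well, the same idea carries over to JavaScript. This is a minimal sketch; the function name `stripHtmlComments` is hypothetical. In JavaScript, as in PHP, `.` does not match a newline by default, so the pattern uses `[\s\S]*?` to match across the line breaks in Word's comment blocks. The lazy `*?` makes each comment end at its own `-->`.

```javascript
// Hypothetical helper: remove every <!-- ... --> block from an HTML string.
// [\s\S]*? matches any character, newlines included, as few times as possible,
// so each comment is removed on its own instead of everything between the
// first "<!--" and the last "-->".
function stripHtmlComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, '');
}

const pasted =
  '<p>Intro</p><!--\n/* Style Definitions */\np.MsoNormal {margin:0in;}\n--><p>Body</p>';
console.log(stripHtmlComments(pasted)); // <p>Intro</p><p>Body</p>
```

This is only cosmetic cleanup. It does not replace filtering on the server.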
Zarel
perfect, will give this a shot
Joe
A:

I've been trying to optimize this for years.

My best solution so far goes like this:

  • don't use a root block; the page layout will provide the root element
  • don't expect users to understand the difference between <p> and <br />; treat everything as a simple line break, which is less confusing and more like MS Word
  • only allow expected elements

This is the relevant part of the init code:

remove_linebreaks : false,
force_br_newlines : true,          // maybe we can behave more like Gmail
force_p_newlines : false,          // and preserve all line breaks in the message
convert_newlines_to_brs : false,   // even so, I wouldn't rely on it
forced_root_block : false,

// explicitly define what will be allowed
valid_elements: "h1,h2,h3,br,b,a,i,u,strong/b,em/i,u/span,strike/span,span,span[style],"+
                "sub,sup,a[href|name|anchor|target|title],ul,ol,li,p,object[classid|width|height|codebase|*],"+
                "param[name|value|_value],embed[type|width|height|src|*],"+
                "img[style|longdesc|usemap|src|border|alt=|title|hspace|vspace|width|height|align]",
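The `valid_elements` string packs a lot into one line. To check what it actually permits, a small helper can split it into rules. This is a hypothetical, simplified sketch (`parseValidElements` is not a TinyMCE API). It only handles the three forms used in this config: `name`, `name/alias` and `name[attr|attr]`. Modifiers such as `alt=` or `*` are kept as raw text. Per the TinyMCE `valid_elements` documentation, a rule like `strong/b` makes `b` elements come out as `strong`, so the part after `/` is recorded as an alias.

```javascript
// Hypothetical helper (not part of TinyMCE): turn a valid_elements string into
// { elementName: { aliases: [...], attributes: [...] } }.
// Simplified: only "name", "name/alias" and "name[attr|attr]" are recognised;
// attribute modifiers such as "alt=" or "*" are kept as raw text.
function parseValidElements(rules) {
  const result = {};
  for (const rule of rules.split(',')) {
    const trimmed = rule.trim();
    if (trimmed === '') continue; // tolerate stray or trailing commas
    // group 1: element name, group 2: optional alias after "/",
    // group 3: optional attribute list inside [ ]
    const match = /^([^\[\/]+)(?:\/([^\[]+))?(?:\[([^\]]*)\])?$/.exec(trimmed);
    if (!match) throw new Error('Unrecognised rule: ' + trimmed);
    const [, name, alias, attrs] = match;
    // Repeated names (e.g. "span" and "span[style]") are merged into one entry.
    const entry = result[name] || (result[name] = { aliases: [], attributes: [] });
    if (alias) entry.aliases.push(alias);
    if (attrs) entry.attributes.push(...attrs.split('|'));
  }
  return result;
}

const validElements =
  "h1,h2,h3,br,b,a,i,u,strong/b,em/i,u/span,strike/span,span,span[style]," +
  "sub,sup,a[href|name|anchor|target|title],ul,ol,li,p,object[classid|width|height|codebase|*]," +
  "param[name|value|_value],embed[type|width|height|src|*]," +
  "img[style|longdesc|usemap|src|border|alt=|title|hspace|vspace|width|height|align]";

const allowed = parseValidElements(validElements);
console.log(Object.keys(allowed).sort().join(', ')); // every element name the whitelist mentions
console.log(allowed.a.attributes);                   // attributes permitted on <a>
```

Running it on this config shows, for example, that `span` only keeps its `style` attribute and that `span` also appears as the alias in `u/span` and `strike/span`. That combination is worth double-checking against what you actually want pasted spans to become.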

Then I use a post-process function that removes every opening <p> tag and converts every closing </p> into a <br />. This has been the most stable copy-paste solution I've been able to develop.

This is the post-process function:

setup : function(ed) {
    ed.onPostProcess.add(function(ed, o) {
        // Remove all paragraphs and replace with BR
        o.content = o.content.replace(/<p [^>]+>|<p>/g, '');
        o.content = o.content.replace(/<\/p>/g, '<br />');
    });
},
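To see what the filter does to typical pasted markup, the two replacements can be pulled out into a plain function and run on a string. This is a sketch; the name `paragraphsToBreaks` is hypothetical, and inside TinyMCE only the two `replace()` calls matter.

```javascript
// Hypothetical standalone copy of the two replace() calls from the
// onPostProcess handler, so the conversion can be tried outside the editor.
function paragraphsToBreaks(html) {
  return html
    .replace(/<p [^>]+>|<p>/g, '') // drop opening <p> tags, with or without attributes
    .replace(/<\/p>/g, '<br />');  // turn each closing </p> into a line break
}

const sample = '<p class="MsoNormal">First line</p><p>Second line</p>';
console.log(paragraphsToBreaks(sample)); // First line<br />Second line<br />
```

The first pattern requires either a space or `>` right after `<p`, so tags such as `<pre>` or `<param>` pass through untouched.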

Do notice that all of this is client-side JavaScript filtering, so a user can easily send all that unwanted markup to the server anyway. Even if this setup is only meant for trusted admin users, also run strip_tags on the server side, because someone, somewhere will probably manage to bypass it.

Hope it helps!

Frankie
bingo! thanks a lot for this!
Joe