I have an inline markup editor built into my website that should produce XHTML-compliant markup. As you can see below, however, it emits the deprecated font tag and size attribute.

<font style="font-family: Courier New; color: rgb(0, 0, 153);" size="2">
   asdfa
   <span style="color: rgb(0, 51, 0);">
    a
    <font size="5">fds</font>
   </span>
</font>

In other browsers, it produces <span class="Apple-style-span" style="font-size: xx-large;"> instead of <font size="5">.

Is there a JavaScript/regex solution for taking the first set of markup and replacing it with XHTML-compliant markup that uses span tags and style attributes? Thanks in advance!

(P.S. jQuery can be used too.)

A: 

Check out CKEditor, if switching to another WYSIWYG editor is an option for your application.

Jan.
Neither CKEditor nor TinyMCE meets our requirements.
Emile
+1  A: 

I wouldn't recommend regex for that sort of job (see the greatest 'Regex to Parse HTML' answer ever!). I know you're not talking about a full-on parser, but I still think you'd be best off with JavaScript (or whichever back-end language you're using) and a library tailored to parsing HTML.

You may want to look at the Tidy open source project over on SourceForge. There's an intro/overview at IBM: "Convert from HTML to XML with HTML Tidy".

S.Jones
+1 Thanks for the Tidy link!
Emile
+2  A: 

The markup above is perfectly valid in XHTML 1.0 Transitional.

Whether deprecated elements like <font> are used is completely orthogonal to whether XHTML or HTML syntax is used. XHTML 1.0 is nothing more or less than a restatement of HTML 4.01 in XML syntax; consequently, there are Transitional and Strict variants, just as there are for HTML 4.

<font size="5"> and <span class="Apple-style-span" style="font-size: xx-large;"> are semantically equally useless. If you want markup to use a set of defined elements and classes that are meaningful in the context of your site, you'll have to hack the editor into using those, instead of being based purely on visual formatting.

You could parse the XHTML and alter it as a later step, to try to make it look better. But regex is not at all an adequate tool for that, as previously mentioned. You would need an XML parser, then you'd fix up the elements and attributes, then re-serialise the result to XHTML. It would be sensible to do this on the server side, because getting an XML parser on the client side is slightly tricky, and you will need to do it on the server side anyway if you're going to be cleaning non-whitelisted elements and attributes.
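
As a rough illustration of the "fix up, then re-serialise" steps, here is a minimal JavaScript sketch. It assumes the markup has already been parsed into a simple tree of { tag, attrs, children } objects; that node shape, the function names fontToSpan and serialize, and the SIZE_KEYWORDS table are all made up for this example, and a real XML parser will expose its own node API. The size-to-keyword table is an assumption too, and it ignores relative sizes such as "+1", so adjust it to whatever your editor actually emits.

```javascript
// Assumed mapping from legacy <font size="N"> to CSS keywords; adjust as needed.
const SIZE_KEYWORDS = {
  1: "x-small", 2: "small", 3: "medium", 4: "large",
  5: "x-large", 6: "xx-large", 7: "xxx-large",
};

// Hypothetical node shape: { tag, attrs, children }, where children holds
// nested nodes or plain strings for text.
function fontToSpan(node) {
  if (typeof node === "string") return node; // text passes through unchanged
  const children = node.children.map(fontToSpan);
  if (node.tag !== "font") return { ...node, children };

  // Fold the presentational attributes into a single style declaration list,
  // keeping any inline style the element already had.
  const { style, size, color, face, ...rest } = node.attrs;
  const decls = [];
  if (style) decls.push(style.trim().replace(/;$/, ""));
  if (face) decls.push(`font-family: ${face}`);
  if (color) decls.push(`color: ${color}`);
  if (SIZE_KEYWORDS[size]) decls.push(`font-size: ${SIZE_KEYWORDS[size]}`);

  const attrs = { ...rest };
  if (decls.length) attrs.style = decls.join("; ") + ";";
  return { tag: "span", attrs, children };
}

// Escape the characters that are unsafe in XML text and attribute values.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
             .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Re-serialise the tree; every element gets an explicit closing tag.
function serialize(node) {
  if (typeof node === "string") return escapeXml(node);
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${escapeXml(value)}"`)
    .join("");
  const inner = node.children.map(serialize).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// Usage: the innermost element from the question.
const example = { tag: "font", attrs: { size: "5" }, children: ["fds"] };
console.log(serialize(fontToSpan(example)));
// prints: <span style="font-size: x-large;">fds</span>
```

The transformation is kept separate from parsing and serialising so that the same pass could also drop non-whitelisted elements and attributes, which is the cleaning step that has to happen on the server side anyway.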

bobince
Thanks for the thorough explanation!
Emile