The State of contenteditable

The markup produced by WYSIWYG editors is atrocious. It's littered with style attributes and often is not even valid HTML (missing closing tags, overlapping styling, etc.). Are there any established solutions to the problems introduced by contenteditable? If not, how can I go about creating one?

Invalid HTML

For instance, after entering an unordered list followed by a line of text into a popular rich text editor I am presented with the following HTML:

<ul><li>foo</li><li>bar</li></ul><div>baz

Where did the div tag come from? Why doesn't it get closed? Why not a paragraph - properly closed, of course? Is this preventable?
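One way to at least detect this kind of output before storing it is a tag-balance check. The following is a minimal sketch using only Python's standard `html.parser`; the names `TagBalanceChecker` and `find_unbalanced_tags` are made up for illustration. It is deliberately stricter than the HTML specification, which lets some end tags (such as `</li>` or `</p>`) be omitted, so treat its findings as warnings rather than a validity verdict.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Records tags that are opened but never closed, or closed out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.problems = []   # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"unexpected </{tag}>")
            return
        # Pop until the matching element; anything popped on the way
        # was still open when its ancestor was closed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> closed implicitly by </{tag}>")


def find_unbalanced_tags(fragment):
    """Return a list of balance problems found in an HTML fragment."""
    checker = TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    checker.problems.extend(f"<{tag}> never closed" for tag in checker.open_tags)
    return checker.problems


print(find_unbalanced_tags("<ul><li>foo</li><li>bar</li></ul><div>baz"))
```

Run against the editor output above, this reports the dangling `<div>`. It only detects the problem; it does not repair it.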

Non-Semantic HTML

Another editor produced the following:

<p style="font-size:11pt; line-height:115%; margin:0pt 0pt 0pt 36pt; text-indent:-18pt"><span style="color:#000000; font-family:Arial; font-size:11pt; font-style:normal; font-weight:normal; text-decoration:none">●</span><span style="font:7.0pt &#39;Times New Roman&#39;">     </span><span style="color:#000000; font-family:Arial; font-size:11pt; font-style:normal; font-weight:normal; text-decoration:none">Lorem</span></p><p style="font-size:11pt; line-height:115%; margin:0pt 0pt 0pt 36pt; text-indent:-18pt"><span style="color:#000000; font-family:Arial; font-size:11pt; font-style:normal; font-weight:normal; text-decoration:none">●</span><span style="font:7.0pt &#39;Times New Roman&#39;">     </span><span style="color:#000000; font-family:Arial; font-size:11pt; font-style:normal; font-weight:normal; text-decoration:none">ipsum</span></p>

This is valid markup, but it's semantically horrible! Instead of a styled unordered list, the editor has inserted literal bullet characters. As a developer, I have no idea what to do with the markup contenteditable has produced. As far as styling goes, it's utterly useless.
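For what it's worth, markup of this shape can sometimes be recovered after the fact. Below is a rough sketch, assuming that every list item arrives as a `<p>` whose text starts with a literal bullet glyph; the `BULLETS` set and the function name `bullets_to_list` are assumptions for illustration. It is lossy by design: all inline markup (bold, links, and so on) inside the paragraphs is discarded.

```python
from html import escape
from html.parser import HTMLParser

BULLETS = "●•◦▪"  # assumed set of literal bullet glyphs an editor might emit


class ParagraphText(HTMLParser):
    """Collects the plain text of each <p>, discarding all inline markup."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # list of text pieces while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            self.paragraphs.append("".join(self._current))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def bullets_to_list(fragment):
    """Turn runs of bullet-prefixed paragraphs into <ul><li> markup."""
    parser = ParagraphText()
    parser.feed(fragment)
    parser.close()

    out = []
    in_list = False
    for text in parser.paragraphs:
        text = text.strip()
        is_item = bool(text) and text[0] in BULLETS
        if is_item and not in_list:
            out.append("<ul>")
            in_list = True
        elif not is_item and in_list:
            out.append("</ul>")
            in_list = False
        if is_item:
            # Drop the bullet glyph and the padding that follows it.
            out.append(f"<li>{escape(text[1:].strip())}</li>")
        elif text:
            out.append(f"<p>{escape(text)}</p>")
    if in_list:
        out.append("</ul>")
    return "".join(out)
```

Fed two paragraphs like the ones above, this yields a single `<ul>` with two `<li>` children and no inline styles.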

Substitutions

Until recently, I have been using Markdown as a tool for generating semantically correct HTML from users. However, Markdown lacks certain features that my users are asking for (ease of use for non-techies, image positioning, etc.). After spending several weeks looking for a WYSIWYG editor that produces valid, semantic markup, I've found that contenteditable makes this impossible. Is someone somewhere talking about solutions to these issues? How can I get involved?

[Update] Clarification

I suppose I wasn't completely clear up front. I'm not asking for ways to get around contenteditable's problems. I've found lots of those, including most of the ones mentioned below. What I'm trying to find is the root cause of these issues. Why is contenteditable so broken? Is it due to technical problems or legacy code issues? Moreover, what is being done to fix it and where is this work being done?

[Update] Google Docs

The HTML in the second example above came from the Google Docs Editor. However, as AlfonsoML points out below, Google Docs does not use contenteditable in their editor. Their technique is explained here.

A: 

This is a difficult question that everyone solves differently.

I know about the following possibilities:
* Just let it go (the worst one)
* Try to fix the HTML yourself (TinyMCE and CKEditor try to do that, with varying success)
* Use a plugin like XStandard
* Use the DOM to make your own editor: make a caret out of a DIV and handle selection, editing and everything else yourself. This is how Google does it, for example. This needs a lot of coding though.

dark_charlie
Thanks for the answer. I've never heard of XStandard, but I'll definitely check it out. As for making your own editor, the second example above is from Google Docs. Even though their editor is nice, the content it spits out is laughable.
brad
+1  A: 

For browsers that support contenteditable, a solution can be whipped up fairly easily; using e.g. jQuery to save the data, and to attach new elements to the editable content (with buttons, obviously) one gets a simple WYSIWYG editor with very little effort.

One problem that might be a little harder to solve is ensuring the placement of these new elements is correct. However, this has been solved for Webkit, and I'm sure it's possible to do in other browsers as well.

An editor using contenteditable would be both easily deployed and lightweight, so it's definitely something worth developing, IMHO.

Edit: After re-reading your question, I realize you were actually looking for a working non-contenteditable solution. I'm not sure why; AFAIK contenteditable produces valid markup, and if you take full control (only allowing users to insert elements and change their properties through your UI), it will be semantic as well. AFAIK no current WYSIWYG editors use contenteditable, but I could be wrong.

You
I agree, but how do I ensure what comes out of the editor is valid and semantic?
brad
Making sure it is valid shouldn't be needed, although one could validate it on save and return an error if it's invalid. The semanticity can, to a certain extent, be ensured by forcing the user to use your UI. It won't stop users from making lists out of paragraphs and such, but things like that can be "fixed" quite easily before saving (assuming the user adheres to sensible "standards", e.g. uses `*` to denote lists).
You
A: 

I think your problem is that you are trying to use a tool like Google Docs, as stated in a previous comment, instead of a tool that is really focused on editing HTML.

Yes, both kinds are WYSIWYG, but I don't think the aim of GDocs is to produce nice HTML. Instead, they are focused on providing an alternative to MS Word, even if they have to create strange HTML along the way.

I can't really believe that you have been researching this for several weeks and you aren't already using CKEditor or TinyMCE. Both solutions include options to use attributes, classes or styles for the output; it's just a matter of looking at the samples, reading the docs a little and adjusting the configuration to your needs.

AlfonsoML
I'm not trying to use Google Docs as an HTML editing tool. It was just an example of the output that is generated by contenteditable.
brad
The fact is that Google Docs isn't using contentEditable: http://googledocs.blogspot.com/2010/05/whats-different-about-new-google-docs.html
AlfonsoML
+3  A: 

I suspect it's in the state it's in for several specific reasons:

1.) The HTML5 standard that defines it does not define what it should output, leaving every implementation to decide for itself.

2.) The editor doesn't know whether you want an accurate representation or a semantic one. These are often trade-offs.

3.) It looks like Mozilla dragged its old Netscape Mail/Composer code into its designMode implementation, then dragged that into contentEditable. It's still using the same dotted outlines and little red handles that existed 10 years ago, and I suspect a huge amount of legacy code.

4.) It looks like Internet Explorer did pretty much the same, and IE never did anything properly in the first place. It's still dragging its legacy along in the form of quirks and "compatibility" modes.

5.) It's really only used in CMS systems and forum/comment boxes, and usually these have fairly heavy wrappers around the implementation. I haven't seen much usage elsewhere, especially 'bare-bones' ones.

6.) There are few successful online word processors and fewer still WYSIWYG or page layout applications. The pressure to create accurate editors just isn't there. Worse still, Microsoft sells offline software like Word and Expression that would definitely be harmed by the growth of online editors, especially high-quality free ones.

7.) Microsoft has a legacy of creating tools that generate invalid and bloated HTML code that still exists today in Outlook 2010 and the Office suite.

8.) Sometimes the same selection and editing command could be applicable to several possible objects occupying the same space and the editor can't ever really know which one you meant to alter. Also an editing command could cause an element to split in an undefined or unexpected way (again, it won't know which you would prefer).

I don't believe contentEditable will ever be as good as cutting your own HTML or using a dedicated tool. I'd just focus on running some basic cruft/repair tools (like HTMLTidy) over the result and call it a day.
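As a rough illustration of what such a cruft-removal pass might look like (this is not HTMLTidy itself, just a sketch with Python's standard `html.parser`), the snippet below keeps only whitelisted tags and attributes and drops everything else while preserving the text. The `ALLOWED` whitelist and the names `Cleaner` and `clean` are assumptions for the example. It does not repair unclosed tags, and it does not vet URL schemes, so it should not be mistaken for a security sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tag -> attributes allowed to survive cleaning.
ALLOWED = {
    "p": set(), "ul": set(), "ol": set(), "li": set(),
    "strong": set(), "em": set(), "br": set(),
    "a": {"href"}, "img": {"src", "alt"},
}
VOID = {"br", "img"}  # whitelisted elements that take no closing tag


class Cleaner(HTMLParser):
    """Re-emits only whitelisted tags and attributes; other tags are unwrapped."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself, keep whatever text it contains
        kept = []
        for name, value in attrs:
            if name in ALLOWED[tag]:
                kept.append(' {}="{}"'.format(name, escape(value or "")))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag not in VOID:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        # The parser decodes entities, so re-escape text on the way out.
        self.out.append(escape(data, quote=False))


def clean(fragment):
    """Return the fragment with non-whitelisted tags and attributes removed."""
    cleaner = Cleaner()
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean('<p style="font-size:11pt"><span style="color:#000">Lorem</span></p>')` strips the inline styles and the wrapper span, leaving a bare paragraph.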

SpliFF