
I'd like to create the typical preview paragraph with a [read more] link. The problem is that the content I'd like to SubString() contains both text and HTML, written by a user with a WYSIWYG editor.

Of course, I check that the string is not null or empty before calling SubString() on it. The problem is that the cut can land in the middle of the HTML tags, breaking them and throwing off the rendering of the entire site.

The WYSIWYG editor doesn't seem to produce perfectly formatted HTML, and it often uses <br /> tags instead of <p></p>, and so on. Basically, I can't rely on the tags being well formed.

My workaround was to strip out all the HTML and substring the remaining text. This works, but it loses all the formatting that was in the HTML.
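For reference, the strip-and-truncate workaround might look roughly like the following. This is a minimal sketch, not the asker's actual code. It uses Python's standard-library `html.parser`, and the names `plain_text_preview` and `BREAK_TAGS` are hypothetical. It assumes that block-level tags and `<br>` should become spaces, so words on either side don't run together.

```python
from html.parser import HTMLParser

# Hypothetical set of tags that visually separate text; each becomes a space.
BREAK_TAGS = {"br", "p", "div", "li", "tr"}


class _TextExtractor(HTMLParser):
    """Collects only the text content of a (possibly malformed) HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def plain_text_preview(markup, limit):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    # Collapse every run of whitespace (including newlines) to a single space.
    text = " ".join("".join(parser.parts).split())
    return text[:limit].rstrip()
```

The result is plain text. It still needs HTML-escaping before it goes back into a page, and, as the question says, all bold, italics, and links are gone.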

What's the best way to SubString() a block of malformed HTML while keeping markup that won't break the rendering of the site?

A: 

What about iterating through that substring, looking for tags that are never closed and saving them to a List, while removing any that do get closed? Then you could append closing tags for whatever is still open in the List (in reverse order), which would give you usable HTML.

Lukas
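The answer's idea can be sketched as follows. This is an illustration under stated assumptions, not code from the thread. It uses Python's `html.parser` instead of a manual character scan, so the cut counts visible characters rather than raw markup and can never split a tag. It copies markup until `limit` text characters have been emitted, keeps a list of tags that are still open, and removes a tag from the list when its closing tag appears. At the end, it appends closing tags for whatever is left, in reverse order. The names `truncate_html`, `_NaiveTruncator`, and `VOID_TAGS` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never tracked as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _NaiveTruncator(HTMLParser):
    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.count = 0          # visible characters emitted so far
        self.out = []
        self.open_tags = []     # the answer's List of not-yet-closed tags
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # keep attributes verbatim
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):  # e.g. <br />
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # drop closing tags that were never opened
        # Remove the most recent matching opener, wherever it sits in the list.
        idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
        del self.open_tags[idx]
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.done:
            return
        data = data[: self.limit - self.count]
        self.count += len(data)
        self.out.append(escape(data, quote=False))  # re-escape &, <, >
        if self.count >= self.limit:
            self.done = True


def truncate_html(markup, limit):
    parser = _NaiveTruncator(limit)
    parser.feed(markup)
    parser.close()
    closing = "".join(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out) + closing
```

For example, `truncate_html("<p>Hello <b>world</b></p>", 8)` yields `<p>Hello <b>wo</b></p>`. Because a closed tag is simply removed from the list, misnested input such as `<i><b>x</i>y</b>` still comes out misnested.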
That was my first thought when I saw this question, but a naive implementation would break on even common HTML atrocities. For example, <i><b></i></b> would only work by accident, at best. You'd have to backtrack to the last matching opening tag in the list, outputting the appropriate closing tags along the way. With this addition, your approach would produce valid HTML, but it might not be the HTML the user expected. On the other hand, web browsers are complex beasts, and you'll never replicate their error "handling" quirks. Simple is good, but you may still want to code for common special cases.
WCWedin
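The backtracking refinement from the comment could be sketched like this. Again, this is Python's `html.parser` standing in for whatever the original site used, and the names `truncate_html_balanced`, `_BacktrackingTruncator`, and `VOID_TAGS` are hypothetical. When a closing tag arrives, the code pops the open-tag stack down to the most recent matching opener. Along the way, it emits a closing tag for everything opened after that opener, innermost first. A closing tag with no matching opener is dropped. The output is always properly nested.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _BacktrackingTruncator(HTMLParser):
    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.count = 0      # visible characters emitted so far
        self.out = []
        self.stack = []     # open tags, innermost last
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):  # e.g. <br />
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return  # stray closing tag: drop it
        # Backtrack: close every tag opened after the matching one,
        # innermost first, then close the match itself.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        data = data[: self.limit - self.count]
        self.count += len(data)
        self.out.append(escape(data, quote=False))
        if self.count >= self.limit:
            self.done = True


def truncate_html_balanced(markup, limit):
    parser = _BacktrackingTruncator(limit)
    parser.feed(markup)
    parser.close()
    closing = "".join(f"</{t}>" for t in reversed(parser.stack))
    return "".join(parser.out) + closing
```

With this version, `<i><b>x</i>y</b>` becomes `<i><b>x</b></i>y`. That is valid HTML, but the trailing `y` is no longer bold, which illustrates the comment's point that the result may not be what the user expected.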