I'm using ASP.NET. Suppose I allow users to post messages on my site with HTML tags. How do I ensure they have properly closed all the tags? Is there an HTML tag checker available that parses the tags and reports any errors, maybe like the one Blogger has?

+2  A: 

You could use HTML Tidy to make sure the HTML is well-formed. I'm not sure whether it can report errors without fixing them, but it's open source, so you could hack it to do that.

MrTelly
Not sure about ASP.NET support but it's integrated well with PHP and it is superb there.
cletus
You will still need to parse the results of Tidy to make sure only whitelisted elements/attributes are present. But at least you can use Tidy to make it well-formed XHTML, so you can use a simple XML parser for the task.
bobince
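
To illustrate the whitelist check described in the comment above, here is a minimal sketch in Python (the same idea ports to an XML parser in .NET). It assumes the input has already been made well-formed, for example by Tidy, and carries no XML namespace. The `ALLOWED` table and the `find_violations` name are hypothetical, chosen only for this example. It is not a complete sanitizer: it does not inspect attribute values such as `javascript:` URLs.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: tag name -> set of attribute names allowed on it.
ALLOWED = {
    "p": set(),
    "strong": set(),
    "em": set(),
    "a": {"href"},
}

def find_violations(xhtml_fragment):
    """List the tags and attributes in a well-formed fragment that are not whitelisted."""
    # Wrap the fragment in a dummy root so several top-level elements still parse.
    # ET.fromstring raises ParseError if the fragment is not well-formed.
    root = ET.fromstring("<root>" + xhtml_fragment + "</root>")
    violations = []
    for element in root.iter():
        if element is root:
            continue  # skip the dummy wrapper
        allowed_attrs = ALLOWED.get(element.tag)
        if allowed_attrs is None:
            violations.append("tag <%s> not allowed" % element.tag)
            continue
        for name in element.attrib:
            if name not in allowed_attrs:
                violations.append(
                    "attribute %s on <%s> not allowed" % (name, element.tag)
                )
    return violations
```

An empty list means every element and attribute is on the whitelist. Named HTML entities such as `&nbsp;` are not known to a plain XML parser, so they would need to be converted to numeric form first.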
+1  A: 

You could easily parse the text yourself. Define a list of allowed tags that require closing (strong, em, etc.). Scan the text, treat each HTML tag as a token, and push each opening tag onto a stack. When a closing tag is found, peek at the top item; if it is not the matching opening tag, the HTML is improperly nested.

Assuming each matched pair is popped off the stack as its closing tag is found, whatever remains on the stack at the end is the set of tags that were opened but never closed. This is only a rudimentary approach, but it may take only a few lines of code to identify improperly nested or unclosed tags.
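
The stack approach above could look roughly like this. It is a sketch in Python for brevity (it translates directly to C#), and the `PAIRED_TAGS` list, the `TAG_RE` pattern, and the `check_nesting` name are assumptions made for the example. Tags outside the list and self-closing tags such as `<br/>` are simply ignored here. A regex tokenizer like this one is only adequate for simple markup, not arbitrary HTML.

```python
import re

# Hypothetical whitelist of tags that must be closed.
PAIRED_TAGS = {"strong", "em", "b", "i", "p", "a"}

# Matches <tag ...>, </tag> and <tag ... />; group 1 marks a closing tag,
# group 2 is the tag name, group 3 marks a self-closing tag.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>")

def check_nesting(html):
    """Return a list of nesting errors; an empty list means the tags balance."""
    errors = []
    stack = []  # names of tags that are currently open
    for match in TAG_RE.finditer(html):
        closing, name, self_closing = match.groups()
        name = name.lower()
        if name not in PAIRED_TAGS or self_closing:
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()  # properly matched pair
        elif stack:
            errors.append("found </%s> but expected </%s>" % (name, stack[-1]))
        else:
            errors.append("found </%s> but no tag is open" % name)
    # Anything still on the stack was opened but never closed.
    errors.extend("unclosed <%s>" % name for name in stack)
    return errors
```

For example, `check_nesting("<strong><em>oops</strong></em>")` reports the misplaced `</strong>` and then the `<strong>` that is left on the stack.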

Tony k
Looks like a way. Let me try. Thanks!
Quicker to define the list of tags which DON'T require closing.
AmbroseChapel
Ambrose: Correct; I was assuming OP was only intending to allow a limited amount of HTML.
Tony k
A: 

I like to use the W3C's online HTML validator at http://validator.w3.org/. But remember to make sure your whole document is valid outside of the comments first; otherwise it could be quite an interesting trip.

Shard
This looks nice, but I'm wondering how I would send a particular piece of text to it, since it expects a URL. Maybe I should dynamically render my content into an HTML page and send that to the validator?
A: 

I think you could try one of the WYSIWYG editors (good ones include http://www.fckeditor.net/, http://tinymce.moxiecode.com/, and http://freetextbox.com/). You should be able to force it into "source mode", and it will probably tidy up bad HTML for you (although I haven't actually tried this technique myself :D).

DrG
Maybe this is what I wanted? Thanks for saving me from reinventing the wheel. I'm crazy.
and not even a +1
DrG
A: 

Just an off topic idea - use a Wiki-style engine. That way you can format the HTML yourself the way you want it.

Daniel A. White
A: 

The Web Developer toolbar add-on for Firefox has a function that lets you validate the HTML of the page you are currently viewing with the W3C validator: Tools -> Validate Local HTML.

It's probably using functionality the validator already offers. I think it may create a temporary HTML file and upload it.

Luuk Paulussen