Hi everyone,

I'm trying to choose between a couple of different HTML parsers for a project I am working on, part of which accepts HTML input from the client.

I've built a simple automated test for each one to see if it fits my needs. I have a large number of real-life HTML fragments to test with, but they aren't enough to test for safety, since they (probably) do not contain any malicious code.
I don't mind reviewing the outputs by hand.

My question is: is there a freely available database or list of HTML snippets, containing malformed HTML and scripts, intended for XSS testing?

A: 

Google's home page seems to be malformed; maybe you can use that? http://validator.w3.org/check?uri=www.google.com&charset=%28detect+automatically%29&doctype=Inline&group=0

http://www.codinghorror.com/blog/2006/11/its-a-malformed-world.html

Tommy
Using the Google home page won't suffice. I have plenty of real-world cases I can test, but I'm looking for HTML snippets that are deliberately malformed or contain malicious code, to see if the parsers strip them correctly.
GeReV
+1  A: 

The ha.ckers XSS cheatsheet is pretty comprehensive, and was the catalyst for me to build a whitelist-based sanitiser into jsoup.
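
To narrow down which outputs need a manual look, a small harness along these lines might help. This is only a rough sketch in Python using the standard library: the vectors in `SAMPLE_VECTORS` are a few illustrative stand-ins (not taken from the cheatsheet itself), the names `find_suspicious` and `review` are made up for this example, and `sanitise` is whatever callable wraps the parser or sanitiser you are evaluating. The check is a heuristic, not a safety guarantee, since browsers can parse markup differently from `html.parser`.

```python
from html.parser import HTMLParser

# Illustrative stand-ins only; replace with vectors from a real cheat sheet.
SAMPLE_VECTORS = [
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    '<a href="javascript:alert(1)">click</a>',
    "<b>unclosed bold <i>text",  # malformed but harmless
]

# Elements that can run or embed active content (assumed list, not exhaustive).
RISKY_TAGS = {"script", "iframe", "object", "embed"}


class SuspiciousMarkupFinder(HTMLParser):
    """Records script-capable constructs found in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag in RISKY_TAGS:
            self.findings.append(f"<{tag}> element")
        for name, value in attrs:
            # Event-handler attributes such as onerror or onclick.
            if name.startswith("on"):
                self.findings.append(f"{name} handler on <{tag}>")
            # javascript: URLs in any attribute value.
            if value and value.strip().lower().startswith("javascript:"):
                self.findings.append(f"javascript: URL in {name} on <{tag}>")


def find_suspicious(html_text):
    """Return a list of suspicious constructs present in html_text."""
    finder = SuspiciousMarkupFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


def review(sanitise, vectors=SAMPLE_VECTORS):
    """Run each vector through `sanitise`; return the ones worth reviewing by hand."""
    report = []
    for vector in vectors:
        output = sanitise(vector)
        findings = find_suspicious(output)
        if findings:
            report.append((vector, output, findings))
    return report
```

Calling `review(my_sanitiser)` returns one `(input, output, findings)` tuple per vector whose output still contains something script-capable; an empty list only means this particular heuristic found nothing, so reviewing a sample of the outputs by hand is still worthwhile.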

Jonathan Hedley