I'd like to use Webkit.net to load an (X)HTML string and then analyze the DOM in order to "compress" it: remove whitespace and newlines, and convert <input></input> and <input /> to <input> (basically an XHTML-to-HTML conversion, doctype allowing).

Is there any way to get the "DOM tree" in Webkit.net? If not, are there any .NET HTML parsers out there that can do this? Failing that, is there a .NET component that already does what I'm asking?

Some Pseudo-code explaining what I'd like to do:

var DOM = Webkit.DOM.FromString("<!DOCTYPE HTML><html><head><title> Hello</title></head><body><INPUT Value=\"Click here\"  type=\"submit\" /><br /><span class='bold red'>An element!</span><script type='text/javascript'>/*do stuff*/</script>  <script>/*do more stuff*/</script></body></html>");

var sb = new StringBuilder();

// this would recursively iterate over all childnodes in a real scenario.
foreach(var node in DOM.Nodes){
    sb.Append(/* Compress & sort attributes, normalize & strip unneeded quotes, remove unneeded end & self-closing tags, etc. */);
}

// return optimally compressed output...
// something like:
// <!doctype html><title>Hello</title><input type=submit value="Click here"><br><span class="bold red">An element!</span><script>/*do stuff*/</script><script>/*do more stuff*/</script>
return sb.ToString();
+1  A: 

I haven't used Webkit.Net, but I have used HTMLAgilityPack for a task similar to the one you have in mind, and it works very well. So I think you answered your own question.
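
For reference, a rough sketch of what such a pass could look like with HtmlAgilityPack (a third-party NuGet package). The HtmlCompressor class, its Compress method, and the element lists are made up for illustration. It only lowercases names, sorts attributes, drops comments, collapses whitespace, and omits end tags on void elements. It does not strip optional quotes, default attributes, or optional tags such as <html> and <body>, so treat it as a starting point rather than a finished minifier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package, not part of the base class library

// Hypothetical helper, not part of HtmlAgilityPack.
public static class HtmlCompressor
{
    // Elements that never take an end tag in HTML.
    private static readonly HashSet<string> VoidElements = new HashSet<string>
    {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"
    };

    // Elements whose content is copied verbatim (whitespace is significant).
    private static readonly HashSet<string> RawElements = new HashSet<string>
    {
        "script", "style", "pre", "textarea"
    };

    public static string Compress(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sb = new StringBuilder();
        foreach (var child in doc.DocumentNode.ChildNodes)
            Write(child, sb);
        return sb.ToString();
    }

    private static void Write(HtmlNode node, StringBuilder sb)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Comment:
                // Assumption: the parser surfaces the doctype as a comment node,
                // so keep anything that looks like a doctype and drop real comments.
                if (node.OuterHtml.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
                    sb.Append(node.OuterHtml);
                break;

            case HtmlNodeType.Text:
                string text = ((HtmlTextNode)node).Text;
                // Simplification: whitespace-only nodes are dropped entirely, which
                // can remove a meaningful space between two inline elements.
                if (text.Trim().Length == 0)
                    break;
                sb.Append(Regex.Replace(text, @"\s+", " "));
                break;

            case HtmlNodeType.Element:
                // node.Name and attr.Name come back lowercased from the parser.
                sb.Append('<').Append(node.Name);
                foreach (var attr in node.Attributes.OrderBy(a => a.Name, StringComparer.Ordinal))
                {
                    // Always double-quote; escape embedded quotes so the output stays valid.
                    sb.Append(' ').Append(attr.Name)
                      .Append("=\"").Append(attr.Value.Replace("\"", "&quot;")).Append('"');
                }
                sb.Append('>');

                if (VoidElements.Contains(node.Name))
                    break; // <input>, <br>, ... get no end tag

                if (RawElements.Contains(node.Name))
                    sb.Append(node.InnerHtml);
                else
                    foreach (var child in node.ChildNodes)
                        Write(child, sb);

                sb.Append("</").Append(node.Name).Append('>');
                break;
        }
    }
}
```

Calling HtmlCompressor.Compress(yourHtmlString) returns the rewritten markup. Extending the Element case is where rules like "remove type from <script>" or "drop quotes when the value has no spaces" would go.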

Steve