How can I load an HTML string into Webkit.net so I can access its "DOM" | ansaurus

views: 31

answers: 1

Q:


I'd like to use Webkit.net to load an (X)HTML string and then analyze the DOM in order to "compress" it: remove whitespace and newlines, convert <input></input> and <input /> to <input>, and so on (basically an XHTML-to-HTML conversion, doctype permitting).

Is there any way to get the "DOM tree" in Webkit.net? If not, are there any .NET HTML parsers out there that can do this? And if not, is there a .NET component that already does what I'm asking?

Some pseudocode explaining what I'd like to do:

// Hypothetical API: this is the kind of call I'm looking for.
var DOM = Webkit.DOM.FromString("<!DOCTYPE HTML><html><head><title> Hello</title></head><body><INPUT Value='Click here'  type='submit' /><br /><span class='bold red'>An element!</span><script type='text/javascript'>/*do stuff*/</script>  <script>/*do more stuff*/</script></body></html>");

var sb = new StringBuilder();

// this would recursively iterate over all childnodes in a real scenario.
foreach(var node in DOM.Nodes){
    sb.Append(/* Compress & sort attributes, normalize & strip unneeded quotes, remove unneeded end & self-closing tags, etc. */);
}

// return optimally compressed output...
// something like:
// <!doctype html><title>Hello</title><input type=submit value="Click here"><br><span class="bold red">An element!</span><script>/*do stuff*/</script><script>/*do more stuff*/</script>
return sb.ToString();

+1 A:

I haven't used Webkit.net, but I have used HtmlAgilityPack for a task similar to the one you have in mind, and it works very well. So I think you've answered your own question: a .NET HTML parser such as HtmlAgilityPack can do this.
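Whichever parser builds the tree (with HtmlAgilityPack that would be `HtmlDocument.LoadHtml` followed by a recursive walk over `DocumentNode.ChildNodes`), the "compress" step in the pseudocode comes down to re-emitting each element compactly. Below is a minimal, parser-agnostic sketch of that step. The `HtmlCompress` class and its method names are hypothetical. It assumes attribute values arrive as already-decoded plain text, and it only handles start and end tags (no whitespace collapsing, no omission of optional end tags such as `</p>`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: per-element output for an HTML "compressor".
// Names and attributes are plain strings, so any parser can feed it.
public static class HtmlCompress
{
    // HTML5 void elements: they never take an end tag and need no "/>".
    static readonly HashSet<string> VoidElements = new HashSet<string>(
        StringComparer.OrdinalIgnoreCase)
    {
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"
    };

    public static bool IsVoid(string tagName) => VoidElements.Contains(tagName);

    // HTML5 unquoted attribute values must be non-empty and contain no
    // whitespace, quotes, '=', '<', '>' or '`'. This sketch is a bit stricter:
    // it also keeps quotes around values containing '&'.
    public static bool CanBeUnquoted(string value) =>
        value.Length > 0 &&
        value.All(c => !char.IsWhiteSpace(c) && "\"'=<>`&".IndexOf(c) < 0);

    // Builds a compact start tag: lower-case names, attributes sorted by name,
    // quotes dropped where allowed, and no self-closing slash.
    public static string StartTag(
        string tagName, IEnumerable<KeyValuePair<string, string>> attributes)
    {
        var sb = new StringBuilder("<").Append(tagName.ToLowerInvariant());
        foreach (var attr in attributes.OrderBy(
                     a => a.Key.ToLowerInvariant(), StringComparer.Ordinal))
        {
            sb.Append(' ').Append(attr.Key.ToLowerInvariant());
            string value = attr.Value;
            if (value == null) continue; // boolean attribute, e.g. "disabled"

            sb.Append('=');
            if (CanBeUnquoted(value))
                sb.Append(value);
            else // escape '&' first, then '"', and wrap in double quotes
                sb.Append('"')
                  .Append(value.Replace("&", "&amp;").Replace("\"", "&quot;"))
                  .Append('"');
        }
        return sb.Append('>').ToString();
    }

    // Void elements get no end tag; every other element keeps one here.
    public static string EndTag(string tagName) =>
        IsVoid(tagName) ? "" : "</" + tagName.ToLowerInvariant() + ">";
}
```

For the question's `<INPUT Value="Click here" type="submit" />`, calling `StartTag("INPUT", ...)` with those two attributes yields `<input type=submit value="Click here">` and `EndTag("INPUT")` yields an empty string, which matches the expected output in the question. Sorting attributes is optional for validity, but it tends to help gzip find repeated byte sequences.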

Steve 2010-10-05 23:21:07
