views:

1101

answers:

5

Hi, I'm looking to parse HTML using .NET for the purpose of testing or asserting its content, e.g.:

    HtmlDocument doc = GetDocument("some html");
    List<Form> forms = doc.Forms();
    Link link = doc.GetLinkByText("New Customer");

The idea is to let people write tests in C# similar to how they do in Webrat (Ruby), e.g.:

    visits "/"
    fills_in "Name", :with => "mick"
    clicks_button "save"

I've seen the HTML Agility Pack, SgmlReader, etc., but has anyone created an object model for this, i.e. a set of classes representing the HTML elements, such as form, button, etc.?

Cheers.
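For illustration, here is a minimal sketch of the kind of object model the question asks for. It assumes the markup is well-formed XHTML so that the built-in System.Xml.Linq parser can read it (real-world HTML often is not, which is the gap the HTML Agility Pack and SgmlReader fill). The names HtmlPage, HtmlForm, HtmlLink, Forms and GetLinkByText are hypothetical, modeled on the wished-for API in the question, and are not an existing library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical object model over well-formed XHTML; not a real library API.
public sealed class HtmlLink
{
    public string Text { get; }
    public string Href { get; }
    public HtmlLink(string text, string href) { Text = text; Href = href; }
}

public sealed class HtmlForm
{
    public string Action { get; }
    public string Method { get; }
    // Names of the named <input> elements inside the form, in document order.
    public IReadOnlyList<string> InputNames { get; }
    public HtmlForm(string action, string method, IReadOnlyList<string> inputNames)
    {
        Action = action; Method = method; InputNames = inputNames;
    }
}

public sealed class HtmlPage
{
    private readonly XElement _root;

    // Throws System.Xml.XmlException if the markup is not well-formed.
    public HtmlPage(string xhtml) => _root = XElement.Parse(xhtml);

    // Match on the local name so an XHTML namespace, if present, is ignored.
    private IEnumerable<XElement> ElementsNamed(string name) =>
        _root.DescendantsAndSelf().Where(e => e.Name.LocalName == name);

    public IReadOnlyList<HtmlForm> Forms() =>
        ElementsNamed("form")
            .Select(f => new HtmlForm(
                (string)f.Attribute("action") ?? "",
                ((string)f.Attribute("method") ?? "get").ToLowerInvariant(),
                f.Descendants()
                 .Where(e => e.Name.LocalName == "input")
                 .Select(e => (string)e.Attribute("name"))
                 .Where(n => n != null)
                 .ToList()))
            .ToList();

    // Returns the first <a> whose trimmed text equals linkText, or null.
    public HtmlLink GetLinkByText(string linkText) =>
        ElementsNamed("a")
            .Where(a => a.Value.Trim() == linkText)
            .Select(a => new HtmlLink(a.Value.Trim(), (string)a.Attribute("href") ?? ""))
            .FirstOrDefault();
}

public static class Program
{
    public static void Main()
    {
        var page = new HtmlPage(
            "<html><body>" +
            "<a href=\"/customers/new\">New Customer</a>" +
            "<form action=\"/customers\" method=\"POST\"><input name=\"Name\" /></form>" +
            "</body></html>");

        HtmlLink link = page.GetLinkByText("New Customer");
        Console.WriteLine(link.Href);               // /customers/new
        Console.WriteLine(page.Forms().Count);      // 1
        Console.WriteLine(page.Forms()[0].Method);  // post
    }
}
```

Typed wrappers like HtmlForm keep test code readable; swapping XElement.Parse for a forgiving HTML parser would be the main change needed for non-XHTML pages.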

A: 

The closest thing to an HTML object model in .NET, as far as I can tell, is the HTML DOM itself, as exposed by Internet Explorer.

You can use the Windows Forms WebBrowser control, load it with your HTML, then access the DOM from the outside.

BTW, this is .NET. Any code that works for VB.NET would work for C#.

John Saunders
I'd rather not start hosting UI controls for this; I'd get into the usual threading issues with UI controls, and performance would suffer. I'm using this for testing ASP.NET MVC pages and am avoiding Selenium etc. because of the browser overhead. What would be ideal is something like HtmlUnit (Java-based). I'm not sure I'd have the time to port it, as it's a monster. It also supports JavaScript, but I don't need that to test my apps (the JavaScript is unobtrusive).
mickdelaney
From HtmlUnit:

    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    final HtmlDivision div = page.getHtmlElementById("some_div_id");
    final HtmlAnchor anchor = page.getAnchorByName("anchor_name");

http://htmlunit.sourceforge.net/
mickdelaney
No formatting in comments, then?
mickdelaney
Not much formatting in comments. Surround with _underscores_ or single *asterisks* or **double** asterisks or `backQuotes<T>` or maybe ***triple*** asterisks. But it's limited and meant to be that way.
John Saunders
Ok, triple didn't work.
John Saunders
Benefit of the WebBrowser control: it's IE, so it behaves like IE does. This is important for AJAX scenarios or any other situation where some of the HTML is produced on the fly. You can actually find elements and invoke their `click` methods to fire the JavaScript that would run in a normal browser.
John Saunders
A: 

You have two major options:

  1. Use a browser engine (e.g. Internet Explorer) that will parse the HTML for you and then give you access to the generated DOM. This option requires some interop with the browser engine (in the case of IE, it's simple COM).

  2. Use a lightweight parser like the HtmlAgilityPack.

yosig81
-1: 1. That's what I answered 15 minutes earlier. 2. Read the question. He knows about the HtmlAgilityPack and doesn't want it.
John Saunders
That's correct; I missed his last section.
yosig81
A:

Here is a good library for HTML parsing. Objects like HtmlButton and HtmlInput are not created, but it is a good starting point for creating them yourself if you don't want to use the HTML DOM.

ArsenMkrt
A: 

It sounds to me like you are trying to do HTML unit tests. Have you looked into Selenium? It has a C# library, so you can write your HTML unit tests in C#, assert that elements exist and have the correct values, and even click on links. It also works with JavaScript/AJAX sites.

Eric J. Smith
It's too slow for what I want. In Rails I use Webrat for the majority of my acceptance testing. It's an in-memory browser (basically an HTML parser), so it's very fast. I may then use Watir/Selenium etc. for a smoke test, but they're very slow, so I don't want to use them for everything.
mickdelaney
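To make the in-memory browser idea concrete, here is a rough sketch of Webrat-style "fill in a field, click a button" steps that build the request a browser would send, without any browser or HTTP. It assumes well-formed XHTML, fields located by their `<label for="...">` text or by name, and submit buttons written as `<input type="submit">`. FormSession, FillIn and ClickButton are hypothetical names; a real tool would also send the request (for example to the application under test) and parse the response into the next page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical Webrat-style session over well-formed XHTML.
// It only builds the request; it performs no HTTP.
public sealed class FormSession
{
    private readonly XElement _root;
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    public FormSession(string xhtml) => _root = XElement.Parse(xhtml);

    private IEnumerable<XElement> All(string name) =>
        _root.DescendantsAndSelf().Where(e => e.Name.LocalName == name);

    // Finds the <input> targeted by a <label for="id"> with this text,
    // or failing that an <input> whose name attribute matches.
    public void FillIn(string labelOrName, string value)
    {
        string id = All("label")
            .Where(l => l.Value.Trim() == labelOrName)
            .Select(l => (string)l.Attribute("for"))
            .FirstOrDefault();

        XElement input = All("input").FirstOrDefault(i =>
                (id != null && (string)i.Attribute("id") == id) ||
                (string)i.Attribute("name") == labelOrName)
            ?? throw new InvalidOperationException($"No field '{labelOrName}'");

        _values[(string)input.Attribute("name")] = value;
    }

    // "Clicks" the submit input with this value and returns the method, action
    // and url-encoded body of its enclosing form. For simplicity the clicked
    // button's own name/value pair is not included.
    public (string Method, string Action, string Body) ClickButton(string buttonValue)
    {
        XElement button = All("input").FirstOrDefault(i =>
                (string)i.Attribute("type") == "submit" &&
                (string)i.Attribute("value") == buttonValue)
            ?? throw new InvalidOperationException($"No button '{buttonValue}'");

        XElement form = button.Ancestors().First(a => a.Name.LocalName == "form");

        // Uri.EscapeDataString encodes spaces as %20 rather than +; both decode to a space.
        var pairs = form.Descendants()
            .Where(e => e.Name.LocalName == "input" && e.Attribute("name") != null &&
                        (string)e.Attribute("type") != "submit")
            .Select(e =>
            {
                string name = (string)e.Attribute("name");
                string val = _values.TryGetValue(name, out var v)
                    ? v
                    : ((string)e.Attribute("value") ?? "");
                return Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(val);
            });

        return (((string)form.Attribute("method") ?? "get").ToUpperInvariant(),
                (string)form.Attribute("action") ?? "",
                string.Join("&", pairs));
    }
}

public static class Program
{
    public static void Main()
    {
        var session = new FormSession(
            "<html><body><form action=\"/customers\" method=\"post\">" +
            "<label for=\"customer_name\">Name</label>" +
            "<input id=\"customer_name\" name=\"customer_name\" type=\"text\" />" +
            "<input type=\"hidden\" name=\"token\" value=\"abc\" />" +
            "<input type=\"submit\" value=\"save\" />" +
            "</form></body></html>");

        session.FillIn("Name", "mick");
        var request = session.ClickButton("save");
        Console.WriteLine($"{request.Method} {request.Action} {request.Body}");
        // POST /customers customer_name=mick&token=abc
    }
}
```

Because nothing is rendered, each step is just a tree lookup, which is where the speed advantage over Selenium or Watir comes from.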
A: 

The best parser for HTML is HTQL COM. You can use HTQL queries to retrieve HTML content.

seagulf