Can I parse HTML tables by giving only the column names?

That is, only the data in the columns whose names I supply should be extracted from the table.

For example, I have a table with the columns serial no., name, address, phone no. and total Rs.

I want to extract only the name, phone no. and total Rs. columns. How can I do that?

+2  A: 

Yes, you can. You can use XPath to query the HTML document (search for "screen scraping"). Another technique is to use a UI testing framework such as WatiN, which lets you find elements on an HTML page with CSS selectors and other criteria and read their contents.
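To make the XPath idea concrete, here is a minimal sketch. It is written in Python with the standard library's `xml.etree.ElementTree` (which supports a subset of XPath) rather than .NET, and it assumes the table is well-formed markup with a single header row of `th` cells. The sample table, its values, and the function name `extract_columns` are made up for illustration. The approach is to read the header row, turn each wanted column name into a position, and then select `td[position]` in every data row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; real pages are often not well-formed XML.
SAMPLE = """
<table>
  <tr><th>Serial No.</th><th>Name</th><th>Address</th><th>Phone No.</th><th>Total Rs.</th></tr>
  <tr><td>1</td><td>Asha</td><td>12 Hill Road</td><td>555-0101</td><td>1200</td></tr>
  <tr><td>2</td><td>Ravi</td><td>7 Lake View</td><td>555-0102</td><td>850</td></tr>
</table>
"""

def extract_columns(markup, wanted):
    """Return one dict per data row, holding only the wanted columns."""
    table = ET.fromstring(markup)
    rows = table.findall(".//tr")
    # The first row is assumed to be the header row.
    headers = [(th.text or "").strip() for th in rows[0].findall("th")]
    # XPath positions are 1-based; index() raises ValueError for an unknown name.
    positions = {name: headers.index(name) + 1 for name in wanted}
    records = []
    for row in rows[1:]:
        record = {}
        for name, pos in positions.items():
            cell = row.find(f"td[{pos}]")  # positional XPath predicate
            record[name] = (cell.text or "").strip() if cell is not None else None
        records.append(record)
    return records

records = extract_columns(SAMPLE, ["Name", "Phone No.", "Total Rs."])
for record in records:
    print(record)
```

The same two-step pattern (find the header position, then select cells by position) should carry over to any XPath engine, but the markup has to be parseable as XML first.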

DarkwingDuck
+1 good answer :)
Saar
@DarkwingDuck XPath? Is it a class, or something else that is built into .NET?
Harikrishna
@DarkWingDuck Will that extract only the data for the column names I give as input?
Harikrishna
Yes, XPath is a query language with its own syntax for finding elements in XML documents. It is fully supported in .NET. Sorry, I didn't understand your second question.
DarkwingDuck
@DarkWingDuck I meant: with XPath, is it possible to retrieve only the data under the column names I want, as in the example in my question above?
Harikrishna
+1  A: 

You can use the Data Extracting SDK, which has an HtmlProcessor class with a Tables property that exposes HTML tables as DataTable objects.

sashaeve
+3  A: 

Take a look at Html Agility Pack. It provides a LINQ API for searching HTML content.
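The reason to reach for a dedicated HTML parser is that real pages are rarely valid XML. The sketch below shows the idea in Python with the standard library's `html.parser`, not with Html Agility Pack itself, so none of the names here are that library's API. It assumes every `tr`, `th` and `td` has a closing tag and that the first row holds the column names; the sample markup and the names `TableRows` and `select_columns` are hypothetical. Column names are matched case-insensitively.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collects the text of each <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def select_columns(html, wanted):
    """Return one dict per data row, keyed by the names in `wanted`."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    lookup = {name.lower(): i for i, name in enumerate(header)}
    indexes = [lookup[name.lower()] for name in wanted]  # KeyError if unknown
    return [{name: row[i] for name, i in zip(wanted, indexes)} for row in body]

# Hypothetical page fragment: unquoted attribute and a bare <br>, so not valid XML.
PAGE = """
<table border=1>
  <tr><th>Serial No.</th><th>Name</th><th>Address</th><th>Phone No.</th><th>Total Rs.</th></tr>
  <tr><td>1</td><td>Asha</td><td>12 Hill Road<br>Pune</td><td>555-0101</td><td>1200</td></tr>
  <tr><td>2</td><td>Ravi</td><td>7 Lake View</td><td>555-0102</td><td>850</td></tr>
</table>
"""

selected = select_columns(PAGE, ["name", "phone no.", "total rs."])
for record in selected:
    print(record)
```

Nested tables, `colspan`, and omitted closing tags are not handled here; a full HTML library deals with those cases for you.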

Mike Two
+1 for Html Agility Pack
šljaker