ansaurus

Question

REGEX - Find td with specific class, including nested tables

Answer 1

+1 A:

Why don't you use css selectors?

rahul 2009-07-09 13:00:33

It's in a .NET Windows application that parses text.

Gidon 2009-07-09 13:06:39

@Gidon: Don't think about HTML as text.

Welbog 2009-07-09 13:09:59

Answer 2

A:

([tT][dD]\sclass=\"blabla\")
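Here is a quick check of what this pattern matches. It is written in Python, not .NET, and the sample markup is made up. The pattern finds only the text of the opening tag, without the leading "<". It captures none of the cell's contents, so the nested table is not captured either.

```python
import re

# Answer 2's pattern as a Python raw string. The \" escapes in the
# original are only needed inside a C#-style string literal, so plain
# quotes are used here. "blabla" is the answer's placeholder class name.
TD_OPEN = re.compile(r'([tT][dD]\sclass="blabla")')

# Hypothetical sample: a matching cell that contains a nested table.
SAMPLE = (
    '<table><tr><td class="blabla">'
    '<table><tr><td>inner</td></tr></table>'
    '</td></tr></table>'
)

matches = TD_OPEN.findall(SAMPLE)
print(matches)  # ['td class="blabla"']: tag text only, no cell contents
```

The pattern is also a case-insensitive match for "td", which re.IGNORECASE would express more directly than [tT][dD].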

Ratnesh Maurya 2009-07-09 13:01:38

Answer 3

A:

You would be looking for a regex similar to /<td\sclass=\"(.*?)\">/, but I don't know how to do this in .NET.

However, because HTML is so often badly formed, regex is not a good candidate for parsing it. A dedicated HTML parser is a much better tool for the job.

As has been mentioned, XPath is a good way to do this: //td[@class="someClass"] gives you the td node. You can then get its contents and process them as required.
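Here is a minimal sketch of the XPath approach. It uses Python's xml.etree.ElementTree rather than .NET's System.Xml. The sample document and the helper names cells_with_class and cell_text are hypothetical. ElementTree is an XML parser, so this assumes well-formed (XHTML-style) input. The example shows that selecting the td node brings its whole subtree along, nested table included.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed input. Real-world HTML that is not
# well-formed XML would make ET.fromstring raise a ParseError.
DOC = """<html><body>
  <table>
    <tr>
      <td class="someClass">outer
        <table><tr><td>nested</td></tr></table>
      </td>
      <td class="other">skip me</td>
    </tr>
  </table>
</body></html>"""


def cells_with_class(xml_text: str, css_class: str) -> list:
    """Return every <td> whose class attribute equals css_class exactly."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a subset of XPath; ".//td[@class='...']" is the
    # relative form of //td[@class="someClass"].
    return root.findall(f".//td[@class='{css_class}']")


def cell_text(td: ET.Element) -> str:
    """Join all non-blank text inside the cell, including nested tables."""
    return " ".join(t.strip() for t in td.itertext() if t.strip())


for td in cells_with_class(DOC, "someClass"):
    print(cell_text(td))  # outer nested
```

As with the XPath in the answers, the class test is an exact string comparison, so a cell with class="big someClass" would not be selected.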

Xetius 2009-07-09 13:02:34

Answer 4

+6 A:

Don't try to parse HTML with regular expressions. You can't write an expression that will match what you want, because HTML isn't regular.

Use an HTML/XML parser in a library your language provides. System.Xml has a number of useful classes that will let you open your file and query it with XPath.

The XPath expression you're looking for is

//td[@class="someClass"]

Welbog 2009-07-09 13:03:37

Not sure of the .NET implementation, but wouldn't that be //td[@class="someClass"]?

Xetius 2009-07-09 13:05:54

@Xetius: Right. Sorry. :)

Welbog 2009-07-09 13:09:06

That is what we did in the end.

Gidon 2009-07-11 16:11:13

Answer 5

+4 A:

If you need to do extensive HTML parsing, I would recommend the Html Agility Pack instead of regular expressions. HAP builds a document tree from an HTML page, tolerating malformed markup, so you can look for specific nodes using XPath.
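HAP is a .NET library, so it is not shown here. As a rough stand-in, Python's standard-library html.parser also accepts markup that isn't well-formed XML, such as a bare <br> or unquoted attribute values. The class name TdTextCollector is hypothetical, and this is not HAP's API. Unlike HAP, HTMLParser builds no tree and does not infer omitted end tags, so this sketch relies on each </td> being present.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Collect the text of every <td> whose class list contains target_class.

    Tracks <td> depth so that text in tables nested inside a matching cell
    is kept, and the cell ends only at its own closing tag.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0                    # 0 = not inside a matching cell
        self.current: list[str] = []
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self.depth:                    # a td nested inside the match
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "td" and self.depth:
            self.depth -= 1
            if self.depth == 0:           # the matching cell just closed
                self.results.append(" ".join(self.current))

    def handle_data(self, data):
        if self.depth and data.strip():
            self.current.append(data.strip())


parser = TdTextCollector("someClass")
parser.feed('<table><tr><td class="big someClass">a<br>b'
            '<table><tr><td>c</td></tr></table></td>'
            '<td class=other>d</td></tr></table>')
parser.close()
print(parser.results)  # ['a b c']
```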

René 2009-07-09 13:09:14

Answer 6

A:

You can't do this with regular expressions alone, because nested tables make it too complicated. Even with lookahead matching, the regex would have to change dynamically. For every additional <td> opened after the one you want, you'd need to skip one more </td> before reaching the tag that closes your cell.
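Here is a sketch of the problem and one workaround, in Python with a made-up sample string. A lazy regex stops at the nested table's </td>. Using the regex only to find td tags and keeping the nesting count in ordinary code finds the right closing tag. The helper td_inner_html is hypothetical. Like any tag-level scan, it would be fooled by "<td>" inside comments or script blocks.

```python
import re

SAMPLE = ('<td class="someClass">outer'
          '<table><tr><td>inner</td></tr></table>'
          ' tail</td>')

# Naive approach: the lazy (.*?) stops at the first </td>, which belongs
# to the nested table, so the captured "contents" are cut off.
naive = re.search(r'<td class="someClass">(.*?)</td>', SAMPLE)
print(naive.group(1))  # outer<table><tr><td>inner

# Matches an opening <td ...> or a closing </td>; group 1 is "/" for a close.
TD_TAG = re.compile(r'<(/?)td\b[^>]*>', re.IGNORECASE)


def td_inner_html(html: str, start: int) -> str:
    """Return the inner HTML of the <td> whose opening tag starts at start.

    The regex only tokenizes tags; the nesting is tracked by an explicit
    counter, which a plain Python re pattern cannot express.
    """
    first = TD_TAG.match(html, start)
    if first is None or first.group(1):
        raise ValueError("no opening <td> at the given position")
    depth = 1
    for tag in TD_TAG.finditer(html, first.end()):
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            return html[first.end():tag.start()]
    raise ValueError("unbalanced <td> tags")


print(td_inner_html(SAMPLE, naive.start()))
# outer<table><tr><td>inner</td></tr></table> tail
```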

Mike Caron 2009-07-09 13:13:56
