I have a problem I need to solve with regular expressions: taking a CSS selector and compiling it into a regex that matches the string representation of the corresponding nodes in an HTML document. The point is to avoid parsing the HTML as XML and then running either XPath or DOM queries to apply style attributes.

Does anyone know of a project that already implements something like this in any language? The target platform would be .NET 3.5.

A: 

Regular expressions seem like an amazingly bad way of matching those nodes. I'm not sure I follow your problem - why not just use something like jQuery to pick out those nodes? E.g., given the CSS selector 'div>span.red:first-child',

$('div>span.red:first-child')

would return an array-like collection of the matching nodes.

EDIT: Oh, wait - are you trying to do this 'offline', as it were, rather than in a user's browser? Yeah, ignore my advice. (Even so, I'd still suggest that regular expressions aren't going to help you. Why are you against generating an XML document representation of the page?)
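
For illustration only, here is a rough sketch of the parser-based route for the 'offline' case, written in Python rather than .NET to stay self-contained. It uses the standard library's tolerant `html.parser` tokenizer (not an XML parser) to re-emit the markup and add a style attribute to elements matching a simple `tag.class` selector. The names `StyleInjector` and `apply_style` are made up for this example, and combinators such as `>` or pseudo-classes such as `:first-child` are not handled; those would need a stack of open elements.

```python
from html import escape
from html.parser import HTMLParser


class StyleInjector(HTMLParser):
    """Re-emit HTML, setting a style attribute on tags matching `tag.class`.

    Hypothetical helper for illustration; only simple tag.class selectors
    are supported, and any existing style attribute is overwritten.
    """

    def __init__(self, tag, css_class, style):
        # Keep character references as written so text passes through unchanged.
        super().__init__(convert_charrefs=False)
        self.tag, self.css_class, self.style = tag, css_class, style
        self.out = []

    def _emit_tag(self, tag, attrs, closing=""):
        attrs = list(attrs)
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            # Drop any existing style, then append the new one.
            attrs = [(k, v) for k, v in attrs if k != "style"]
            attrs.append(("style", self.style))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def apply_style(html, tag, css_class, style):
    """Return `html` with `style` set on every <tag> carrying `css_class`."""
    parser = StyleInjector(tag, css_class, style)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `apply_style('<div><span class="red big">hi</span><span>no</span></div>', 'span', 'red', 'color: red')` returns `<div><span class="red big" style="color: red">hi</span><span>no</span></div>`. Because the tokenizer already deals with attribute order, quoting and extra classes, the matching logic stays a plain comparison instead of a regex that has to anticipate every variation.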

jdelStrother
Seconding this: translating CSS selectors into regular expressions sounds like a bad idea.
Stuart Branham
Yeah, it's neither my first choice nor my decision. But given the restrictions, I see no more efficient way, apart from the Html Agility Pack suggested in another answer.
Krof Drakula
A: 

Why would you want to avoid building a DOM for a language that is designed to be parsed into one?

Amber
Because I'm looking for a fast-and-cheap way of assigning style attributes to matching nodes without parsing XML (not by choice, it's a restriction).
Krof Drakula
+3  A: 

Html Agility Pack

Jeremy Stein
Well, if the performance isn't too bad, this would be ideal for the job. Thanks!
Krof Drakula