I am creating a tool that will check dynamically generated XHTML and validate it against expected contents.

I need to confirm the structure is correct and that specific attributes exist/match. There may be other attributes which I'm not interested in, so a direct string comparison is not suitable.

One way of validating this is with XPath, and I have implemented this already, but I would also like something less verbose - I want to be able to use CSS Selectors, like I can with jQuery, but on the server - within CFML code - as opposed to on the client.

Is there a CFML or Java library that allows me to use CSS Selectors against an XHTML string?

+2  A: 

I don't know of a Java library itself, but there is a Ruby library called Hpricot that does exactly what you're looking for. In conjunction with JRuby, the Ruby implementation on the Java platform, it should be relatively straightforward to call Ruby methods from your Java code (using BSF, the JSR-223 Scripting APIs, or an internal API).

Are you using ColdFusion 8? ColdFusion 8, being based on Java 6, supports the JSR-223 Scripting APIs ("javax.script").
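As a rough sketch of what the scripting-API route looks like from Java: the standard javax.script package (JSR 223) lets you look up an engine by name and evaluate a script string. The engine name "jruby" and the Ruby snippet are assumptions for illustration; the lookup returns null unless a JRuby JSR-223 engine is actually on the classpath, so the code checks for that.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

class ScriptEngineLookup {

    // Returns the engine registered under the given name, or null if none is installed.
    static ScriptEngine find(String name) {
        return new ScriptEngineManager().getEngineByName(name);
    }

    public static void main(String[] args) throws Exception {
        // List whatever engines this JVM can see, to help diagnose classpath problems.
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            System.out.println(factory.getEngineName() + " -> " + factory.getNames());
        }

        // Assumption: a JRuby engine registers itself under the name "jruby".
        ScriptEngine ruby = find("jruby");
        if (ruby == null) {
            System.out.println("No JRuby script engine found on the classpath");
            return;
        }

        // Evaluate a trivial Ruby expression; an Hpricot call would go here instead.
        Object result = ruby.eval("[1, 2, 3].map { |n| n * 2 }.inspect");
        System.out.println(result);
    }
}
```

From CFML the same classes could presumably be reached with createObject("java", "javax.script.ScriptEngineManager"), though that part is untested here.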

Take a look at this blog entry on embedding PHP within CFML. You should be able to do the same with Ruby. There is ZIP file example code linked from this blog posting, and if you crack open the CFML, you'll see a good example of embedding Ruby within CFML.

It might take a bit of work to make all the pieces fit together, but with a bit of investment it should give you the robust parsing and CSS selector querying that you're looking for.

myabc
A: 

There is a theoretical difference between the server and client. To a web browser, the document is a living DOM hierarchy. To your server code it's merely an XML document of whatever type. XPath is the "correct" way to access elements of an XML document.

So unless you have a serious performance problem with your current XPath solution, or it doesn't actually work correctly, I suggest you stick with it. Trying something too clever brings the risk of breaking something that's working.

If you find the XPath to be too verbose and ugly to leave sitting around, or want more power to re-use the tool in different cases, or just can't resist trying to do something clever, then you could try writing a utility that compiles a given CSS selector into an XPath. You could then call this in one line whenever you needed.
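A minimal sketch of such a utility, assuming only a small subset of CSS is needed: tag names, #id, .class, [attr], [attr=value], and the descendant (space) and child (>) combinators. The class and method names are made up for illustration, and this is not a full Selectors implementation: attribute values containing spaces or quotes, pseudo-classes, and XML namespaces are not handled (the sample document deliberately omits the XHTML namespace declaration and DOCTYPE).

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CssToXPath {

    private static final Pattern TAG = Pattern.compile("^(?:[\\w-]+|\\*)");
    // One of: #id, .class, [attr] or [attr=value] (value optionally quoted).
    private static final Pattern PART = Pattern.compile(
        "#([\\w-]+)|\\.([\\w-]+)|\\[([\\w-]+)(?:=[\"']?([^\"'\\]]*)[\"']?)?\\]");

    // Translate the supported CSS subset into an XPath expression.
    static String toXPath(String css) {
        StringBuilder xpath = new StringBuilder();
        String axis = "//"; // descendant step by default
        for (String token : css.trim().replaceAll("\\s*>\\s*", " > ").split("\\s+")) {
            if (token.equals(">")) {
                axis = "/"; // next step is a direct child
                continue;
            }
            xpath.append(axis).append(compound(token));
            axis = "//";
        }
        return xpath.toString();
    }

    // Translate one compound selector such as a.ext[rel=nofollow].
    private static String compound(String token) {
        String name = "*";
        int pos = 0;
        Matcher tag = TAG.matcher(token);
        if (tag.find()) {
            name = tag.group();
            pos = tag.end();
        }
        StringBuilder out = new StringBuilder(name);
        Matcher m = PART.matcher(token);
        while (pos < token.length() && m.region(pos, token.length()).lookingAt()) {
            if (m.group(1) != null) {
                out.append("[@id='").append(m.group(1)).append("']");
            } else if (m.group(2) != null) {
                // Match one word inside a space-separated class attribute.
                out.append("[contains(concat(' ', normalize-space(@class), ' '), ' ")
                   .append(m.group(2)).append(" ')]");
            } else if (m.group(4) != null) {
                out.append("[@").append(m.group(3)).append("='").append(m.group(4)).append("']");
            } else {
                out.append("[@").append(m.group(3)).append("]");
            }
            pos = m.end();
        }
        if (pos < token.length()) {
            throw new IllegalArgumentException("Unsupported selector: " + token);
        }
        return out.toString();
    }

    // Count the elements in an XHTML string that match the CSS selector.
    static int count(String xhtml, String css) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(toXPath(css), doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><div id=\"main\">"
            + "<a class=\"ext link\" rel=\"nofollow\" href=\"http://example.com/\">x</a>"
            + "<a href=\"/local\">y</a></div></body></html>";
        System.out.println(toXPath("div#main > a.ext[rel=nofollow]"));
        System.out.println(count(xhtml, "a[href]"));
    }
}
```

Because the translation is a pure string-to-string step, the generated XPath could also be handed to CFML's own XmlSearch rather than evaluated in Java.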

Marcus Downing
XML != HTML. You can't reliably parse HTML with XPath.
Shawn Simon
XHTML, if it's valid, is a subset of XML. The question clearly said it was XHTML they were generating.
Marcus Downing
+2  A: 

Hpricot is definitely a fantastic solution if the JRuby route is open to you.

Wrt. XPath being the "correct" way to access XML documents... sorry but this is rubbish. There are numerous ways to access elements of an XML document: DOM traversal, XPath, XQuery, CSS selectors to name a few. XPath is certainly popular but CSS selectors are very very powerful, assuming your XML document has HTML semantics.

Ijonas
Trouble is that Hpricot is based on a native parser; not sure how easy it'd be to get it to run in JRuby.
Gareth Davis
Hpricot runs just fine inside JRuby, as the authors have built a Java implementation of the native parser.
Ijonas
A: 

Load the document into the DOM in PHP. You should have some interesting and very concise query possibilities.

mvrak
+2  A: 

Hi!

I've just released an open source project which is a W3C CSS Selectors Level 3 implementation in Java. Please give it a try. I was looking for the same thing and decided to implement my own engine. It's inspired by the code in WebKit etc.

http://github.com/chrsan/css-selectors/tree

/Christer

Thanks Christer. I haven't yet had a chance to get back to the project I needed this for, but it does seem exactly what I want - I'll take a look at some point this week, and give you any feedback I might have.
Peter Boughton
+2  A: 

If you can use PHP within your CFML (as mentioned above), you could take advantage of this excellent "jQuery for PHP" library, phpQuery.

Full CSS selector support, manipulation functions, traversing, etc. It should work great for what you need.

Hope it helps.

A: 

Hi, it may be easier to use cQuery.com, an API-based 'Content Query Engine' that extracts content from live websites using CSS.

You can use it programmatically in your application.

Tom Carnell