views:

382

answers:

5

I have been tasked with finding an open source DOM XML parser. The parser must support at least XPath 1.0. Schema support is desirable, but its absence is not a deal breaker.

The files we are parsing will be small so speed and memory consumption are not a large concern.

Any OO language (C++, C#, Java, etc.).

To clarify, the plan is to integrate an XML parser into an application much more tightly than can be done with an external parser. We are creating an adaptive object model based on XML (change the XML, change the object model). To do this we need to integrate the parser at a pretty low level. This results in a level of elegance that needs to be experienced to be understood (thank you Mr. Yoder). Part of that elegance disappears if we don't have the ability to navigate this object model via XPath.

We have created a prototype that uses an operating-system-provided parser. It worked pretty well, but suffered from complexity and performance problems. But hey, it was a prototype. Now I want to do the real thing, and I can write the parser from scratch. (I've done that part and it was kinda easy.) The XPath engine is a different story. I'm pretty sure I won't get that done in a weekend.

A: 

To answer this question well, I think you need to supply a little more context. Having said that, I am finding that the new object model (XElement etc.) for XML in .NET 3.5, which supports LINQ to XML, makes XML navigation vastly easier and better than using a DOM, and I really do mean an order of magnitude.

Tim Jarvis
I have a legacy application onto which we have glued code to express its internal state as XML (it looks like a DOM parser). Now I want to navigate this "DOM" via XPath. Expressing the internal state as an XML DOM was easy. I discovered that it's not so easy to write an XPath parser to navigate that DOM.
Michael J
LINQ has intrigued me, but I'm not able to wrap my mind around what it might mean to use it to navigate an object model.
Michael J
A: 

If you're allowing C#, then wouldn't you have the C# standard libraries available? Are they deficient?

Same for Java? And it all started in C++. I don't understand the lack.

Googling for "XML parser XPATH" finds lots of hits for CPAN, JDOM and J2SE, Cocoa, MSXML, etc.

Are you just starting your search here, or are the standard answers insufficient?

EDIT:

Your clarifications suggest to me that you don't want to use a parser as-is; you want to use its source to jump-start your own XPath module in your own XML parser. Is that correct? And you don't care about the language because all you want is the design, not the code?

le dorfier
You are correct. I have been searching for a suitable example XPath implementation for some time. I'll write it myself if need be, but a good example never hurts. A coworker suggested that I post here.
Michael J
+1  A: 

The ever-excellent Jaxen may be useful to you here. It's a Java XPath implementation used by both JDOM and dom4j.

Because its authors factored out the common functionality needed to traverse the two DOM implementations, what you get is an XPath engine that can query any tree-shaped model. You only have to write what they call a Navigator, which is comparatively simple.

From the FAQ:

How do I support a different object model?

The only thing required is an implementation of the interface org.jaxen.Navigator. Not all of the interface is required, and a default implementation, in the form of org.jaxen.DefaultNavigator is also provided.

Since many of the XPath axes can be defined in terms of each other (for example, the ancestor axis is merely the parent recursively applied), only a few low-level axis iterators are required to initially get started. Of course, you may implement them directly, instead of relying upon jaxen's composition ability.

I've found writing these relatively quick.
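
To illustrate the composition idea in that FAQ quote, here is a minimal, self-contained sketch. It does not use Jaxen's actual Navigator interface; ModelNode, AxisSketch, ancestors and descendants are hypothetical names I made up for the example. The only point is that once your object model can answer "who is my parent?" and "who are my children?", the ancestor and descendant axes fall out almost for free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an application's own object model (not a Jaxen class).
class ModelNode {
    final String name;
    final ModelNode parent;
    final List<ModelNode> children = new ArrayList<>();

    ModelNode(String name, ModelNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this); // keep the parent's child list in sync
        }
    }
}

// Hypothetical helpers: higher-level axes built from two low-level lookups.
class AxisSketch {
    // Ancestor axis: apply the parent lookup repeatedly until there is no parent.
    // Results come nearest-first, matching XPath's reverse-axis ordering.
    static <T> List<T> ancestors(T context, Function<T, T> parentOf) {
        List<T> result = new ArrayList<>();
        for (T p = parentOf.apply(context); p != null; p = parentOf.apply(p)) {
            result.add(p);
        }
        return result;
    }

    // Descendant axis: apply the child lookup recursively, in document order.
    static <T> List<T> descendants(T context, Function<T, List<T>> childrenOf) {
        List<T> result = new ArrayList<>();
        for (T child : childrenOf.apply(context)) {
            result.add(child);
            result.addAll(descendants(child, childrenOf));
        }
        return result;
    }

    public static void main(String[] args) {
        ModelNode root = new ModelNode("root", null);
        ModelNode section = new ModelNode("section", root);
        ModelNode item = new ModelNode("item", section);

        // Prints "section" and then "root".
        for (ModelNode ancestor : ancestors(item, node -> node.parent)) {
            System.out.println(ancestor.name);
        }
    }
}
```

With Jaxen itself, the equivalent work would live in your Navigator implementation, so check its Javadoc for the real method names rather than relying on this sketch.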

jamesh
Thank you very much, this looks very interesting.
Michael J
Jamesh, I have only had a few hours to examine Jaxen, but I wanted to let you know that it looks very promising. Thank you very much for the suggestion.
Michael J
A: 

If all you want is the design logic to base stuff on, and not the code, you could research Ruby's REXML library. It's OO and quite good and has full XPath support.

MRI has an implementation in C and Ruby. JRuby has an implementation in Java.

singpolyma
A: 

Probably a long shot, but jQuery apparently supports XPath syntax to reference DOMs; and I think its source code is accessible.

le dorfier