Are there any production-ready libraries that evaluate XPath expressions against an XML document in a streaming fashion? My investigation shows that most existing solutions load the entire DOM tree into memory before evaluating the XPath expression.
Would this be practical for a complete XPath implementation, given that XPath syntax allows for:
/AAA/XXX/following::*
and
/AAA/BBB/following-sibling::*
which imply look-ahead requirements? That is, from a particular node you're going to have to load the rest of the document anyway.
The documentation for the Nux library (specifically StreamingPathFilter) makes this point, and references some implementations that rely on a subset of XPath. Nux claims some streaming query capability, but given the above it will be limited in which XPath features it can support.
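To illustrate why a forward-only subset is easy to stream, here is a rough sketch using the JDK's StAX parser. It is not how Nux or any library mentioned here is implemented. The class name StreamingPathMatch and its match method are made up for this example, and it assumes a simple absolute path of child steps (such as /AAA/BBB), no namespaces or predicates, and matched elements that contain text only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
public class StreamingPathMatch {

    // Returns the text of every element matching a simple absolute path such as "/AAA/BBB".
    // Only the stack of currently open element names is kept in memory, never the whole tree.
    public static List<String> match(String xml, String path) {
        String[] steps = path.substring(1).split("/");
        Deque<String> open = new ArrayDeque<>();
        List<String> results = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    open.addLast(reader.getLocalName());
                    if (matches(open, steps)) {
                        // Assumption: matched elements hold text only.
                        // getElementText() leaves the reader on the matching END_ELEMENT,
                        // so the element is popped here rather than in the branch below.
                        results.add(reader.getElementText());
                        open.removeLast();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.removeLast();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return results;
    }

    // True when the open-element stack is exactly the requested path.
    private static boolean matches(Deque<String> open, String[] steps) {
        if (open.size() != steps.length) {
            return false;
        }
        int i = 0;
        for (String name : open) {
            if (!name.equals(steps[i++])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String xml = "<AAA><BBB>x</BBB><CCC/><BBB>y</BBB></AAA>";
        System.out.println(match(xml, "/AAA/BBB")); // prints [x, y]
    }
}
```

A path like /AAA/BBB can be decided the moment each start tag arrives, which is why this works in one pass. An axis such as following::* cannot be answered from the open-element stack alone, so a streaming engine has to either buffer or restrict the supported XPath.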
There are several options:
DataDirect Technologies sells an XQuery implementation that employs projection and streaming where possible. It can handle files in the multi-gigabyte range, i.e. larger than available memory. It's a thread-safe library, so it's easy to integrate. Java-only.
Saxon is an open-source option, with a modestly priced commercial cousin, which will do streaming in some contexts. Java, but with a .NET port also.
MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.
FWIW, I've used Nux streaming-filter XPath queries against very large (>3GB) files, and it has both worked flawlessly and used very little memory. My use case has been slightly different (not validation-centric), but I'd highly encourage you to give Nux a shot.