Hello, I need to write a Java application that does a keyword search within the tags and the actual data of many XML files. From my research online I get the feeling I have to use Xalan, but I can't figure out how to use it or what it does. Could somebody point me in the right direction? Thanks

A: 

It sounds like you are looking for an XPath implementation for Java. XPath lets you construct a search expression and apply it to one or more XML documents (which generally have to be parsed first). Xalan is one option, but there are others. Java has included XML parsing and XPath capabilities since Java 5. If you are using a recent version of Java and simply want to parse and search through a set of XML documents, then you likely need nothing besides the JDK.
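To make the "nothing besides the JDK" point concrete, here is a minimal sketch that parses a document and runs an XPath query using only javax.xml and org.w3c.dom classes. The class name, method names, and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkXPathDemo {
    // Parses an XML string into a DOM tree using the parser bundled with the JDK.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns how many nodes in the document match the XPath expression.
    static int countMatches(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not taken from the question.
        String xml = "<library><book>Dune</book><book>Emma</book><cd>Kind of Blue</cd></library>";
        Document doc = parse(xml);
        System.out.println(countMatches(doc, "/library/book")); // prints 2
    }
}
```

For files on disk, the same DocumentBuilder also accepts a java.io.File, so the string-based parse above is only there to keep the sketch self-contained.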

See this article for a good (but somewhat dated) overview of the XPath capabilities that come "out of the box": http://www.ibm.com/developerworks/library/x-javaxpathapi.html

KeithL
A: 

See this SO post on how to do a search using the contains() XPath function.
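As a rough sketch of what a contains() search can look like from Java: the code below finds elements that directly hold a text node containing a keyword. The class name, method name, and sample XML are hypothetical; binding the keyword through an XPath variable ($kw) is one way to avoid pasting user input into the expression, not something the linked post prescribes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContainsSearch {
    // Returns the names of elements that directly contain a text node with the keyword.
    static List<String> elementsContaining(String xml, String keyword) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $kw instead of concatenating the keyword into the expression.
        xpath.setXPathVariableResolver(
                name -> "kw".equals(name.getLocalPart()) ? keyword : null);
        NodeList hits = (NodeList) xpath.evaluate(
                "//*[text()[contains(., $kw)]]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            names.add(hits.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<notes><note>buy milk</note><note>call Bob</note>"
                + "<title>milk run</title></notes>";
        System.out.println(elementsContaining(xml, "milk")); // prints [note, title]
    }
}
```

Note that contains() is case-sensitive in XPath 1.0, so "Milk" would not match here.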

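As for an example on how to do an XPath query, I suggest looking at the Java XPath documentation. Here's the example code they provide, with the result cast to org.w3c.dom.NodeList (there is no NodeSet type in the JDK) so that it compiles:

import javax.xml.xpath.*;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/widgets/widget";
InputSource inputSource = new InputSource("widgets.xml");
// XPathConstants.NODESET results come back as an org.w3c.dom.NodeList
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);

This would load the file widgets.xml and return a NodeList of all nodes matching the expression.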

Kaleb Brasee
+2  A: 

The first thing you need to do is to decide what data you're actually going to search. You say "within the tags and actual data" -- does that mean that you'll do a keyword search for an element name? Or an element name and content within it?

If your search queries are complex, you'll probably want to turn to a real search engine, like Lucene. I will say, however, that before you take this step you need to give a lot of thought to how you plan to search, so that you build an appropriate index.

If your search requirements are simpler, you could load the documents into a DOM and use XPath. I'd suggest trying this out before moving to Lucene.
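A minimal sketch of the DOM-plus-XPath approach, assuming "keyword search" means a case-sensitive substring match against either an element name or a text node. The class name, method name, and document names are invented for the example, and the documents are passed in as strings only to keep it self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MultiDocKeywordSearch {
    // Returns the names of the documents in which the keyword occurs
    // in an element name or in a text node.
    static List<String> search(Map<String, String> xmlByName, String keyword) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The only variable in the expression is $kw, so always return the keyword.
        xpath.setXPathVariableResolver(name -> keyword);
        // Compile once, then evaluate against every document.
        XPathExpression expr = xpath.compile(
                "//*[contains(local-name(), $kw)] | //text()[contains(., $kw)]");
        List<String> matching = new ArrayList<>();
        for (Map.Entry<String, String> entry : xmlByName.entrySet()) {
            Document doc = builder.parse(new InputSource(new StringReader(entry.getValue())));
            NodeList hits = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            if (hits.getLength() > 0) {
                matching.add(entry.getKey());
            }
        }
        return matching;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical documents: a.xml matches on a tag name, b.xml on content.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.xml", "<invoice><total>10</total></invoice>");
        docs.put("b.xml", "<order><note>see invoice 7</note></order>");
        docs.put("c.xml", "<order><note>none</note></order>");
        System.out.println(search(docs, "invoice")); // prints [a.xml, b.xml]
    }
}
```

This re-parses and scans every document on each query, which is why an index such as Lucene becomes attractive once the collection or the query load grows.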

You don't need Xalan; the JDK comes with XML parsers and an XPath evaluator. I've written a couple of articles on using them: (parsing), (xpath).

kdgregory
+2  A: 

Xalan is an XSLT processor: it enables you to write an XSL stylesheet that will transform your source XML document into something else.

Sure, you could write an XSL transform and then search the result of the transform.
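To illustrate the transform-then-search idea, here is a sketch that uses the JDK's javax.xml.transform API (no separate Xalan download) to flatten a document into one line per element name and per text node, and then does a plain string search on the result. The stylesheet, class name, and sample XML are assumptions made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltFlatten {
    // Hypothetical stylesheet: emits each element name and each text node on its own line.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='*'>"
          + "<xsl:value-of select='name()'/><xsl:text>&#10;</xsl:text>"
          + "<xsl:apply-templates/>"
          + "</xsl:template>"
          + "<xsl:template match='text()'>"
          + "<xsl:value-of select='normalize-space(.)'/><xsl:text>&#10;</xsl:text>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the flattened text.
    static String flatten(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String flat = flatten("<widgets><widget>red gear</widget></widgets>");
        // A keyword search is now an ordinary substring test on the flattened text.
        System.out.println(flat.contains("gear")); // prints true
    }
}
```

The flattened text loses the document structure, so this suits "does the keyword appear anywhere" questions better than "where does it appear".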

Another option is to parse the document with an XML parser and then use Lucene: see Parsing, indexing, and searching XML documents with Digester and Lucene.

You may also want to use XPath. It all depends on what exactly you want to achieve.

Gregory Pakosz