I'd like to parse some well-formed XML into a DOM, but I'd also like to know the offset of each node's tag in the original media.

For example, if I had an XML document with the content something like:

<html>
<body>
<div>text</div>
</body>
</html>

I'd like to know that the <div> node starts at offset 13 in the original media, and (more importantly) that "text" starts at offset 18.

Is this possible with standard Java XML parsers? JAXB? If no solution is easily available, what type of changes are necessary along the parsing path to make this possible?

+1  A: 

The SAX API provides a rather obscure mechanism for this: the org.xml.sax.Locator interface. When you use the SAX API, you subclass DefaultHandler and pass it to the SAX parse methods, and the SAX parser implementation is supposed to inject a Locator into your DefaultHandler via setDocumentLocator(). As parsing proceeds, the various callback methods on your ContentHandler are invoked (e.g. startElement()), at which point you can consult the Locator to find out the current parsing position (via getLineNumber() and getColumnNumber()).

Technically, this is optional functionality, but the javadoc says that implementations are "strongly encouraged" to provide it, so you can likely assume the SAX parser built into JavaSE will do it.
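To make that concrete, here is a minimal sketch of the handler described above. The class name LocatorSketch and the TagPosition record are made up for illustration; only the SAX calls themselves (setDocumentLocator(), startElement(), getLineNumber(), getColumnNumber()) come from the API. Note that the Locator gives you a line and column rather than a single character offset, and the SAX documentation describes the reported position as the end of the text that triggered the event, so for startElement() you should expect it to point just past the start tag, not at its opening "<". Turning line/column into an absolute offset would need extra bookkeeping of line lengths, which is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorSketch {

    /** Hypothetical holder: element name plus the position the Locator reported. */
    public static final class TagPosition {
        public final String name;
        public final int line;
        public final int column;

        TagPosition(String name, int line, int column) {
            this.name = name;
            this.line = line;
            this.column = column;
        }
    }

    /** Parses the given XML and records the Locator position at each startElement(). */
    public static List<TagPosition> locate(String xml) throws Exception {
        final List<TagPosition> positions = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Stays null if the parser chooses not to supply a Locator.
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Fall back to -1 when no position information is available.
                int line = (locator == null) ? -1 : locator.getLineNumber();
                int column = (locator == null) ? -1 : locator.getColumnNumber();
                positions.add(new TagPosition(qName, line, column));
            }
        };

        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return positions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html>\n<body>\n<div>text</div>\n</body>\n</html>";
        for (TagPosition p : locate(xml)) {
            System.out.println(p.name + " line=" + p.line + " column=" + p.column);
        }
    }
}
```

If you need a DOM at the end, one option is to build the nodes yourself inside the handler and attach the recorded position to each one as you go, but that is an assumption about your setup rather than something the SAX API does for you.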

Of course, this does mean using the SAX API, which is no one's idea of fun, but I can't see a way of accessing this information using a higher-level API.

edit: Found this example.

skaffman