ansaurus

Question

How can I parse a namespace using the SAX parser?

Answer 1

+1 A:

An element like <google:image_link> has the local name image_link belonging to the google namespace. You need to ensure that the XML parsing framework is aware of namespaces, and you'd then need to find this element using the appropriate namespace.

For example, a few SAX1 interfaces in package org.xml.sax have been deprecated and replaced by SAX2 counterparts that include namespace support (e.g. the SAX1 Parser is deprecated and replaced by the SAX2 XMLReader). Consult the documentation on how to specify the namespace URI or the qualified (prefixed) qName.

I thought of using XPath. I've used it in Flash a lot but I'm not sure we will be using XPath with this project. The application I'm building is for Android so I'm concerned it might add extra overhead but I'll look into it anyway. Would prefer to use the SAX parser unless I have no other option. In any case, thank you!

Silvestri 2010-08-14 17:32:56

@Silvestri: Here's some quick-and-dirty code I wrote based on SAX2; qualified names seem to work fine for me. http://ideone.com/caoh7

polygenelubricants 2010-08-14 18:25:00

Will be checking this out tonight...

Silvestri 2010-08-14 21:15:43

Answer 2

+1 A:

From the sample it is not actually clear which namespace the 'google' prefix binds to. The previous answer is slightly incorrect in that the element is NOT in a "google" namespace; rather, it is in whatever namespace the prefix "google" is bound to. As such, you have to match the namespace (identified by its URI), not the prefix. SAX does have a confusing way of reporting local name / namespace-prefix combinations, and what it reports depends on whether namespace processing is even enabled.
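A minimal sketch of matching by namespace URI rather than by prefix, using the JAXP SAXParserFactory. The URI http://example.com/ns/google and the class name ImageLinkExtractor are placeholders made up for illustration; the real URI is whatever the xmlns declaration in your feed says. The sample document deliberately binds the URI to the prefix "g" instead of "google" to show that the prefix itself does not matter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ImageLinkExtractor {
    // Placeholder URI (an assumption): replace with the URI your feed declares.
    static final String GOOGLE_NS = "http://example.com/ns/google";

    static List<String> extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
        SAXParser parser = factory.newSAXParser();

        final List<String> links = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Match on namespace URI + local name; the prefix is ignored.
                if (GOOGLE_NS.equals(uri) && "image_link".equals(localName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && GOOGLE_NS.equals(uri) && "image_link".equals(localName)) {
                    links.add(text.toString().trim());
                    text = null;
                }
            }
        });
        return links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item xmlns:g='http://example.com/ns/google'>"
                   + "<image_link>not in the namespace, skipped</image_link>"
                   + "<g:image_link>http://example.com/a.png</g:image_link>"
                   + "</item>";
        System.out.println(extract(xml));
    }
}
```

The unprefixed image_link element is skipped because it has no namespace URI, even though its local name is the same.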

You could also consider alternative XML processing libraries / APIs; while SAX implementations are performant, there are alternatives that are just as fast and more convenient. Stax (javax.xml.stream.*) implementations like Woodstox (and even the default one that JDK 1.6 comes with) are fast and a bit more convenient. The StaxMate library, which builds on top of Stax, is much simpler to use for both reading and writing, and speed-wise it is as fast as SAX implementations like Xerces. Plus, the Stax API has less baggage with respect to namespace handling, so it is easier to see what the actual namespace of an element is.
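For comparison, a rough sketch of the same lookup with the plain Stax cursor API (XMLStreamReader) from the JDK. The URI and the class name StaxImageLinks are again placeholders, not anything from the original feed. As far as I know, javax.xml.stream is not bundled with Android, so treat this as a desktop/server Java illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxImageLinks {
    // Placeholder URI (an assumption): replace with the URI your feed declares.
    static final String GOOGLE_NS = "http://example.com/ns/google";

    static List<String> extract(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> links = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // The reader exposes namespace URI and local name directly;
                // there is no separate qName/localName split to worry about.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && GOOGLE_NS.equals(reader.getNamespaceURI())
                        && "image_link".equals(reader.getLocalName())) {
                    // Reads the text content and leaves the cursor on the end tag.
                    links.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item xmlns:google='http://example.com/ns/google'>"
                   + "<google:image_link>http://example.com/a.png</google:image_link>"
                   + "</item>";
        System.out.println(extract(xml));
    }
}
```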

StaxMan 2010-08-15 03:56:09

+1, thanks for the correction. I give you permission to edit my answer to correct any mistakes, or just extract parts into your own with correction etc.

polygenelubricants 2010-08-15 09:32:13

Thanks. I don't seem to have access to edit, but I think it's fine to just change wording to mention indirection?

StaxMan 2010-08-16 19:13:20

Answer 3

A:

As user polygenelubricants said, the parser generally needs to be namespace-aware to handle elements that belong to a prefixed namespace (like that <google:image_link> element).

This needs to be set as a "parser feature", which AFAIK can be done in a few different ways. The XMLReader interface itself has a setFeature() method that can be used to set features on a particular parser, but you can also call the method of the same name on the SAXParserFactory class, so that the factory generates parsers with those features already switched on or off. The standard SAX2 feature flags should be listed on the SAX project's website, and at least some of them are also listed in the Java API documentation of package org.xml.sax.
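The two routes described above might look roughly like this. The two feature names are the standard SAX2 ones; the class and method names (NamespaceFeatures, viaFactory, viaReader) are made up for the example. SAXParserFactory also has a setNamespaceAware(true) convenience method that achieves the same effect as switching on the namespaces feature.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class NamespaceFeatures {
    // Standard SAX2 feature names.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    // Route 1: configure the factory, so every parser it creates has the feature on.
    static XMLReader viaFactory() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(NAMESPACES, true);
        return factory.newSAXParser().getXMLReader();
    }

    // Route 2: configure one XMLReader instance directly, before parsing starts.
    static XMLReader viaReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);
        // Optional: also ask for prefixed names (qName) and xmlns attributes to be reported.
        reader.setFeature(NAMESPACE_PREFIXES, true);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("factory route: " + viaFactory().getFeature(NAMESPACES));
        System.out.println("reader route:  " + viaReader().getFeature(NAMESPACES));
    }
}
```

Both setFeature() methods can throw SAXNotRecognizedException or SAXNotSupportedException if the underlying parser does not know or cannot change a feature, so real code may want to handle those explicitly.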

For simple documents you can try to take a shortcut. If you don't actually care about namespaces (that is, about identifying elements by their namespace URI + local name combination), and you can trust that the elements you are looking for, and only those, always carry a certain prefix, and that no elements from other namespaces share the same local name, then you might solve your problem simply by comparing against the qName parameter of startElement() instead of localName (or vice versa), or by adding or dropping the prefix in the tag name string you compare against.

According to the Java API documentation, the namespaceURI, localName and qName parameters are not all guaranteed to be populated. AFAIK the ones that are unavailable are passed as empty strings rather than null (that is what the ContentHandler documentation specifies). Which of them are empty depends on the namespace-related "parser features" mentioned above. I don't know whether this can vary between elements that are in a namespace and elements that are not; I haven't investigated that behaviour.
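A small experiment for seeing what a given parser actually passes to startElement() with namespace processing off and on. This is only a sketch: the class name NameReport and the sample URI are invented for illustration, and the values you see may differ between parser implementations (the Android parser may not behave like the one bundled with the desktop JDK).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NameReport {
    // Returns {uri, localName, qName} exactly as reported for the first element.
    static String[] rootNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        final String[] names = new String[3];
        final boolean[] seen = {false};
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!seen[0]) { // record the first element only
                    seen[0] = true;
                    names[0] = uri;
                    names[1] = localName;
                    names[2] = qName;
                }
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<google:image_link xmlns:google='http://example.com/ns/google'/>";
        for (boolean aware : new boolean[] {false, true}) {
            String[] n = rootNames(xml, aware);
            System.out.println("namespaceAware=" + aware
                    + " uri='" + n[0] + "' localName='" + n[1] + "' qName='" + n[2] + "'");
        }
    }
}
```

With the parser bundled in the desktop JDK, the non-namespace-aware run reports an empty uri and an empty localName and puts the prefixed name in qName, while the namespace-aware run fills in uri and localName.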

PS. XML is case-sensitive, so ideally you shouldn't need to ignore case when comparing tag name strings.
- First post, yay!

jasso 2010-08-15 09:41:27
