Hello,

A Twitter search URL, e.g. http://search.twitter.com/search.rss?q=android, returns RSS containing an item that looks like:

<item>
      <title>@UberTwiter still waiting for @ubertwitter  android app!!!</title>
      <link>http://twitter.com/meals69/statuses/21158076391</link>
      <description>still waiting for an app!!!</description>
      <pubDate>Sat, 14 Aug 2010 15:33:44 +0000</pubDate>
      <guid>http://twitter.com/meals69/statuses/21158076391</guid>
      <author>Some Twitter User</author>
      <media:content type="image/jpg" height="48" width="48" url="http://a1.twimg.com/profile_images/756343289/me2_normal.jpg"/>
      <google:image_link>http://a1.twimg.com/profile_images/756343289/me2_normal.jpg</google:image_link>
      <twitter:metadata>
        <twitter:result_type>recent</twitter:result_type>
</twitter:metadata>
</item>

Pretty simple. My code parses out everything (title, link, description, pubDate, etc.) without any problems. However, I'm getting null on:

<google:image_link>

I'm using Java to parse the RSS feed. Do I have to handle compound (prefixed) local names differently than I would a simpler local name?

This is the bit of code that parses out Link, Description, pubDate, etc:

@Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        if (this.currentMessage != null){
            if (localName.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(builder.toString());
            } else if (localName.equalsIgnoreCase(LINK)){
                currentMessage.setLink(builder.toString());
            } else if (localName.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(builder.toString());
            } else if (localName.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(builder.toString());
            } else if (localName.equalsIgnoreCase(GUID)){
                currentMessage.setGuid(builder.toString());
            } else if (uri.equalsIgnoreCase(AVATAR)){
                currentMessage.setAvatar(builder.toString());
            } else if (localName.equalsIgnoreCase(ITEM)){
                messages.add(currentMessage);
            } 
            builder.setLength(0);   
        }
    }

startDocument looks like:

@Override
    public void startDocument() throws SAXException {
        super.startDocument();
        messages = new ArrayList<Message>();
        builder = new StringBuilder();

    }

startElement looks like:

@Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        if (localName.equalsIgnoreCase(ITEM)){
            this.currentMessage = new Message();
        } 
    }

Tony

+1  A: 

An element like <google:image_link> has the local name image_link belonging to the google namespace. You need to ensure that the XML parsing framework is aware of namespaces, and you'd then need to find this element using the appropriate namespace.

For example, a few SAX1 interfaces in package org.xml.sax have been deprecated and replaced by SAX2 counterparts that include namespace support (e.g. the SAX1 Parser is deprecated and replaced by the SAX2 XMLReader). Consult the documentation on how to specify the namespace URI or the qualified (prefixed) qName.
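To make that concrete, here is a minimal sketch of a namespace-aware SAX2 handler that picks up the avatar by namespace URI plus local name. The class and method names are made up for illustration, and the namespace URI is an assumption on my part: check the xmlns:google declaration on the root element of the real feed and use whatever it says.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareAvatarDemo {
    // ASSUMPTION: the URI the "google" prefix is bound to in the feed.
    // Verify against the feed's xmlns:google declaration.
    static final String GOOGLE_NS = "http://base.google.com/ns/1.0";

    /** Remembers the text of the first image_link element in GOOGLE_NS. */
    static class AvatarHandler extends DefaultHandler {
        private final StringBuilder builder = new StringBuilder();
        String avatar;

        @Override
        public void characters(char[] ch, int start, int length) {
            builder.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Match on namespace URI + local name, never on the prefix.
            if (avatar == null && GOOGLE_NS.equals(uri) && "image_link".equals(localName)) {
                avatar = builder.toString().trim();
            }
            builder.setLength(0);
        }
    }

    /** Parses the given XML and returns the avatar URL, or null if none was found. */
    static String findAvatar(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories default to false
        SAXParser parser = factory.newSAXParser();
        AvatarHandler handler = new AvatarHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.avatar;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss xmlns:google='" + GOOGLE_NS + "'><channel><item>"
                + "<title>hello</title>"
                + "<google:image_link>http://example.com/a.jpg</google:image_link>"
                + "</item></channel></rss>";
        System.out.println(findAvatar(xml));
    }
}
```

Because the comparison uses the URI, the same handler keeps working if the feed ever binds a different prefix to that namespace.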

polygenelubricants
I'm using the SAX parsing framework (I believe). I'm brand new to Java.
Silvestri
@Silvestri: Can you add some code snippet to show how you're doing this?
polygenelubricants
Just added some code snippets and am reading through the docs now. Still not clear on how I'm going to accomplish this but I would have imagined this to be super simple.
Silvestri
@Silvestri: I think there are many ways to do this that'd be easier, e.g. XPath, or Apache's Digester. Rest assured that I'm still working on this.
polygenelubricants
@Silvestri: check out the XPath approach: http://ideone.com/UqkQU ; check out Apache's Digester also, it'd make the code a lot simpler too. I'll get back to this tomorrow, hopefully others will also give you good answers by then.
polygenelubricants
I thought of using XPath. I've used it in Flash a lot but I'm not sure we will be using XPath with this project. The application I'm building is for Android so I'm concerned it might add extra overhead but I'll look into it anyway. Would prefer to use the SAX parser unless I have no other option. In any case, thank you!
Silvestri
@Silvestri: Here's some quick and dirty code I wrote based on SAX2; qualified names seem to work fine for me. http://ideone.com/caoh7
polygenelubricants
Will be checking this out tonight...
Silvestri
+1  A: 

From the sample it is not actually clear which namespace the 'google' prefix binds to. The previous answer is slightly incorrect in that the element is NOT in a namespace called "google"; rather, it is in whatever namespace the prefix "google" is bound to. As such, you have to match on the namespace (identified by its URI), not on the prefix. SAX does have a confusing way of reporting local name / namespace / prefix combinations, and what it reports depends on whether namespace processing is even enabled.
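If you want to see for yourself what your parser reports, a small sketch like the following dumps the (uri, localName, qName) triple for every start tag, once with namespace processing on and once with it off. The class name, the urn:example:ns URI and the sample document are placeholders for illustration only; results for the "off" case can differ between parser implementations (the JDK parser and Android's, for instance), which is why no expected output is shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNameReportDemo {
    /** Parses xml and returns one "uri|localName|qName" entry per start tag. */
    static List<String> reportNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        final List<String> lines = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                lines.add(uri + "|" + localName + "|" + qName);
            }
        });
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: prefix "g" bound to a placeholder namespace URI.
        String xml = "<rss xmlns:g='urn:example:ns'><g:image_link>x</g:image_link></rss>";
        System.out.println("namespace aware:     " + reportNames(xml, true));
        System.out.println("not namespace aware: " + reportNames(xml, false));
    }
}
```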

You could also consider alternative XML processing libraries / APIs; while SAX implementations are performant, there are alternatives that are just as fast and more convenient. StAX (javax.xml.stream.*) implementations like Woodstox (and even the default one that ships with JDK 1.6) are fast and a bit more convenient. The StaxMate library, which builds on top of StAX, is much simpler to use for both reading and writing, and speed-wise it is as fast as SAX implementations like Xerces. The StAX API also has less baggage with respect to namespace handling, so it is easier to see the actual namespace of an element.
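For comparison, a rough sketch of the same lookup with the plain StAX cursor API (no StaxMate, just what the JDK ships). The class name is made up and the namespace URI is again an assumption to be checked against the feed. Note that, as far as I know, javax.xml.stream is not part of the standard Android SDK, so on Android this would need a bundled StAX implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxAvatarDemo {
    // ASSUMPTION: the URI the "google" prefix is bound to in the feed.
    static final String GOOGLE_NS = "http://base.google.com/ns/1.0";

    /** Returns the text of the first image_link element in GOOGLE_NS, or null. */
    static String findAvatar(String xml) throws Exception {
        // StAX readers are namespace aware by default.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && GOOGLE_NS.equals(reader.getNamespaceURI())
                        && "image_link".equals(reader.getLocalName())) {
                    // Reads the text up to the matching end tag.
                    return reader.getElementText().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss xmlns:google='" + GOOGLE_NS + "'><channel><item>"
                + "<google:image_link>http://example.com/a.jpg</google:image_link>"
                + "</item></channel></rss>";
        System.out.println(findAvatar(xml));
    }
}
```

The namespace URI and local name are always separate accessors here, so there is no question of which parameter holds what.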

StaxMan
+1, thanks for the correction. I give you permission to edit my answer to correct any mistakes, or just extract parts into your own with correction etc.
polygenelubricants
Thanks. I don't seem to have access to edit, but I think it's fine to just change wording to mention indirection?
StaxMan
A: 

As polygenelubricants said, the parser generally needs to be namespace aware to handle elements that belong to a prefixed namespace (like that <google:image_link> element).

This needs to be set as a "parser feature", which AFAIK can be done in a few different ways. The XMLReader interface has a setFeature() method that sets features on a particular parser, and SAXParserFactory has the same method, so that the factory creates parsers with those features already on or off. The standard SAX2 feature flags should be listed on the SAX project's website, and at least some of them are also listed in the Java API documentation for package org.xml.sax.
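A minimal sketch of both routes, using the standard SAX2 feature id http://xml.org/sax/features/namespaces. The class and helper names are invented for the example, and rootLocalName() exists only to show the effect of the flag.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxFeatureDemo {
    // Standard SAX2 feature id that switches namespace processing on or off.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Route 1: set the feature on the factory; parsers it creates inherit it. */
    static XMLReader readerFromConfiguredFactory() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(NAMESPACES, true);
        return factory.newSAXParser().getXMLReader();
    }

    /** Route 2: set the feature on a single XMLReader before parsing. */
    static XMLReader configuredReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);
        return reader;
    }

    /** Helper: returns the local name reported for the root element. */
    static String rootLocalName(XMLReader reader, String xml) throws Exception {
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<g:image_link xmlns:g='urn:example:ns'>x</g:image_link>";
        System.out.println(rootLocalName(readerFromConfiguredFactory(), xml));
        System.out.println(rootLocalName(configuredReader(), xml));
    }
}
```

SAXParserFactory.setNamespaceAware(true) is the JAXP convenience method for the same setting.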

For simple documents you can try a shortcut. If you don't actually care about namespaces (that is, about element names as a namespace URI + local-name combination), and you can trust that the elements you are looking for (and only those) always carry a certain prefix and that no elements from other namespaces share the same local name, then you might solve your problem by comparing against the qName parameter of startElement()/endElement() instead of localName (or vice versa), or by adding/dropping the prefix in the tag-name string you compare to.
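A sketch of that shortcut, assuming the feed always uses the literal prefix "google" and that the parser reports qualified names (the JDK's default parser does; other implementations may not, so treat that as an assumption). Class names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class QNameShortcutDemo {
    /** Matches on the raw prefixed tag name instead of namespace URI + local name. */
    static class AvatarHandler extends DefaultHandler {
        private final StringBuilder builder = new StringBuilder();
        String avatar;

        @Override
        public void characters(char[] ch, int start, int length) {
            builder.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Fragile: breaks as soon as the feed binds a different prefix.
            if (avatar == null && "google:image_link".equals(qName)) {
                avatar = builder.toString().trim();
            }
            builder.setLength(0);
        }
    }

    /** Parses with a default (non-namespace-aware) JAXP parser. */
    static String findAvatar(String xml) throws Exception {
        AvatarHandler handler = new AvatarHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.avatar;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss xmlns:google='urn:example:ns'><item>"
                + "<google:image_link>http://example.com/a.jpg</google:image_link>"
                + "</item></rss>";
        System.out.println(findAvatar(xml));
    }
}
```

The trade-off is exactly the one described above: it is short, but it silently stops matching if the prefix changes, because the prefix is not part of the element's real name.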

According to the SAX API documentation, the contents of the namespaceUri, qname and localName parameters are partly optional; a value that is not available is passed as an empty string rather than null. Which of them are empty depends on the aforementioned "parser features" that affect namespaces. I don't know whether the parameter that is left empty can vary between elements in a namespace and elements without one - I haven't investigated that behaviour.

PS. XML is case sensitive, so ideally you should compare tag names with equals() rather than equalsIgnoreCase().
-First post, yay!

jasso