I'm trying to read an XML file, for example:

<entry>
    <title>FEED TITLE</title>
    <id>5467sdad98787ad3149878sasda</id>
    <tempi type="application/xml">
      <conento xmlns="http://mydomainname.com/xsd/radiofeed.xsd" madeIn="USA" />
    </tempi>
</entry>

Here is the code I have so far. The attempt was not successful, which is why I started a bounty: http://pastebin.com/huKP4KED

Bounty update :

I have really tried to do this for days now and didn't expect it to be so hard. I'll accept useful links, books, or tutorials, but I'd prefer code because I need this done yesterday.

Here is what I need:

Concerning the XML above:

  • I need to get the values of title and id
  • I need the attribute value of tempi, as well as the madeIn attribute value of contento

What is the best way to do this?

EDIT:

@Pascal Thivent

Maybe creating a method would be a good idea, something like public String getValue(String xml, Element elementname): you specify the tag name, and the method returns the tag value, or a tag attribute (its name could be given as an additional method argument) if the value is not available.

What I really want is to get a certain tag's value, or an attribute if the tag has no value. I'm still working out the best way to do that, since I've never done it before.

+2  A: 

Use Element.getAttribute and Element.setAttribute

In your example you call ((Node) content.item(0)).getFirstChild().getAttributes(). Assuming that content is a typo and you mean contento, getFirstChild is correctly returning null, because contento has no children. Try ((Node) contento.item(0)).getAttributes() instead.

Another issue is that by using getFirstChild and getChildNodes().item(0) without checking the node type of the result, you run the risk of picking up child text nodes (such as the whitespace between tags) instead of the element you want.
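To illustrate the text-node problem, here is a minimal sketch of a helper that skips non-element children by checking getNodeType. The class name FirstElementChildDemo and the helpers firstElementChild and parse are made up for this example, and the XML is the tempi fragment from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstElementChildDemo {

    // Returns the first child that is an element, skipping text and comment nodes.
    // Returns null when the parent has no element children.
    static Element firstElementChild(Node parent) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Hypothetical convenience: parse an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tempi type=\"application/xml\">\n"
                + "  <conento xmlns=\"http://mydomainname.com/xsd/radiofeed.xsd\" madeIn=\"USA\" />\n"
                + "</tempi>";
        Element tempi = parse(xml).getDocumentElement();

        // The raw first child is the whitespace between the tags, not the element.
        System.out.println(tempi.getFirstChild().getNodeType() == Node.TEXT_NODE);

        // Skipping to the first element child gives access to its attributes.
        Element child = firstElementChild(tempi);
        System.out.println(child.getAttribute("madeIn"));
    }
}
```

Checking the node type this way avoids relying on a particular child index, which changes as soon as the whitespace in the file changes.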

Paul Butcher
@Paul Butcher thank you for your answer, but `((Node) contento.item(0)).getAttributes()` returns null.
c0mrade
In which case, you are almost certainly picking up the child text node. Try `((Node) contento.item(1)).getAttributes()` instead, and if that is successful, then refactor it to make your intent clearer (probably using getNodeType).
Paul Butcher
+2  A: 

As pointed out, <contento> doesn't have any child so instead of:

(contento.item(0)).getFirstChild().getAttributes()

You should treat the Node as Element and use getAttribute(String), something like this:

((Element)contento.item(0)).getAttribute("madeIn")

Here is a modified version of your code (it's not the most robust code I've written):

InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream);
doc.getDocumentElement().normalize();
System.out.println("Root element " + doc.getDocumentElement().getNodeName());
NodeList nodeLst = doc.getElementsByTagName("entry");
System.out.println("Information of all entries");

for (int s = 0; s < nodeLst.getLength(); s++) {

    Node fstNode = nodeLst.item(s);

    if (fstNode.getNodeType() == Node.ELEMENT_NODE) {

        Element fstElmnt = (Element) fstNode;

        NodeList title = fstElmnt.getElementsByTagName("title").item(0).getChildNodes();
        System.out.println("Title : " + (title.item(0)).getNodeValue());

        NodeList id = fstElmnt.getElementsByTagName("id").item(0).getChildNodes();
        System.out.println("Id: " + (id.item(0)).getNodeValue());

        Node tempiNode = fstElmnt.getElementsByTagName("tempi").item(0);
        System.out.println("Type : " + ((Element) tempiNode).getAttribute("type"));

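        // The first child of <tempi> may be a whitespace text node, so look the element up by name.
        Node contento = ((Element) tempiNode).getElementsByTagName("conento").item(0);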
        System.out.println("Made in : " + ((Element) contento).getAttribute("madeIn"));
    }
}

Running it on your XML snippet produces the following output:

Root element entry
Information of all entries
Title : FEED TITLE
Id: 5467sdad98787ad3149878sasda
Type : application/xml
Made in : USA
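As a side note, getChildNodes().item(0).getNodeValue() throws a NullPointerException when an element is empty or missing. A hedged alternative is Node.getTextContent(), which returns an empty string for an element without children. The sketch below wraps it in a small helper; the names TextContentDemo, textOf and parseRoot are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextContentDemo {

    // Returns the text of the first descendant element with the given tag name,
    // or "" when no such element exists.
    static String textOf(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent();
    }

    // Hypothetical convenience: parse an XML string and return its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element entry = parseRoot("<entry><title>FEED TITLE</title><id/></entry>");
        System.out.println("Title : " + textOf(entry, "title"));     // element with text
        System.out.println("Id: " + textOf(entry, "id"));            // empty element
        System.out.println("Summary: " + textOf(entry, "summary"));  // missing element
    }
}
```

Returning an empty string for a missing element is a design choice of this sketch; returning null or an Optional would work just as well if the caller needs to tell "empty" from "absent".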

By the way, did you consider using something like Rome instead?

Pascal Thivent
@Pascal Thivent I did look at Rome, but in my case I get the XML as a String from another method. Would it be better to do this with Rome? Is it possible to create something more generic than this? I mean, is this the way it's generally done? +1
c0mrade
Why would you do `getAttributes().item(0)`? This will return the wrong attribute if the order of the attributes on the XML element ever changes. `node.getAttribute("madeIn")` seems to work fine...
seanmonstar
@seanmonstar it does work fine and yes that is what I meant by saying "more generic"
c0mrade
@c0mrade: The `String` is not a problem, Rome can take an `InputStream`. I wanted at least to mention it because I consider Rome as the de facto standard to parse/aggregate/generate feeds and it provides a nice and clean API. Depending on what you want to do exactly, Rome might be indeed the standard choice.
Pascal Thivent
Also, remember that the order of attributes isn't considered significant in XML - a processor could validly return the list of attributes in any order.
Adrian Mouat
@seanmonstar @Adrian You are both right. But the requirements are not crystal clear and my first intention was to point out the problem in the actual code. But I'll fix that to avoid more confusion.
Pascal Thivent
@Pascal Thivent `the requirements are not crystal clear` Take a look at my edit pls
c0mrade
@Pascal Thivent I've tried Rome, it's actually quite nice. I managed to loop through the entries and get the title and id. How do I get custom-named tags?
c0mrade
@c0mrade Are you parsing a real rss or atom feed?
Pascal Thivent
@Pascal Thivent I'm parsing atom feed
c0mrade
+3  A: 

The best solution for this is to use XPath. Your pastebin is expired, but here's what I gathered. Let's say we have the following feed.xml file:

<?xml version="1.0" encoding="UTF-8" ?>
<entries>
<entry>
    <title>FEED TITLE 1</title>
    <id>id1</id>
    <tempi type="type1">
      <conento xmlns="dontcare?" madeIn="MadeIn1" />
    </tempi>
</entry>
<entry>
    <title>FEED TITLE 2</title>
    <id>id2</id>
    <tempi type="type2">
      <conento xmlns="dontcare?" madeIn="MadeIn2" />
    </tempi>
</entry>
<entry>
    <id>id3</id>
</entry>
</entries>

Here's a short but compile-and-runnable proof-of-concept (with feed.xml file in the same directory).

import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.*;
import java.util.*;

public class XPathTest {
    static class Entry {
        final String title, id, origin, type;
        Entry(String title, String id, String origin, String type) {
            this.title = title;
            this.id = id;
            this.origin = origin;
            this.type = type;
        }
        @Override public String toString() {
            return String.format("%s:%s(%s)[%s]", id, title, origin, type);
        }
    }

    final static XPath xpath = XPathFactory.newInstance().newXPath();
    static String evalString(Node context, String path) throws XPathExpressionException {
        return (String) xpath.evaluate(path, context, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        File file = new File("feed.xml");
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        NodeList entriesNodeList = (NodeList) xpath.evaluate("//entry", document, XPathConstants.NODESET);

        List<Entry> entries = new ArrayList<Entry>();
        for (int i = 0; i < entriesNodeList.getLength(); i++) {
            Node entryNode = entriesNodeList.item(i);
            entries.add(new Entry(
                evalString(entryNode, "title"),
                evalString(entryNode, "id"),
                evalString(entryNode, "tempi/conento/@madeIn"),
                evalString(entryNode, "tempi/@type")
            ));
        }
        for (Entry entry : entries) {
            System.out.println(entry);
        }
    }
}

This produces the following output:

id1:FEED TITLE 1(MadeIn1)[type1]
id2:FEED TITLE 2(MadeIn2)[type2]
id3:()[]

Note how using XPath makes the value retrieval very simple, intuitive, readable, and straightforward, and "missing" values are also gracefully handled.
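Since the question mentions that the XML arrives as a String rather than as a file, here is a rough sketch of the same idea applied to a string, shaped like the getValue method proposed in the question. The class name XPathFromString and the helpers parse and getValue are assumptions made for this example. The factory is left non-namespace-aware (the default), as in the code above, so the unprefixed path still matches conento despite its xmlns attribute; a namespace-aware parser would need a NamespaceContext instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathFromString {

    // Hypothetical convenience: parse an XML string with the default
    // (non-namespace-aware) factory, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates an XPath expression as a string. A path that matches nothing
    // yields "" rather than null or an exception.
    static String getValue(Document doc, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(path, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry>"
                + "<title>FEED TITLE</title>"
                + "<id>5467sdad98787ad3149878sasda</id>"
                + "<tempi type=\"application/xml\">"
                + "<conento xmlns=\"http://mydomainname.com/xsd/radiofeed.xsd\" madeIn=\"USA\" />"
                + "</tempi>"
                + "</entry>";
        Document doc = parse(xml);

        System.out.println(getValue(doc, "/entry/title"));                  // element text
        System.out.println(getValue(doc, "/entry/tempi/@type"));            // attribute
        System.out.println(getValue(doc, "/entry/tempi/conento/@madeIn"));  // nested attribute
        System.out.println(getValue(doc, "/entry/missing").isEmpty());      // no match gives ""
    }
}
```

The same path syntax covers both element text and attributes, so a single helper handles both cases from the question.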

polygenelubricants
@polygenelubricants I saw that approach here http://java.sun.com/developer/technicalArticles/xml/mapping/ but wasn't really sure how to implement it. So you suggest that I create an Entry class with the fields title, id, tempi? I'm not following how I would use that instead of a `Map<String,String>`, and how would I get tag attributes with this approach?
c0mrade
@c0mrade: The issue is how to _represent_ the data, not how to _get_ them. You can still get the values using XPath as I've shown above, but instead of storing them into a `Map<String,String>`, you'd want to use a `class Entry` that has `getId()`, `getTitle()`, `getCountryOfOrigin()`, etc. This is your own `class Entry`, so you can define what the fields should be called and what they mean.
polygenelubricants
@c0mrade: I've modified the sample to use `class Entry` instead of `Map<String,String>`. The main point is the use of XPath, though.
polygenelubricants
@polygenelubricants excellent, thank you, this is what I've been after. I've created a separate Entry class with a printFeed() method inside it, and just pasted in the return value of your toString method. This is great, thank you. Should I wait longer or accept and award the bounty now?
c0mrade
@c0mrade: when to accept is entirely up to you. It's your right to accept now, later, or (gulp!) never.
polygenelubricants
@polygenelubricants I thought of leaving it open so you could collect more upvotes, because this really looks simple and answers the question 100% accurately, but I accepted so I won't forget later. Thanks.
c0mrade