ansaurus

Question

How to get only the elements that have values, using StAX

Answer 1

+1 A:

Try this:

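while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();

    if (event == XMLStreamConstants.START_ELEMENT) {
        // Read the name first: getElementText() moves the reader to the END_ELEMENT.
        String name = xmlStreamReader.getLocalName();
        try {
            String text = xmlStreamReader.getElementText();
            System.out.println("Element Local Name:" + name);
            System.out.println("Text:" + text);
        } catch (XMLStreamException e) {
            // getElementText() throws when the element contains child elements.
            // The reader is then left on the child's START_ELEMENT, so the next()
            // call above moves past it and that child is never reported.
        }
    }

}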
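
The loop above depends on getElementText(), which only works for elements that contain nothing but text. A minimal sketch of a StAX-only alternative follows. It assumes that "elements with values" means elements that directly contain non-whitespace text. The class name StaxTextElements, the method elementsWithText and the "name=text" output format are made up for illustration. It keeps a stack of open element names and reports text as it arrives, so nothing is consumed behind the loop's back:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original answer.
class StaxTextElements {

    // Returns one "name=text" entry for each run of non-whitespace text,
    // attributed to the element that directly contains it.
    static List<String> elementsWithText(String xml) {
        List<String> found = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // names of currently open elements
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        open.push(reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        open.pop();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Skip indentation and other whitespace-only text.
                        if (!reader.isWhiteSpace()) {
                            found.add(open.peek() + "=" + reader.getText().trim());
                        }
                        break;
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<items><item><name>first</name>"
                + "<StartTime>10:00</StartTime><note/></item></items>";
        for (String line : elementsWithText(xml)) {
            System.out.println(line);
        }
    }
}
```

Container elements such as items and item are not reported because they hold only child elements, and empty elements such as note produce no text event at all.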

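SAX-based solution (works):

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Test extends DefaultHandler {

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("src/file.xml"), new Test());
    }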

    private String currentName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        currentName = qName;
    }

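    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Note: a parser may deliver one text node in several chunks,
        // in which case the name and text are printed once per chunk.
        String string = new String(ch, start, length);
        if (hasText(string)) {
            System.out.println(currentName);
            System.out.println(string);
        }
    }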

    private boolean hasText(String string) {
        string = string.trim();
        return string.length() > 0;
    }
}

Georgy Bolyuba 2010-07-20 20:09:53

@Georgy Bolyuba I think I already tried xmlStreamReader.getElementText(), but I didn't store the result in a variable. Is it possible that this caused a problem?

c0mrade 2010-07-20 20:13:46

Actually, this solution does not work 100% (just checked). It skips <StartTime>. Implementation "swallows" second START_ELEMENT, I think. The good news is that you can improve it. Check out current impl: http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html#getElementText%28%29 and make a better one :)

Georgy Bolyuba 2010-07-20 20:16:37

@Georgy Bolyuba yes I realized that just now, but still I leave you +1, hehehehe you're funny "make a better one" :D

c0mrade 2010-07-20 20:24:24

Yeah, here is a hint: use SAX

Georgy Bolyuba 2010-07-20 20:31:54

@Georgy Bolyuba I'll accept your solution. I did it using StAX as well; it's nice to learn new things.

c0mrade 2010-07-20 21:14:16

You should post your stax solution here

Georgy Bolyuba 2010-07-20 21:43:59

@Georgy Bolyuba alright

c0mrade 2010-07-20 21:53:05

@Georgy Bolyuba I just want to thank you again; this works really well with an OutputStream, better than the junk I used to have. Is it possible to customize your code to print something before or after the top XML element (the one that comes after the root element and repeats throughout the document)?

c0mrade 2010-07-22 02:02:47

You would have to implement endElement and add the logic you want. It is pretty easy to do

Georgy Bolyuba 2010-07-22 02:20:02

@Georgy Bolyuba I was reading the SAX documentation and saw that startElement and endElement are called for every XML element. Is it possible to detect when the main element (`<item>` in the question) starts and write something before or after it (not before or after every element)?

c0mrade 2010-07-22 08:21:11

There is no specific method for that. You will have to put some logic into startElement to check if this is your "main" element yourself (like, compare the name). But at this point I would switch to DOM model. If you are planning to add more logic to your code, DOM would be a better option for you.

Georgy Bolyuba 2010-07-22 09:28:21
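
The name comparison suggested in the last comment could look roughly like this. It is only a sketch: the class name MainElementHandler, the run helper and the marker strings are invented for illustration, and it assumes each element holds either text or child elements, not both. The handler buffers text in characters(), emits it when the next element starts or the current one ends, and adds a marker line around every element whose name matches the "main" name:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of the original answer.
class MainElementHandler extends DefaultHandler {

    private final String mainName;                  // e.g. "item"
    final List<String> output = new ArrayList<>();  // collected lines, in document order
    private final StringBuilder text = new StringBuilder();
    private String currentName;

    MainElementHandler(String mainName) {
        this.mainName = mainName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        flushText();
        if (qName.equals(mainName)) {
            output.add("--- begin " + qName + " ---"); // runs before the main element's content
        }
        currentName = qName;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so buffer it instead of printing directly.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        if (qName.equals(mainName)) {
            output.add("--- end " + qName + " ---");   // runs after the main element's content
        }
    }

    // Emits the buffered text for the current element if it is not blank.
    private void flushText() {
        String value = text.toString().trim();
        text.setLength(0);
        if (!value.isEmpty()) {
            output.add(currentName + "=" + value);
        }
    }

    // Parses the given XML string and returns the collected lines.
    static List<String> run(String xml, String mainName) {
        try {
            MainElementHandler handler = new MainElementHandler(mainName);
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.output;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<items><item><name>a</name></item><item><name>b</name></item></items>";
        for (String line : run(xml, "item")) {
            System.out.println(line);
        }
    }
}
```

Replacing output.add(...) with writes to an OutputStream would give the "print something before or after the main element" behaviour asked about above.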

Answer 2

A:

StAX solution:

Parse the document:

public void parseXML(InputStream xml) {
        try {

            DOMResult result = new DOMResult();
            XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
            XMLEventReader reader = xmlInputFactory.createXMLEventReader(new StreamSource(xml));
            TransformerFactory transFactory = TransformerFactory.newInstance();
            Transformer transformer = transFactory.newTransformer();
            transformer.transform(new StAXSource(reader), result);
            Document document = (Document) result.getNode();

            NodeList startlist = document.getChildNodes();

            processNodeList(startlist);

        } catch (Exception e) {
            System.err.println("Something went wrong, this might help :\n" + e.getMessage());
        }
    }

The document's top-level nodes are now in a NodeList, so walk it recursively:

private void processNodeList(NodeList nodelist) {
        for (int i = 0; i < nodelist.getLength(); i++) {
            if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE && (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist.item(i)))) {
                getNodeNamesAndValues(nodelist.item(i));
            }
            processNodeList(nodelist.item(i).getChildNodes());
        }
    }

Then, for each element node with valid text, get the name and value:

public void getNodeNamesAndValues(Node n) {
        // Only elements with non-blank text are printed; elements that
        // have attributes but no text are skipped here.
        if (hasValidText(n)) {
            String nodeValue = StringUtils.strip(n.getTextContent());
            String nodeName = n.getLocalName();

            System.out.println(nodeName + " " + nodeValue);
        }
    }

A few helper methods for checking nodes:

private static boolean hasValidAttributes(Node node) {
        return (node.getAttributes().getLength() > 0);

    }

private boolean hasValidText(Node node) {
        String textValue = node.getTextContent();

        // isEmpty() replaces the original reference comparison (textValue != "").
        return (textValue != null && !textValue.isEmpty() && !isWhiteSpace(textValue) && !StringUtils.isWhitespace(textValue) && node.hasChildNodes());
    }

private boolean isWhiteSpace(String nodeText) {
        // Only the first character is checked: true if the text starts with whitespace.
        return nodeText.startsWith("\r") || nodeText.startsWith("\t") || nodeText.startsWith("\n") || nodeText.startsWith(" ");
    }

I also used StringUtils from Apache Commons Lang. If you're using Maven, you can get it by adding this to your pom.xml:

<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5</version>
</dependency>

This is inefficient if you're reading huge files, since the whole document is loaded into memory as a DOM tree, but less so if you split them first. This is what I came up with (with Google's help). There are better solutions; this one is mine, and I'm an amateur (for now).

c0mrade 2010-07-20 22:04:01

What is the point of using Stax if you process DOM model? :)

Georgy Bolyuba 2010-07-21 07:14:21
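
To illustrate the point of the comment above: if the document ends up in a DOM tree anyway, the tree can be built directly with DocumentBuilder, without the StAX reader and the Transformer. The following is a minimal sketch under that assumption. The class name DomTextElements and the "name=text" output format are invented for illustration. It looks only at each element's own text nodes rather than at getTextContent(), so container elements are not reported with their descendants' text:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of the original answer.
class DomTextElements {

    // Returns "name=text" for every element that directly contains non-blank text.
    static List<String> elementsWithText(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> found = new ArrayList<>();
            collect(document.getDocumentElement(), found);
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Visits the element first, then its child elements in document order.
    private static void collect(Node element, List<String> found) {
        StringBuilder ownText = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                ownText.append(child.getNodeValue());
            }
        }
        String value = ownText.toString().trim();
        if (!value.isEmpty()) {
            found.add(element.getNodeName() + "=" + value);
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect(child, found);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<root><item id=\"1\"><name> a </name><empty/></item></root>";
        for (String line : elementsWithText(xml)) {
            System.out.println(line);
        }
    }
}
```

Like the answer above, this still loads the whole document into memory, so for very large files a streaming approach (SAX or plain StAX) remains the better fit.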

@Georgy Bolyuba I wouldn't know. As I said, I'm not a pro; I googled and found something that works.

c0mrade 2010-07-21 08:01:16

@Georgy Bolyuba there should be an option to label a post as "don't do it like this"...

c0mrade 2010-07-21 11:09:52
