Hi, I have a question about SAX. I want to process the child tags in an XML file only if they sit under a matching parent tag. For example:

<version>
    <parent tag-1>
        <tag 1>
        <tag 2>
    </parent tag-1>
    <parent tag-2>
        <tag 1>
        <tag 2>
    </parent tag-2>
</version>

In the example above, I want to match the parent tag first (i.e. parent tag-1 or parent tag-2, based on user input) and only then process the child tags under it. Can this be done with a SAX parser, keeping in mind that SAX offers less control than DOM and that I am a novice in both SAX and Java? If so, could you please point me to the corresponding method? TIA

+1  A: 

Sure, it can be done easily by remembering the parent tag.

In general, when parsing XML tags, people use a stack to keep track of the ancestry of those tags. Your case can be handled with the following code:

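import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

// These members go inside your DefaultHandler subclass.
// Tag and ParentTag are placeholder classes that you define yourself.
Stack<Tag> tagStack = new Stack<Tag>();

public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
     // localName is only populated by a namespace-aware parser; otherwise compare qName
     if(localName.toLowerCase().equals("parent")){
          tagStack.push(new ParentTag());
     }else if(localName.toLowerCase().equals("tag")){
          // check isEmpty() first: peek() throws EmptyStackException on an empty stack
          if(!tagStack.isEmpty() && tagStack.peek() instanceof ParentTag){
               //do your things here only when the parent tag is "parent"
          }
     }
}
public void endElement(String uri, String localName, String qName)
        throws SAXException{
     if(localName.toLowerCase().equals("parent")){
          tagStack.pop();
     }
}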

Or you can simply remember which tag you are in by updating tagName:

String tagName = null;
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
     if(localName.toLowerCase().equals("parent")){
          tagName = "parent";
     }else if(localName.toLowerCase().equals("tag")){
          if(tagName!= null && tagName.equals("parent")){
               //do your things here only when the parent tag is "parent"
          }
     }
}
public void endElement(String uri, String localName, String qName)
        throws SAXException{
     // reset only when the parent itself closes; resetting on every end tag
     // would skip every child after the first one
     if(localName.toLowerCase().equals("parent")){
          tagName = null;
     }
}

But I prefer the stack approach, because it keeps track of all your ancestor tags.
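
For the "based on user input" part of the question, here is a minimal self-contained sketch of the stack idea. It assumes a well-formed variant of the sample document with hypothetical element names parent1 and parent2, and an "id" attribute on each tag element; none of these names come from the original question. It compares qName because the default SAXParserFactory is not namespace-aware, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParentScopedHandler extends DefaultHandler {
    private final String wantedParent;                      // chosen by the user
    private final Deque<String> open = new ArrayDeque<>();  // names of currently open elements
    final List<String> ids = new ArrayList<>();             // "id" attributes collected so far

    ParentScopedHandler(String wantedParent) {
        this.wantedParent = wantedParent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // peek() returns null on an empty deque, so the root element is handled safely
        if ("tag".equals(qName) && wantedParent.equals(open.peek())) {
            ids.add(atts.getValue("id"));
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // Hypothetical helper: parse the XML text and return the ids found under the given parent.
    static List<String> idsUnder(String parent, String xml) {
        ParentScopedHandler handler = new ParentScopedHandler(parent);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler.ids;
    }
}
```

Calling ParentScopedHandler.idsUnder("parent2", xml) should collect only the tag elements that are direct children of parent2. The parser still reads the whole document; the handler just ignores everything else.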

Winston Chen
@chen: This would still require traversing the entire XML file, wouldn't it? Can I search for and match the parent tag, and process the child tags only if a match happens?
fixxxer
Well, once you kick off SAX, it is going to scan the whole document anyway. However, if you put your code into either of the examples I provided, it will run whenever the parser reaches a matching tag. In other words, by the time SAX finishes, whatever you want your code to do has already been done.
Winston Chen
Okay. So SAX doesn't provide a way to jump to tags, does it?
fixxxer
Not that I am aware of. SAX walks through the document and triggers startElement, endElement, and other predefined methods along the way. Here is a very good tutorial: http://developerlife.com/tutorials/?p=29#50616
Winston Chen
Does any other parser provide a way to jump to tags?
fixxxer
Thanks for the help!
fixxxer
Maybe DOM does: it parses the whole document into an in-memory data structure and performs operations on that. No matter what tool you use, parsing the entire document is a must. I am not aware of any tool that operates on a document without parsing it. Hope this helps.
Winston Chen
You may be looking for XPath.
reinierpost
A: 

The SAX parser will call a method in your implementation every time it hits a tag. If you want different behavior depending on the parent, you have to save the parent in a variable.

Tim Büthe
+1  A: 

SAX is going to spool through the entire document anyway, if you're looking at doing this for performance reasons.

However, from a code niceness perspective, you could have the SAX parser not return the non-matching children by wiring it up with an XMLFilter. You'd probably still have to write the logic yourself, something like that provided in Wing C. Chen's post, but instead of putting it in your application logic you could abstract it out into a filter implementation.

This would let you reuse the filtering logic more easily, and it would probably make your application code cleaner and easier to follow.
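
As a rough sketch of what such a filter could look like, the class below extends org.xml.sax.helpers.XMLFilterImpl and forwards only the events that occur inside the chosen parent element. The class name ParentFilter, the helper idsUnder, the element names and the "id" attribute are all assumptions made for illustration, and the default non-namespace-aware parser is assumed, hence the use of qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class ParentFilter extends XMLFilterImpl {
    private final String wantedParent;
    private int depthInside = 0; // 0 = outside the wanted parent, 1 = directly inside it

    ParentFilter(XMLReader source, String wantedParent) {
        super(source);
        this.wantedParent = wantedParent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depthInside > 0) {
            depthInside++;
            super.startElement(uri, localName, qName, atts); // forward descendants only
        } else if (wantedParent.equals(qName)) {
            depthInside = 1; // entered the wanted parent; the parent itself is not forwarded
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depthInside > 1) {
            super.endElement(uri, localName, qName);
        }
        if (depthInside > 0) {
            depthInside--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depthInside > 0) {
            super.characters(ch, start, length);
        }
    }

    // Hypothetical helper: the application handler no longer needs any parent-matching logic.
    static List<String> idsUnder(String parent, String xml) {
        List<String> ids = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ParentFilter filter = new ParentFilter(reader, parent);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    ids.add(atts.getValue("id"));
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return ids;
    }
}
```

The anonymous handler at the bottom stands in for the application code: it sees only the elements under the selected parent, so the matching logic lives in one reusable place.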

Andrzej Doyle
A: 

If you want to jump to particular tags then you would need to use a DOM parser. This will read the entire document into memory and then provide various ways of accessing particular nodes of the tree, such as requesting a tag by name then asking for the children of that tag.

So if you are not restricted to SAX then I would recommend DOM. I think the main reason for using SAX over DOM is that DOM requires more memory since the entire document is loaded at once.
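
To illustrate "requesting a tag by name then asking for the children of that tag", here is a small sketch using the DOM API that ships with the JDK. The element names (parent1, parent2, tag), the "id" attribute, and the class and method names are assumptions for the example, not part of the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomLookup {
    // Hypothetical helper: load the whole document, then go straight to the wanted parent.
    static List<String> idsUnder(String parent, String xml) {
        List<String> ids = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList parents = doc.getElementsByTagName(parent);   // request the tag by name
            for (int i = 0; i < parents.getLength(); i++) {
                NodeList children = parents.item(i).getChildNodes(); // then ask for its children
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // child nodes also include text and comments, so keep only <tag> elements
                    if (child instanceof Element && "tag".equals(child.getNodeName())) {
                        ids.add(((Element) child).getAttribute("id"));
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return ids;
    }
}
```

The whole tree is built in memory before the lookup, which is the trade-off described above.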

DaveJohnston
It is actually a 200-line XML file. Since any parser will have to go through it at least once, and I just need to match one child tag and obtain its attribute, I think I'll go ahead with SAX. Thanks for dropping in!
fixxxer
+1  A: 

The solution proposed by @Wing C. Chen is more than decent, but in your case, I wouldn't use a stack.

A use case for a stack when parsing XML

A common use case for a stack with XML is, for example, verifying that XML tags are balanced when using your own lexer (i.e. a hand-made XML parser with error tolerance).

A concrete example of it would be building the outline of an XML document for the Eclipse IDE.

When to use SAX, pull parsers and the like

  • Memory efficiency when parsing a huge XML file

  • You don't need to navigate back and forth in the document.

However, using SAX to parse complex documents can become tedious, especially if you want to apply operations to nodes based on some conditions.

When to use DOM-like APIs

  • You want easy access to the nodes

  • You want to navigate back and forth in the document at any time

  • Speed matters less than development time, readability and maintenance

My recommendation

If you don't have a huge XML file, use a DOM-like API and select the nodes with XPath. I prefer Dom4J personally, but I don't mind other APIs such as JDom or even Xpp3, which has XPath support.
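
To give an idea of what selecting nodes with XPath looks like, here is a sketch that uses the JDK's built-in javax.xml.xpath package rather than Dom4J, purely so that it runs without extra libraries. The document layout (/version/parent1/tag with an "id" attribute) and the class and method names are assumptions for illustration. The parent name is concatenated into the expression, so it is assumed to be a trusted, valid element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathLookup {
    // Hypothetical helper: select the id attribute of every <tag> directly under the given parent.
    static List<String> idsUnder(String parent, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Assumption: 'parent' is a trusted element name, not arbitrary user text.
        String expression = "/version/" + parent + "/tag/@id";
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                ids.add(hits.item(i).getNodeValue()); // each hit is an attribute node
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath expression: " + expression, e);
        }
    }
}
```

The parent-matching condition becomes part of the path expression, so no handler state is needed. The document is still parsed in full behind the scenes.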

John Doe
That was helpful, but I guess I don't need to save anything, so I intend to go ahead with the SAX parser. Thank you for dropping in.
fixxxer