<xml>
<Office prop1="prop1" prop2="prop2">
    <Version major="1" minor="0"/>
    <Label>MyObjectA</Label>
    <Active>No</Active>
</Office>
<Vehicle prop="prop">
    <Wheels>4</Wheels>
    <Brand>Honda</Brand>
    <Bought>No</Bought>
</Vehicle>
</xml>

My XML is in this format. I am using a SAX parser because the XML file can be large.

What pattern should I follow to parse the file?

Usually I have been following this approach:

//PseudoCode
if(start){
    if(type Office)
    {
        create an instance of type Office and populate the attributes of Office in the Office class using a callback
    }
    if(type Vehicle)
    {
        create an instance of type Vehicle and populate the attributes of Vehicle in the Vehicle class using a callback
    }
}

if(end){
    // do cleaning up
}

With this approach, the parsing function that handles the start and end tags usually becomes huge. Is there a better approach I could follow?

A: 

You could create a lookup table from type to parse action, and then you just need to index into your lookup table to find the appropriate parse action.
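As a rough sketch of that idea (the class name `LookupTableDemo`, the `seen` list, and the lambda actions are all made up for illustration, and a default, non-namespace-aware JAXP parser is assumed, so elements are looked up by `qName`), the if-chain in the start-tag callback could be replaced by a map lookup:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class LookupTableDemo {

    // Hypothetical handler: dispatches each start tag through a map instead of an if-chain.
    static class DispatchingHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();
        private final Map<String, Consumer<Attributes>> actions = new HashMap<>();

        DispatchingHandler() {
            // One entry per element type; a real action would build an Office or Vehicle object.
            actions.put("Office", atts -> seen.add("Office prop1=" + atts.getValue("prop1")));
            actions.put("Vehicle", atts -> seen.add("Vehicle prop=" + atts.getValue("prop")));
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Elements without a registered action (e.g. Label, Wheels) are simply ignored.
            Consumer<Attributes> action = actions.get(qName);
            if (action != null) {
                action.accept(atts);
            }
        }
    }

    // Parses the given XML string and returns what the registered actions recorded.
    static List<String> parse(String xml) throws Exception {
        DispatchingHandler handler = new DispatchingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xml><Office prop1=\"a\" prop2=\"b\"><Label>MyObjectA</Label></Office>"
                   + "<Vehicle prop=\"p\"><Wheels>4</Wheels></Vehicle></xml>";
        System.out.println(parse(xml));
    }
}
```

Supporting a new element type then means adding one map entry rather than another branch in `startElement`.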

Dominic Rodger
+2  A: 

I had good experience with this approach:

  1. Create a lookup table that maps node names to handler functions. You'll most likely need to maintain two handlers per node name, one for the start tag and one for the end tag.
  2. Maintain a stack of the parent nodes.
  3. Call the handler from the lookup table.
  4. Each handler function can do its tasks without further checks. But if necessary each handler can also determine the current context by looking at the parent node stack. That becomes important if you have nodes with the same name at different places in the node hierarchy.

Some pseudo-Java code:

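import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

// Callback type stored in the lookup tables (assumed here to be a one-method interface)
private interface MyCallbackAdapter {
  void execute();
}

private Map<String, MyCallbackAdapter> startLookup = new HashMap<String, MyCallbackAdapter>();
private Map<String, MyCallbackAdapter> endLookup = new HashMap<String, MyCallbackAdapter>();
private Stack<String> nodeStack = new Stack<String>();

public MyHandler() {
   // Initialize the lookup tables
   startLookup.put("Office", new MyCallbackAdapter() {
      public void execute() { myOfficeStart(); }
    });

   endLookup.put("Office", new MyCallbackAdapter() {
      public void execute() { myOfficeEnd(); }
    });
}

@Override
public void startElement(String namespaceURI, String localName,
        String qName, Attributes atts) {
  // localName is empty unless the parser is namespace-aware, so fall back to qName
  String name = localName.isEmpty() ? qName : localName;
  nodeStack.push(name);

  MyCallbackAdapter callback = startLookup.get(name);
  if (callback != null)
    callback.execute();
}

@Override
public void endElement(String namespaceURI, String localName, String qName) {
  String name = localName.isEmpty() ? qName : localName;

  MyCallbackAdapter callback = endLookup.get(name);
  if (callback != null)
    callback.execute();

  // Pop only after the end handler has run, so it still sees this node on the stack
  nodeStack.pop();
}

private void myOfficeStart() {
  // Do the stuff necessary for the "Office" start tag
}

private void myOfficeEnd() {
  // Do the stuff necessary for the "Office" end tag
}

//...

}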
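To illustrate step 4 (using the parent stack to tell apart nodes with the same name), here is a small self-contained sketch. The class `ParentStackDemo`, the `parent/child` key format, and the `Label` element under `Vehicle` are assumptions made for the example; they are not part of the original XML or handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ParentStackDemo {

    // Hypothetical handler: records leaf text under a "parent/child" key.
    static class ContextHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        private final Deque<String> nodeStack = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            nodeStack.push(qName);
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver text in several chunks, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            nodeStack.pop();                 // remove the element that just ended
            String parent = nodeStack.peek(); // its parent, or null for the root
            String content = text.toString().trim();
            if (!content.isEmpty()) {
                values.put(parent + "/" + qName, content);
            }
            text.setLength(0);
        }
    }

    // Parses the given XML string and returns the collected "parent/child" values.
    static Map<String, String> parse(String xml) throws Exception {
        ContextHandler handler = new ContextHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xml><Office><Label>MyObjectA</Label><Active>No</Active></Office>"
                   + "<Vehicle><Label>Civic</Label></Vehicle></xml>";
        System.out.println(parse(xml));
    }
}
```

Because the key includes the parent name, the two `Label` elements end up as separate entries even though the element name alone is ambiguous.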

General advice: Depending on your requirements, you might need further contextual information, such as the previous node name or whether the current node is empty. If you find yourself adding more and more contextual information, consider switching to a full-fledged DOM parser, unless runtime speed is more important than development speed.

DR
Can you provide more details? I think this is what I should be doing.
Devil Jin
I added some pseudo-code, before realizing you were asking specifically about Java. I'll write some more Java-like code...
DR
This gives me new insight into using data structures for this purpose. Thanks, DR.
Devil Jin
+1  A: 

If you want to stick with the explicit SAX approach, DR's answer makes sense. I've used this approach in the past with success.

However you may want to take a look at Commons Digester, which allows you to specify an object to be created/populated for subtrees of an XML document. It's a very easy way to build an object hierarchy from XML without using the SAX model explicitly.

See this ONJava article for more info.

Brian Agnew
I was wondering what method we could use to solve this problem. DR has provided a method I just hadn't thought of before.
Devil Jin
A: 

You need a lexical analyser; the Interpreter pattern is the ideal pattern for writing one.

Martin Spamer