Edit: This code is fine. The logic bug was elsewhere in my real code and doesn't exist in this pseudo code. I had been blaming it on my lack of Java experience.

In the pseudo code below, I'm trying to parse the XML shown. A silly example, maybe, but my real code was too large and specific for anyone to get much value from seeing it or from the answers posted. This version is more entertaining, and hopefully others can learn from the answers as well as me.

I'm new to Java but an experienced C++ programmer which makes me believe my problem lies in my understanding of the Java language.

Problem: When the parser finishes, my Vector is full of uninitialized Cows. I create the Vector of Cows with an initial capacity (which shouldn't affect its "size" if it's anything like the C++ STL vector). When I print the contents of the Cow Vector after the parse, the Vector has the right size, but none of the values appear to have been set.
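
The capacity-versus-size assumption does hold for java.util.Vector. A minimal sketch to confirm it, where the class name VectorCapacityDemo and its helper method are made up for illustration and a raw (non-generic) Vector is used on purpose:

```java
import java.util.Vector;

// Hypothetical demo class, not part of the original post.
class VectorCapacityDemo {
    // Returns {size, capacity} of a freshly constructed raw Vector.
    static int[] sizeAndCapacity(int initialCapacity) {
        Vector v = new Vector(initialCapacity);
        // size() counts stored elements; capacity() is the backing array length.
        return new int[] { v.size(), v.capacity() };
    }
}
```

Calling sizeAndCapacity(3) gives a size of 0 and a capacity of 3, so the constructor argument by itself cannot explain a Vector that reports the right number of Cows.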

Info: I have successfully done this with other parsers that don't have Vector fields but in this case, I'd like to use a Vector to accumulate Cow properties.

MoreInfo: I can't use generics (Vector< Cow >) so please don't point me there. :)

Thanks in advance.

<pluralcow>
    <cow>
        <color>black</color>
        <age>1</age>
    </cow>
    <cow>
        <color>brown</color>
        <age>2</age>
    </cow>
    <cow>
        <color>blue</color>
        <age>3</age>
    </cow>
</pluralcow>

public class Handler extends DefaultHandler{
    // vector to store all the cow knowledge
    private Vector m_CowVec;

    // temp variable to store cow knowledge until
    // we're ready to add it to the vector
    private Cow m_WorkingCow;

    // flags to indicate when to look at char data
    private boolean m_bColor;
    private boolean m_bAge;

    public void startElement(...tag...)
    {
        if(tag == pluralcow){ // rule: there is only 1 pluralcow tag in the doc
            // I happen to magically know how many cows there are here.
            m_CowVec = new Vector(numcows);
        }else if(tag == cow){ // rule: multiple cow tags exist
            m_WorkingCow = new Cow();
        }else if(tag == color){ // rule: single color within cow
            m_bColor = true;
        }else if(tag == age){ // rule: single age within cow
            m_bAge = true;
        }
    }

    public void characters(...chars...)
    {
        if(m_bColor){
            m_WorkingCow.setColor(chars);
        }else if(m_bAge){
            m_WorkingCow.setAge(chars);
        }
    }

    public void endElement(...tag...)
    {
        if(tag == pluralcow){
            // that's all the cows
        }else if(tag == cow){
            m_CowVec.addElement(m_WorkingCow);
        }else if(tag == color){
            m_bColor = false;
        }else if(tag == age){
            m_bAge = false;
        }
    }
}
A: 

The code looks fine to me. I say set breakpoints at the start of each function and watch it in the debugger or add some print statements. My gut tells me that either characters() is not being called or setColor() and setAge() don't work correctly, but that's just a guess.

Glomek
A: 

I have to say that I'm not a big fan of this design. However, are you sure that your characters() method is ever called? (A few System.out.println calls might help.) If it's never called, you would end up with uninitialized cows.

Also, I would not try to implement an XML parser myself like this since you need to be more robust against validation issues.

You can use SAX or DOM4J, or even better, use Apache Digester.

Uri
Do you have references for a SAX parser in Java? I'm writing for the Blackberry so small and fast are top priorities.
JR Lawhorne
Also, this is an override of org.xml.sax.helpers.DefaultHandler
JR Lawhorne
It looked like the standard parser, but I wasn't sure if it was the actual class or an individual translation from the C++ SAX parser... I still think you need something that can handle messy XML unless you are sure that it is validated. Try looking at Apache Digester...
Uri
A: 

Also, if I have a schema I will use JAXB, or another code generator, to speed up development of XML interface code. The code generators hide a lot of the complexity of working directly with SAX or DOM4J.

javelinBCD
+2  A: 

When you say that the Cows are uninitialized, are the String properties initialized to null? Or empty Strings?

I know you mentioned that this is pseudo-code, but I just wanted to point out a few potential problems:

public void startElement(...tag...)
{
    if(tag == pluralcow){   // rule: there is only 1 pluralcow tag in the doc
        // I happen to magically know how many cows there are here.
        m_CowVec = new Vector(numcows);
    }else if(tag == cow){   // rule: multiple cow tags exist
        m_WorkingCow = new Cow();
    }else if(tag == color){ // rule: single color within cow
        m_bColor = true;
    }else if(tag == age){   // rule: single age within cow
        m_bAge = true;
    }
}

You really should be using tag.equals(...) instead of tag == ... here.
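
To illustrate why: == on Strings compares object references, not contents, so it can be false even when both hold the same characters. A small sketch, in which the class TagCompare and its isTag helper are hypothetical names used only for this example:

```java
// Hypothetical helper class, not part of the original handler.
class TagCompare {
    // Compares a tag name from the parser against a known element name.
    // Calling equals on the constant also tolerates a null tag.
    static boolean isTag(String tag, String expected) {
        return expected.equals(tag);
    }

    public static void main(String[] args) {
        String tag = new String("cow");        // a distinct String object
        System.out.println(tag == "cow");      // false: different references
        System.out.println(isTag(tag, "cow")); // true: same contents
    }
}
```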

public void characters(...chars...)
{
    if(m_bColor){
        m_WorkingCow.setColor(chars);
    }else if(m_bAge){
        m_WorkingCow.setAge(chars);
    }
}

I'm assuming you're aware of this, but this method is actually called with a character buffer plus a start index and a length (not an end index).

Note also that characters(...) can be called multiple times for a single text block, returning small chunks in each call: http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)

"...SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks..."

I doubt you'll run into that problem in the simple example you provided, but you also mentioned that this is a simplified version of a more complex problem. If in your original problem, your XML consists of large text blocks, this is something to consider.
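
One common way to cope with chunked delivery is to accumulate text in a buffer and only use it when the closing tag arrives. The following is a rough sketch of that idea, not a drop-in replacement for the handler in the question: the class name CowHandler, the Cow stand-in with plain fields, and the field names are all assumptions made for illustration. It uses a raw Vector and StringBuffer to stay away from generics.

```java
import java.util.Vector;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in; the real Cow class is not shown in the question.
class Cow {
    String color;
    String age;
}

class CowHandler extends DefaultHandler {
    // Raw Vector (no generics), per the constraint in the question.
    final Vector cows = new Vector();
    private Cow workingCow;
    // Collects text across however many characters() calls the parser makes.
    private final StringBuffer text = new StringBuffer();

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("cow".equals(qName)) {
            workingCow = new Cow();
        }
        text.setLength(0); // drop whitespace seen before this tag
    }

    public void characters(char[] ch, int start, int length) {
        // The third argument is a length, not an end index.
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        if ("color".equals(qName)) {
            workingCow.color = text.toString().trim();
        } else if ("age".equals(qName)) {
            workingCow.age = text.toString().trim();
        } else if ("cow".equals(qName)) {
            cows.addElement(workingCow);
        }
        text.setLength(0);
    }
}
```

With this shape, a color delivered as "bl" followed by "ack" still ends up stored as "black", because the value is only assigned once the closing color tag is seen. The sketch matches on qName, which assumes a parser that is not namespace-aware.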

Finally, as others have mentioned, if you can, it's a good idea to consider an XML marshalling library (e.g., JAXB, Castor, JiBX, XMLBeans, or XStream, to name a few).

Jack Leow
Great Reply! 1) Char buffer: yes, I was simplifying in the pseudo code. 2) String.equals() : yes, also a simplification. 3) characters() called multiple times: awesome note! I'll have to be sure I handle that case. 4) Haven't found a marshalling library that works on Blackberry JDE 4 (J2ME) yet.
JR Lawhorne
Ah, I did not realize you were programming for J2ME. That's a little different then, and it's probably good for you to avoid that extra layer of abstraction, and the processing overhead that comes with it.
Jack Leow