tags:

views:

135

answers:

4

Take a simple XML file formatted like this:

<Lists>
<List>
<Note/>
...
<Note/>
</List>
<List>
<Note/>
...
<Note/>
</List>
</Lists>

Each node has some attributes that actually hold the data of the file. I need a very quick way to count the number of each type of element, (List and Note). Lists is simply the root and doesn't matter.

I can do this with a simple string search or something similar, but I need to make this as fast as possible.

Design Parameters:
Must be in java (Android application).
Must AVOID allocating memory as much as possible.
Must return the total number of Note elements and the number of List elements in the file, regardless of location in file.

Number of Lists will typically be small (1-4), and number of notes can potentially be very large (upwards of 1000, typically 100) per file.

I look forward to your suggestions.

A: 

Look at implementing a org.xml.sax.ContentHandler and sending it to an org.xml.sax.XMLReader.

These classes are bundled with Android SDK. This is a 'forward parser' approach which involves your ContentHandler being shown each XML element (tag, attribute, text) as the document is processed from start to end. The forward parser approach is light on memory use and a lot faster than building a DOM.

Jim Blackler
Sax takes too long because it parses everything. I only need to count the nodes, a more efficient method must be out there.
CodeFusionMobile
+1  A: 

If you just want to count the elements in the text rather than parsing the document, you can read each line from the file in sequence and check using the Pattern/Matcher class (I forget which) whether the line matches "<Note>" or "<List>" and increment the counters respectively.

EDIT: Alternative idea

Read through the document one character at a time, when you encounter a "<" character, start appending all subsequent characters that are not a ">" character to a StringBuilder. Then when you encounter a ">" symbol, compare the StringBuilder string to "Note" or "List" or whatever and increment counters accordingly. Finally, clear the StringBuilder and repeat until the end of the document.

Moonshield
You mean use regular expressions? Those are EXTREMELY expensive on Android.
CodeFusionMobile
Doesn't necessarily need regex, if the tags are on their own lines then you can trim the whitespace from the line and just use the String.equals() method.
Moonshield
Those string methods require memory alloc, which again is important to avoid in mobile design. Some memory will be allocated, but it should avoid having an O(n) relationship if possible.
CodeFusionMobile
A: 

quick dirty untested solution, using a generated state machine by Ragel. Feed this to ragel, which will generate java code for you.

The resulting code will use a table based FSM parser with a constant memory requirement (the tables and state variable). It can also accept partial data, you can resume it in any position.

This will probably be faster than any general purpose parser, or the system's regular expressions.

(Disclaimer: I am not a Java programmer, nor is the below complete in any way as it is missing the skeleton code needed to run. However, it may well be a decent basis to start from.)

%%{
    machine nodecounter;

    note = '<Note' @{notes++;};
    list = '<List' ^'s' @{lists++;};
    set = note | list;
    main := (set | ^set)*;
}%%

%% write data;

%% write init;

/* */
%% write exec;
Hasturkun
A: 

XmlPullParser is a streaming pull XML parser and should be used when there is a need to process quickly and efficiently all input elements.

You can try something like this:

private void pullParserSample(FileInputStream xml) {
    int lists = 0;
    int notes = 0;
    int eventType = -1;

    try {
        XmlPullParser xpp = XmlPullParserFactory.newInstance().newPullParser();
        xpp.setInput(new InputStreamReader(xml));

        eventType = xpp.getEventType();

        do {
            switch ( eventType ) {

            case XmlPullParser.START_TAG:
                final String tag = xpp.getName();
                if ( "Note".equals(tag) ) {
                    notes++;
                }
                else if ( "List".equals(tag) ) {
                    lists++;
                }
                break;

            }

        } while ((eventType = xpp.next()) != XmlPullParser.END_DOCUMENT) ;

    } catch (XmlPullParserException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    Log.d(TAG, "lists=" + lists + " notes=" + notes);
}
dtmilano