Hello,

I've recently been playing with code for an iPhone app to parse XML. Sticking to Cocoa, I decided to go with the NSXMLParser class. The app will be responsible for parsing 10,000+ "computers", each of which contains 6 other strings of information. For my test, I've verified that the XML is around 900KB-1MB in size.

My data model keeps each computer in an NSDictionary keyed by a unique identifier. Each computer is itself represented by an NSDictionary holding its information. So at the end of the day, I end up with an NSDictionary containing 10k other NSDictionaries.

The problem I'm running into isn't about leaking memory or efficient data structure storage. When my parser is done, the total amount of allocated objects only goes up by about 1MB. The problem is that while the NSXMLParser is running, my object allocation jumps by as much as 13MB. I could understand 2MB (one for the objects I'm creating and one for the raw NSData) plus a little room to work, but 13 seems a bit high. I can't imagine that NSXMLParser is that inefficient. Thoughts?

Code...

The code to start parsing...

NSXMLParser *parser = [[NSXMLParser alloc] initWithData: data];
[parser setDelegate:dictParser];
[parser parse];
output = [[dictParser returnDictionary] retain];        
[parser release];
[dictParser release];

And the parser's delegate code...

-(void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {

    if(mutableString)
    {
        [mutableString release];
        mutableString = nil;

    }

    mutableString = [[NSMutableString alloc] init];     

}

-(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string { 
    if(self.mutableString)
    {

        [self.mutableString appendString:string];

    }
}

-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {

    if([elementName isEqualToString:@"size"]){
        //The initial key, tells me how many computers
        returnDictionary = [[NSMutableDictionary alloc] initWithCapacity:[mutableString intValue]];
    }

    if([elementName isEqualToString:hashBy]){
        //The unique identifier
        if(mutableDictionary){
            [mutableDictionary release];
            mutableDictionary = nil;
        }

        mutableDictionary = [[NSMutableDictionary alloc] initWithCapacity:6];

        //Store the mutable dictionary itself; an immutable copy taken here would stay empty
        [returnDictionary setObject:mutableDictionary forKey:[NSMutableString stringWithString:mutableString]];
    }

    if([fields containsObject:elementName]){
        //Any of the elements from a single computer that I am looking for
        [mutableDictionary setObject:mutableString forKey:elementName];
    }
}

Everything is initialized and released correctly. Again, I'm not getting errors or leaking. It's just inefficient.

Thanks for any thoughts!

+3  A: 

Can't say anything specific about your code but take a look at Apple's XMLPerformance sample - it compares NSXMLParser and libxml performance - results are definitely in favour of the latter. In one of my projects switching from NSXMLParser to libxml gave a great performance boost, so I'd suggest using it.

Vladimir
Does libxml handle parsing over SSL? A quick search didn't turn up much on it. If it can't, then that's a deal breaker for me.
Staros
A: 

I've used NSXMLParser to parse XML files with around 500 records at 700K or so. I found this was at the upper end of the iPhone 3G memory limit. The memory expanded to much more than the size of the XML file, reaching 15MB at times. The problem was that I was storing the records in an array, so both the parser's data and my records were in memory at the same time. When parsing finished, memory went down again, but if it ever reached 15 or 20MB, the app would crash. libxml is supposed to be much more memory efficient.

You might also try storing the created objects with Core Data instead of in an array. Core Data manages memory more actively, releasing objects when they're no longer needed.

With my app, I reduced the memory overhead by optimizing other parts, so that the total memory used never reached the upper limit.

nevan
+3  A: 

NSXMLParser is a memory hog:

  1. It is not a real streaming parser: initWithURL: will download the full XML before processing it. For memory use this is bad, as it has to allocate memory for the full XML, which can't be reclaimed until the end of the parse. For performance it's also bad, as you cannot interleave the IO-intensive part of downloading with the CPU-intensive part of parsing.
  2. It will not release memory. It seems that strings/dictionaries created during parsing are kept around until the end of the parse. I've tried to improve this with creative use of NSAutoreleasePool, but without any success.
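
To make the streaming point concrete, here is a toy sketch in plain C of what "push" parsing looks like: the caller hands over data one chunk at a time, and only a small fixed-size state survives between calls. This is not libxml's API and not a real XML parser (it ignores comments, CDATA, entities and namespaces); the names TagCounter, tag_counter_init and tag_counter_feed are made up for illustration. It only counts start tags with a given name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy scanner state. It has a fixed size, no matter how
   large the document is. */
typedef struct {
    const char *tag;   /* element name to count, e.g. "computer" */
    size_t tag_len;
    char name[64];     /* tag name currently being read */
    size_t name_len;   /* characters seen in the current tag name */
    int in_name;       /* 1 while reading the name right after '<' */
    size_t count;      /* matching start tags seen so far */
} TagCounter;

/* Characters that terminate a tag name. */
static int ends_name(char c) {
    return c == '>' || c == '/' || c == ' ' ||
           c == '\t' || c == '\n' || c == '\r';
}

void tag_counter_init(TagCounter *tc, const char *tag) {
    memset(tc, 0, sizeof *tc);
    tc->tag = tag;
    tc->tag_len = strlen(tag);
    assert(tc->tag_len <= sizeof tc->name);
}

/* Feed one chunk. The state in tc carries over between calls, so a tag
   that is split across two chunks is still recognized. */
void tag_counter_feed(TagCounter *tc, const char *chunk, size_t len) {
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (!tc->in_name) {
            if (c == '<') {            /* a tag starts here */
                tc->in_name = 1;
                tc->name_len = 0;
            }
        } else if (c == '/' && tc->name_len == 0) {
            tc->in_name = 0;           /* "</...": an end tag, skip it */
        } else if (ends_name(c)) {
            if (tc->name_len == tc->tag_len &&
                memcmp(tc->name, tc->tag, tc->tag_len) == 0) {
                tc->count++;
            }
            tc->in_name = 0;
        } else {
            if (tc->name_len < sizeof tc->name) {
                tc->name[tc->name_len] = c;
            }
            tc->name_len++;            /* over-long names can never match */
        }
    }
}
```

In an app, tag_counter_feed would be called from whatever callback delivers each block of downloaded bytes, so parsing overlaps with the download and the full document never has to sit in memory. libxml2's push interface (xmlCreatePushParserCtxt and xmlParseChunk) follows the same feed-a-chunk shape, with SAX callbacks doing the real work.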

Alternatives are libxml and AQXMLParser which is an NSXMLParser compatible wrapper around libxml, or ObjectiveXML.

See my blog article for more details.

mfazekas
A: 

If you want to know where your memory is going, run the code under Instruments using the ObjectAlloc template, and sort the class list by total size. Once the overall memory usage gets huge, you'll see one class or a few classes as the biggest occupier(s) of memory.

Then, drill down into one of these classes and examine the instances of it to see what created them.

Then you'll know, from evidence, where your problem lies.

Peter Hosey
A: 

Just switched over to libxml. Bit of a headache, but the link Vladimir posted was a huge help. Now the bloat for a 900KB-1MB file is only around 2-3MB. Plus, because it's a streaming parser, it's done almost immediately after the NSURLRequest returns.

Final answer - libxml.

Thanks for all your help guys!

Staros
A: 

If you're looking for a replacement for NSXMLParser that can handle streaming of large XML documents over HTTP, you might be interested in my Expat Objective-C wrapper.

Ben Reeves
A: 

I've used AQXMLParser before, and it's definitely much more memory efficient than NSXMLParser.

TheSoundOfMatt