Hi,

I am currently writing an XML parser that parses a lot of data, with a lot of different nodes (the XML isn't designed by me, and I have no control over the content...)

Anyway, it currently takes an unacceptably long time to download and read in (about 13 seconds) and so I'm looking for ways to increase the efficiency of the read.

I've written a function to create hash values, so that the program no longer has to do a lot of string comparison (just NSUInteger comparison), but this still isn't reducing the complexity of the read in...

So I thought maybe I could create an array of IMPs, so that I could then do something like:

for(int i = 0; i < [hashValues count]; i ++)
{
    if(currHash == [[hashValues objectAtIndex:i] unsignedIntValue])
    {
        [impArray objectAtIndex:i];
    }   
}

Or something like that.

The only problem is that I don't know how to actually make the call to the IMP function.

I've read that I can invoke the method an IMP points to like this:

IMP tImp = [impArray objectAtIndex:i];
tImp(self, @selector(methodName));

But, if I need to know the name of the selector anyway, what's the point?

Can anybody help me out with what I want to do? Or even just some more ways to increase the efficiency of the parser...

Here are some excerpts from my NSXMLParser delegate.

From didStartElement

if([elementName isEqualToString:@"playingFilmData"])
{
    appDelegate.arrPlayingFilms = [[NSMutableArray alloc] init];
    appDelegate.arrSessionTimes_ByFilm = [[NSMutableArray alloc] init];
    appDelegate.arrSessionTimes_ByCinema = [[NSMutableArray alloc] init];
    [self releaseData];
    return;
}
else if([elementName isEqualToString:@"film_sessions"])
{
    aFilm.arrSessions = [[NSMutableArray alloc] init];
    [self releaseData];
    return;
}
else if([elementName isEqualToString:@"session"])
{
    aSession = [[ATM_SessionObject alloc] init];
    aSession.session_filmID = aFilm.film_id;
    [self releaseData];
    return;
}
else if([elementName isEqualToString:@"sess"])
{
    aFilm.arrSessions = [[NSMutableArray alloc] init];
    [self releaseData];
    return;
}
else if([elementName isEqualToString:@"cin"])
{
    cinID = [attributeDict objectForKey:@"id"];
    [self releaseData];
    return;
}
else if([elementName isEqualToString:@"s"])
{
    aSession = [[ATM_SessionObject alloc] init];
    aSession.session_filmID = aFilm.film_id;
    aSession.session_cinemaID = cinID;
    [self releaseData];
    return;
}
else if([elementName isEqualToString:@"flm"])
{
    aFilm = [[ATM_FilmObject alloc] init];
    aFilm.film_id = [attributeDict objectForKey:@"id"];
    aFilm.film_epNum = 0;

    [self releaseData];
    return;
}

[self releaseData];

From didEndElement

/*
 *0 = nowShowing_lastUpdate
 *1 = s
 *2 = tit
 *3 = des
 *4 = rate
 *5 = dir
 *6 = act
 *7 = rel
 *8 = flm
 */

NSUInteger numHash = [appDelegate murmerHashKey:elementName WithLegth:[elementName length] AndSeed:42];

if(currentElementValue)
{
    if(numHash == [[hashValues objectAtIndex:0] unsignedIntValue])
    {
        appDelegate.strNowShowingUpdate = currentElementValue;

        [self releaseData];
        return;
    }
    else if(numHash == [[hashValues objectAtIndex:1] unsignedIntValue])
    {
        [aFilm.arrSessions addObject:aSession];
        [appDelegate.arrSessionTimes_ByFilm addObject:aSession];

        [aSession release];
        aSession = nil;
    }
    else if(numHash == [[hashValues objectAtIndex:2] unsignedIntValue])
    {
        [aFilm setValue:currentElementValue forKey:@"film_title"];

        [self releaseData];
        return;
    }
    else if(numHash == [[hashValues objectAtIndex:3] unsignedIntValue])
    {
        [aFilm setValue:currentElementValue forKey:@"film_description"];

        [self releaseData];
        return;
    }
    else if(numHash == [[hashValues objectAtIndex:4] unsignedIntValue])
    {
        [aFilm setValue:currentElementValue forKey:@"film_rating"];

        [self releaseData];
        return;
    }
    else if(numHash == [[hashValues objectAtIndex:5] unsignedIntValue])
    {
        [aFilm setValue:currentElementValue forKey:@"film_directors"];

        [self releaseData];
        return;
    }
    else if(numHash == [[hashValues objectAtIndex:6] unsignedIntValue])
    {
        [aFilm setValue:currentElementValue forKey:@"film_actors"];

        [self releaseData];
        return;
    }
}

if(numHash == [[hashValues objectAtIndex:8] unsignedIntValue])
{
    [appDelegate.arrPlayingFilms addObject:aFilm];

    [aFilm release];
    aFilm = nil;

    [self releaseData];
    return;
}

[self releaseData];

I hope this helps shed some more light on what I'm doing wrong. Like I said, I'm new to this area of programming (and really, I'm a mathematician, not a programmer by training...), so I'm really super enthusiastic to learn what not to do!!

+3  A: 

You're micro-optimizing without giving an overview of what the whole problem is about.

Are you scanning (SAX) the XML or traversing a DOM structure? Are there memory issues? Even with SAX parsing, you could allocate a lot of memory if you have no NSAutoreleasePools in place.

I don't believe that objc method dispatching is the source of your performance problem. You should use Shark to identify the bottleneck. The parsing itself surely is not the problem: the linked 1.4 MB XML file takes 0.1 sec to run through xmllint -format

If you want more help, you’ve got to describe more of what you’re doing: type of parser, what data or objects are you producing, more code.

Nikolai Ruhe
Agreed that the xml is not the problem. Running it through an `NSXMLParser` only takes about 0.2 seconds.
Dave DeLong
Thanks :) I subclassed NSXMLParser, and so I assume that it's traversing it (although I'm not entirely sure, as this particular area is kind of new to me). I initially thought that perhaps the many, many, many string comparisons were the cause of the issue; however, using the hash function didn't speed things up at all. I'll look at what is happening using Shark! Failing anything from that, I'll update my question with further source :)
Dwaine Bailey
@Dwaine why did you subclass `NSXMLParser`? You don't need to. You only have to provide it with a custom delegate object.
Dave DeLong
@Dave that's what I meant (that I created a custom delegate). I'm not having a good day/week for programming *or* words, it would seem! My delegate *did* just do string comparison to determine whether or not the elementName matched an expected value, and if it did then it performed the desired action (creating an array, creating an object, adding an object to an array and releasing the temporary holder, or setting a value in an object). Now it does the exact same thing, but using NSUInteger comparisons instead of string comparisons, thanks to the hash function (which should be faster?)
Dwaine Bailey
@Dave And Shark is apparently not playing nicely, and refuses to actually *see* anything (can't find symbols apparently, even though they are being created, and are not being stripped...)
Dwaine Bailey
+1  A: 

There is a saying:

Premature optimization is the root of all evil.

If you need to compare an element name to an expected value, you will have to perform a character-by-character string compare at some point. You can rule out some definitely-not-equal cases by comparing hashes first, but don't forget that calculating a hash also has a cost. And anyway, do you think Apple didn't already think of these optimizations when implementing isEqualToString:?
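
To make the hash-then-compare point concrete, here is a rough sketch in plain C (which also compiles as Objective-C). It is only an illustration: FNV-1a stands in for the Murmur hash from the question, and the names KnownElement, make_known and matches are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, used here only as a stand-in for the question's Murmur hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* A known element name with its hash computed once, up front. */
typedef struct {
    const char *name;
    uint32_t    hash;
} KnownElement;

static KnownElement make_known(const char *name)
{
    KnownElement k = { name, fnv1a(name) };
    return k;
}

/* A hash mismatch rules out equality. A hash match still needs strcmp,
 * because two different strings can share a hash value. */
static bool matches(const KnownElement *k, const char *candidate)
{
    if (fnv1a(candidate) != k->hash) {
        return false;
    }
    return strcmp(candidate, k->name) == 0;
}
```

Note that fnv1a already walks every character of the candidate, so for short element names like "flm" or "s" the hash costs about as much as the string compare it was meant to avoid.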

I have done some profiling of Objective-C applications using Shark and I have found that, in extreme cases, the overhead of objc_msgSend can be as much as 20-25%. So hypothetically, if you eliminated every single message send, your 13 seconds could come down to about 10 seconds. Is that good enough? I doubt it.
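
For reference, the "array of IMPs" idea from the question boils down to a table of function pointers: an IMP is a C function pointer whose first two parameters are the receiver and the selector, and since NSArray only holds objects, a plain C array is the natural container. The sketch below uses ordinary C functions instead of IMPs to keep it self-contained; ParseState, the handler names and dispatch are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state that the handlers update. */
typedef struct {
    int films;
    int sessions;
} ParseState;

typedef void (*ElementHandler)(ParseState *state);

static void handle_flm(ParseState *state) { state->films += 1; }
static void handle_s(ParseState *state)   { state->sessions += 1; }

/* One table row per element name we care about. */
typedef struct {
    const char    *name;
    ElementHandler handler;
} HandlerEntry;

static const HandlerEntry kHandlers[] = {
    { "flm", handle_flm },
    { "s",   handle_s   },
};

/* Looks up elementName and calls its handler directly through the
 * function pointer. Returns 1 if a handler ran, 0 if the name is unknown. */
static int dispatch(ParseState *state, const char *elementName)
{
    size_t count = sizeof kHandlers / sizeof kHandlers[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(kHandlers[i].name, elementName) == 0) {
            kHandlers[i].handler(state);
            return 1;
        }
    }
    return 0;
}
```

This replaces the if/else chain with a loop over a table, but it still does the same comparisons, and at best it saves the message-send share discussed above, so don't expect it to rescue a 13 second run.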

Consider also what is going on inside NSXMLParser. It'll be doing string comparisons all the time in order to parse the actual XML. Compared with what it has to do, your string compares are probably totally insignificant. You absolutely need to profile your code to find out where best to direct your optimization efforts. If it turns out that 12 of the 13 seconds are spent resolving the IP address of the host from which you are downloading the XML, nothing you do to your code is going to help.
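
If Shark won't cooperate, a crude alternative is to time each phase (download, parse, object building) by hand and see which one dominates. A minimal sketch in plain C, assuming a C11 library that provides timespec_get; Phase, now_seconds, time_phase and count_to are made-up names, and count_to is just a stand-in for a real phase:

```c
#include <time.h>

typedef void (*Phase)(void *context);

/* Current wall-clock time in seconds, via C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs one phase and returns how long it took in wall-clock seconds. */
static double time_phase(Phase phase, void *context)
{
    double start = now_seconds();
    phase(context);
    return now_seconds() - start;
}

/* Stand-in for a real phase such as "download" or "parse":
 * bumps the counter passed in as context 1000 times. */
static void count_to(void *context)
{
    long *counter = context;
    for (long i = 0; i < 1000; ++i) {
        *counter += 1;
    }
}
```

Wall-clock time is used on purpose: time spent waiting on the network (DNS lookup, download) does not show up as CPU time, and that waiting is exactly what you want to catch here.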

JeremyP