Assume I am running either of the code snippets below for a list of 1000 Event entries (in allEventsToAggregate). Would I see a performance improvement in the first implementation if the events in allEventsToAggregate are sorted by customerId, with each customer having roughly 3 events? This is essentially a question of string comparison vs. HashMap lookup performance.

Option 1:

Map<String, List<Event>> eventsByCust = new HashMap<String, List<Event>>();
List<Event> thisCustEntries = null;
String lastCust = null;
for (Event thisEvent : allEventsToAggregate) {
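    // Reuse the previous customer's list when consecutive events share a customerId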
    if (!thisEvent.getCustomerId().equals(lastCust)) {
        thisCustEntries = eventsByCust.get(thisEvent.getCustomerId());
        if (thisCustEntries == null) {
            thisCustEntries = new ArrayList<Event>();
        }
    }
    thisCustEntries.add(thisEvent);
    eventsByCust.put(thisEvent.getCustomerId(), thisCustEntries);
    lastCust = thisEvent.getCustomerId();
}

Option 2:

Map<String, List<Event>> eventsByCust = new HashMap<String, List<Event>>();
for (Event thisEvent : allEventsToAggregate) {
    List<Event> thisCustEntries = eventsByCust.get(thisEvent.getCustomerId());
    if (thisCustEntries == null) {
        thisCustEntries = new ArrayList<Event>();
        eventsByCust.put(thisEvent.getCustomerId(), thisCustEntries);
    }
    thisCustEntries.add(thisEvent);
}
+2  A: 

Would I see a performance improvement

Almost certainly not. Unless this block is a critical inner loop of your application, any marginal gain will be unnoticeable.

Consequently, I would go with the second version of the code: it's a clearer expression of your intent, so it will be easier to maintain (as well as being slightly less prone to subtle bugs in the first place). Maintainability almost certainly trumps making the application 0.001% faster.
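If Java 8 or later is available, the same grouping can be expressed even more directly. The following is a minimal sketch, assuming the same Event class and getCustomerId() accessor as in the question:

Map<String, List<Event>> eventsByCust = new HashMap<>();
for (Event thisEvent : allEventsToAggregate) {
    // computeIfAbsent stores a new list the first time a customerId is seen
    // and returns the existing list for every later event of that customer
    eventsByCust.computeIfAbsent(thisEvent.getCustomerId(), k -> new ArrayList<>())
                .add(thisEvent);
}

or, using streams (requires java.util.stream.Collectors):

Map<String, List<Event>> eventsByCust = allEventsToAggregate.stream()
        .collect(Collectors.groupingBy(Event::getCustomerId));

Either form makes the intent - group events by customer - obvious, without any manual caching of the previous customer's list.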

Andrzej Doyle
That is my thought as well. Just for curiosity's sake, I do wonder at which point it would matter. What if the chunks of Events per customer were around 1000 each, and my total records were 1 million?
pkananen
@pkananen: The point at which it matters is the point at which profiling the application shows that it spends a non-negligible amount of time in this particular chunk of code, and you 1) need to speed things up, and 2) can't get as much "bang for your buck" optimising any other hotspots. ;-)
Andrzej Doyle
Yes, I agree. This was more of a theoretical question.
pkananen
+1  A: 

1) Remember that a successful retrieval of an item from a HashMap requires a string compare to confirm that you really have found the correct item.

2) We seem to be talking about very small differences in execution time, not real algorithmic improvements. Is it really worth losing readability for this?

3) For small differences, the only way to really know is to actually time the thing in practice - and not just to run a quick comparison, but to organise it as a fully fledged experiment. There is simply too much to worry about these days: what the compiler and runtime have chosen to optimise, CPU caching, VM page faulting, and what the Java garbage collector thinks of your algorithm. You may well find that you get different answers for different versions of Java, or on hardware with different CPUs, motherboards, or memory sizes, or even depending on how long the system has been running and so how much of your data has migrated into memory caches, or which bits of Java have been JIT-compiled.
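To get a concrete (if crude) starting point for such an experiment, a timing harness along the following lines could be used. The Event class, the synthetic data, and the single measured pass here are illustrative stand-ins; a serious comparison would still need warm-up iterations, many repetitions, and ideally a proper benchmarking framework:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupingTimer {

    // Minimal stand-in for the real Event class
    static class Event {
        private final String customerId;
        Event(String customerId) { this.customerId = customerId; }
        String getCustomerId() { return customerId; }
    }

    public static void main(String[] args) {
        // ~1000 synthetic events, roughly 3 consecutive events per customer
        List<Event> allEventsToAggregate = new ArrayList<Event>();
        for (int cust = 0; cust < 334; cust++) {
            for (int i = 0; i < 3; i++) {
                allEventsToAggregate.add(new Event("cust-" + cust));
            }
        }

        long start = System.nanoTime();
        Map<String, List<Event>> eventsByCust = new HashMap<String, List<Event>>();
        for (Event thisEvent : allEventsToAggregate) {
            List<Event> thisCustEntries = eventsByCust.get(thisEvent.getCustomerId());
            if (thisCustEntries == null) {
                thisCustEntries = new ArrayList<Event>();
                eventsByCust.put(thisEvent.getCustomerId(), thisCustEntries);
            }
            thisCustEntries.add(thisEvent);
        }
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("Grouped " + allEventsToAggregate.size() + " events into "
                + eventsByCust.size() + " lists in " + elapsedNanos + " ns");
    }
}

Timing both variants this way, over many repetitions and after JVM warm-up, is the only way to get numbers worth acting on, and even then the result only holds for that particular JVM and machine.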

mcdowella