views:

949

answers:

3

Hello,

I am trying to find the fasted way to set a specific property of every item in a generic list.

Basicly the requirement is to iterate over a list of items and resetting the IsHit property to FALSE. Only the items in a second "hit"-list should be set to TRUE afterwards.

My first attempt looked like this:

listItems.ForEach(delegate(Item i) { i.IsHit = false; });

foreach (int hitIndex in hits)
{
    listItems[hitIndex - 1].IsHit = true;
}

Note: hits is 1-based, the items list is 0-based.

Then i tried to improve the speed and came up with this:

for (int i = 0; i < listItems.Count; i++)
{
    bool hit = false;
    for (int j = 0; j < hits.Count; j++)
    {
        if (i == hits[j] - 1)
        {
            hit = true;
            hits.RemoveAt(j);
            break;
        }
    }

    if (hit)
    {
        this.listItems[i].IsHit = true;
    }
    else
    {
        this.listItems[i].IsHit = false;
    }
}

I know this is a micro optimization but it is really time sensitive code, so it would make sense to improve this code beyond readibility... and just for fun of course ;-)

Unfortuanetly I don't really see any way to improve the code further. But I probably missed something.



Thanks

PS: Code in C# / .NET 2.0 would be preferred.


I ended up switching to Eamon Nerbonne solution. But then I noticed something weird in my benchmarks.

The delegate:

listItems.ForEach(delegate(Item i) { i.IsHit = false; });

is faster than:

foreach (Item i in listItems)
{
    i.IsHit = false;
}

How is that possible?

I tried to look at IL but thats just way over my head... I only see that the delegates results in fewer lines, whatever that means.

+3  A: 

Can you put the items of your second list in a dictionary ? If so, you can do this:

for( int i = 0; i < firstList.Count; i++ )
{
   firstList[i].IsHit = false;

   if( secondList.Contains (firstList[i].Id) )
   {
       secondList.Remove (firstList[i].Id);
       firstList[i].IsHit = true;
   }
}

Where secondList is a Dictionary offcourse.

By putting the items of your histlist in a Dictionary, you can check with an O(1) operation if an item is contained in that list. In the code above, I use some kind of unique identifier of an Item as the Key in the dictionary.

Frederik Gheysels
Instead of dictionary you could use HashSet<T>. Also, there is no need to remove hits from the dictionary/set if you are sure that all hits will be "used" - just empty the collection after the loop.
maciejkow
Yes, you're right about the HashSet collection. I sometimes forget about its existence though :o.About removing items from the set or dictionary: I've just done it because I saw the topicstarter was doing it as well in his example code. Maybe he has another reason for it, but it is indeed not necessary to do this when using a set or dictionary performancewise.
Frederik Gheysels
`HashSet<>` and `Dictionary<>` are a lot slower than plain old array's - and since you're dealing with dense integer indexes, hashsets and dictionaries have no meaningful additional functionality here.
Eamon Nerbonne
Well, seems like there are no reputation points in courtesy...at least an upvote on the comment would have been nice.
Sven Hecht
A: 

You could maybe sort the hits collection and perform a binary search, then you would be O(n log2 n) instead of O(n2)

veggerby
+1  A: 

A nested for-loop is overkill, and in particular, the "remove" call itself represents yet another for-loop. All in all, your second optimized version has a worse time-complexity than the first solution, in particular when there are many hits.

The fastest solution is likely to be the following:

foreach(var item in listItems)
    item.IsHit = false;
foreach (int hitIndex in hits)
    listItems[hitIndex - 1].IsHit = true;

This avoids the inefficient nested for-loops, and it avoids the overhead of the delegate based .ForEach method (which is a fine method, but not in performance critical code). It involves setting IsHit slightly more frequently, but most property setters are trivial and thus this is probably not a bottleneck. A quick micro-benchmark serves as a fine sanity check in any case.

Only if IsHit is truly slow, the following will be quicker:

bool[] isHit = new bool[listItems.Count]; //default:false.
//BitArray isHit = new BitArray(listItems.Count);  
//BitArray is potentially faster for very large lists.
foreach (int hitIndex in hits)
    isHit [hitIndex - 1] = true;
for(int i=0; i < listItems.Count; i++)
    listItems[i].IsHit = isHit[i];

Finally, consider using an array rather than a List<>. Arrays are generally faster if you can avoid needing the List<> type's insertion/removal methods.

The var keyword is C# 3.5 but can be used in .NET 2.0 (new language features don't require newer library versions, in general - it's just that they're most useful with those newer libs). Of course, you know the type with which List<> is specialized, and can explicitly specify it.

Eamon Nerbonne
Minor note on the complexity of the OP's second version: the Remove call will iterate precisely over those elements of hits *not* iterated over by the explicit inner for loop, so the overal complexity of that solution is on the order of `listItems.Count` × `hits.Count`
Eamon Nerbonne