ansaurus

Question

How can I make my function run as fast as "Contains" on an ArrayList?

Answer 1

+5 A:

As you're using .NET 3.5, why are you using ArrayList to start with, rather than List<string>?

A few things to try:

You could see whether using foreach instead of a for loop helps

You could cache the count:

public bool DoesContain(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
    {
        if (list[i].Equals(element))
        {
            return true;
        }
        return false;
    }
}

You could reverse the comparison:
```
if (element.Equals(list[i]))
```

While I don't expect any of these to make a significant (positive) difference, they're the next things I'd try.

Do you need to do this containment test more than once? If so, you might want to build a HashSet<T> and use that repeatedly.

Jon Skeet 2010-02-25 20:34:44

Or just `Equals(element, list[i])` :)

leppie 2010-02-25 20:39:19

Isn’t caching `Count` rather counter-productive vis-à-vis bounds checks? I was under the impression that the JIT would elide these if it recognizes a loop over a range. And if it *also* starts inlining `Count`, hey, we’ve got a deal.

Konrad Rudolph 2010-02-25 20:40:29

I suspect that using a `foreach` will have the biggest impact, since you then you never get the count at all, and avoid the cost of calculating `list[i]` every time through the loop.

JSBangs 2010-02-25 20:41:05

Using a foreach does have the biggest impact. Negative impact, that is. It's twice as slow. However, caching the Count and reversing the comparison does help a bit, but still it doesn't account but for a small part of the difference.

luvieere 2010-02-25 20:42:51

Foreach is fast for raw arrays. For arraylists, the extra layer of indirection is a killer.

Brian 2010-02-25 20:44:41

Caching the count will be increasingly helpful as the arrays increase in size. Also, the code as written will return false unless `element` is the first item in the list (`return false` should be outside the second closing brace, so copying this code to try it will fail).

ehdv 2010-02-25 20:49:53

@luvieere/Brian: Yes, I'd expect foreach to be slower. However, performance expectations are often incorrect, which is why I'd *try* it. @Konrad: I believe the JIT optimises for Count on arrays, but possibly not for `List<T>`. Again, I have no particular expectation, but I think each of these is at least worth trying.

Jon Skeet 2010-02-25 20:50:12

@Jon I did try it, trust me I'm quite curious as to the reasons of my conundrum and I'm testing every idea that I find in these answers.

luvieere 2010-02-25 20:54:56

Answer 2

A:

Revised after reading comments:

It does not use some Hash-alogorithm to enable fast lookup.

Arve 2010-02-25 20:35:52

No, it doesn't - as indicated by "The documentation states that Contains performs a linear search"

Jon Skeet 2010-02-25 20:37:38

How do you figure that out?

Konrad Rudolph 2010-02-25 20:38:35

According to the documentation, it is performing a linear search using the `Equals` and `CompareTo` methods of each element in the `ArrayList`.

William Brendel 2010-02-25 20:39:45

Answer 3

+2 A:

I'm not sure if you're allowed to post Reflector code, but if you open the method using Reflector, you can see that's it's essentially the same (there are some optimizations for null values, but your test harness doesn't include nulls).

The only difference that I can see is that calling list[i] does bounds checking on i whereas the Contains method does not.

Greg 2010-02-25 20:38:23

Answer 4

A:

My guess would be that ArrayList is written in C++ and could be taking advantage of some micro-optimizations (note: this is a guess).

For instance, in C++ you can use pointer arithmetic (specifically incrementing a pointer to iterate an array) to be faster than using an index.

tster 2010-02-25 20:39:30

You can use [reflector](http://www.red-gate.com/products/reflector/) to decompile ArrayList and see all it's source code. It's not written in C++, it's written in a regular CLR based language. My guess is an early edition of C#.

Bevan 2010-02-25 20:48:31

Answer 5

+1 A:

With a really good optimizer there should not be difference at all, because the semantics seems to be the same. However the existing optimizer can optimize your function not so good as the hardcoded Contains is optimized. Some of the points for optimization:

comparing to a property each time can be slower than counting downwards and comparing against 0
function call itself has its performance penalty
using iterators instead of explicit indexing can be faster (foreach loop instead of plain for)

Vlad 2010-02-25 20:39:45

It looks like counting downwards and comparing against 0 did the trick. The time is down to 00:00:00.1416727 for my function. Thanks!

luvieere 2010-02-25 20:56:34

Niiice! Frankly speaking, I didn't expect this old-school optimization to be still working.

Vlad 2010-02-25 21:02:46

Answer 6

+11 A:

First, you're not running it many times and comparing averages.

Second, your method isn't being jitted until it actually runs. So the just in time compile time is added into its execution time.

A true test would run each multiple times and average the results (any number of things could cause one or the other to be slower for run X out of a total of Y), and your assemblies should be pre-jitted using ngen.exe.

Will 2010-02-25 20:39:47

Actually, I would say this is the right answer. The actual implementation is hardly different.

leppie 2010-02-25 20:45:06

+1 for running it many times. If it were the difference between 187 and 108 seconds, *maybe* fewer runs would be fine; but with a difference of 79 milliseconds, you need more data points to make a solid conclusion.

Austin Salonen 2010-02-25 20:59:08

Answer 7

A:

Use SortedList<TKey,TValue>, Dictionary<TKey, TValue> or System.Collections.ObjectModel.KeyedCollection<TKey, TValue> for fast access based on a key.

var list = new List<myObject>(); // Search is sequential
var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast
var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast

KeyedCollection<TKey, TValue> is also fast and allows indexed lookup, however, it needs to be inherited as it is abstract. Therefore, you need a specific collection. However, with the following you can create a generic KeyedCollection.

public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> {
   public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) {
      this.keyExtractor = keyExtractor;
   }

   private Func<TValue, TKey> keyExtractor;

   protected override TKey GetKeyForItem(TValue value) {
      return this.keyExtractor(value);
   }
}

The advantage of using the KeyedCollection is that the Add method does not require that a key is specified.

Obalix 2010-02-25 20:40:27

Answer 8

A:

Actually Big-O notation is supposed to take the worst case scenario for time, not the best. So therefore I cannot see what the issue is in regards to this information. As far as I can see it works as anticipated since it did not take longer than your function :)

Woot4Moo 2010-02-25 20:41:05

Answer 9

+1 A:

First, if you are using types you know ahead of time, I'd suggest using generics. So List instead of ArrayList. Underneath the hood, ArrayList.Contains actually does a bit more than what you are doing. The following is from reflector:

public virtual bool Contains(object item)
{
    if (item == null)
    {
        for (int j = 0; j < this._size; j++)
        {
            if (this._items[j] == null)
            {
                return true;
            }
        }
        return false;
    }
    for (int i = 0; i < this._size; i++)
    {
        if ((this._items[i] != null) && this._items[i].Equals(item))
        {
            return true;
        }
    }
    return false;
}

Notice that it forks itself on being passed a null value for item. However, since all the values in your example are not null, the additional check on null at the beginning and in the second loop should in theory take longer.

Are you positive you are dealing with fully compiled code? I.e., when your code runs the first time it gets JIT compiled where as the framework is obviously already compiled.

Thomas 2010-02-25 20:42:35

He's also calling ArrayList.Item[Int32] though, which does two integer checks `if ((index < 0) || (index >= this._size))`

Greg 2010-02-25 21:21:24

Answer 10

A:

using an array structure, you can't search faster than O(n) whithout any additional information. if you know that the array is sorted, then you can use binary search algorithm and spent only o(log(n)) otherwise you should use a set.

antony 2010-02-25 20:45:24

sorry, my post is not applicable to the issue

antony 2010-02-25 21:47:31

Answer 11

+1 A:

Using the code below I was able to get the following timings relatively consitently (within a few ms):
1: 190ms DoesContainRev
2: 198ms DoesContainRev1
3: 188ms DoesContainFwd
4: 203ms DoesContainFwd1
5: 199ms Contains

Several things to notice here.

This is run with release compiled code from the commandline. Many people make the mistake of benchmarking code inside the Visual Studio debugging environment, not to say anyone here did but something to be careful of.

The list[i].Equals(element) appears to be just a bit slower than element.Equals(list[i]).

using System;
using System.Diagnostics;
using System.Collections;


namespace ArrayListBenchmark
{
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        const int arrayCount = 10000000;
        ArrayList list = new ArrayList(arrayCount);
        for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i);
    sw.Start();
    DoesContainRev(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds));
    sw.Reset();


sw.Start();
DoesContainRev1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds));
sw.Reset();


sw.Start();
DoesContainFwd(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds));
sw.Reset();


sw.Start();
DoesContainFwd1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds));
sw.Reset();


sw.Start();
list.Contains("zzz");
sw.Stop();
Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds));
sw.Reset();


Console.ReadKey();


}
public static bool DoesContainRev(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = count - 1; i >= 0; i--)
        if (element.Equals(list[i])) return true;


return false;


}
public static bool DoesContainFwd(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
        if (element.Equals(list[i])) return true;


return false;


}
public static bool DoesContainRev1(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = count - 1; i >= 0; i--)
        if (list[i].Equals(element)) return true;


return false;


}
public static bool DoesContainFwd1(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
        if (list[i].Equals(element)) return true;


return false;


}
         }
        }

Firestrand 2010-02-26 03:51:51

The loop for(int i = 0; i = 0; i--) wouldn't even compile?

Fadrian Sudaman 2010-02-26 05:39:38

Yup, it was the debugging environment. I still get better times for `Count` but the difference is negligible.

luvieere 2010-02-26 05:41:30

I'm not sure where there is a `for(int i = 0; i = 0; i--)` I believe all the code in my example is correct with `(int i = count - 1; i >= 0; i--)` sorry if I missed something.

Firestrand 2010-02-26 18:58:42

ansaurus

tags:

views:

answers:

How can I make my function run as fast as "Contains" on an ArrayList?

related questions