views:

149

answers:

3

I would like to ask one interested (for me) question.

What collection is the best by criteria performance if collection contains a lot of items (more than 1 million).

By example, I create simple List(10000000) collection and try to add about 500000 different items. First 30000 items will be added in 10 seconds after running, but collection will contain just 60000 items in 1 minute after running and 150000 items in 5 minutes.

As I understand, there is non-linear dependency from memory usage in collection by adding of new item (because every item is creating during "similar equal" time period). But I can make a mistake.

Edit: You are right it is not clear enough without sample. I am trying to fill tree as connected list. You can find sample code below.

public class Matrix
{
    public int Id { get; private set; }
    public byte[,] Items { get; private set; }
    public int ParentId { get; private set; }
    public int Lvl { get; private set; }
    public int HorizontalCounts
    {
        get { return 3; }
    }

    public int VerticalCounts
    {
        get { return 3; }
    }

    public Matrix(int id) : this(id, null, 0, 1)
    {
    }

    public Matrix(int id, byte[,] items, int parentId, int lvl)
    {
        Id = id;
        Items = (items ?? (new byte[HorizontalCounts, VerticalCounts]));
        ParentId = parentId;
        Lvl = lvl;
    }

    public bool IsEmpty(int hCounter, int vCounter)
    {
        return (Items[hCounter, vCounter] == 0);
    }

    public Matrix CreateChild(int id)
    {
        return (new Matrix(id, (byte[,])Items.Clone(), Id, (Lvl + 1)));
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        Matrix node = new Matrix(1);
        const int capacity = 10000000;
        List<Matrix> tree = new List<Matrix>(capacity) { node };

        FillTree(ref tree, ref node);

        int l1 = tree.Where(n => (n.Lvl == 1)).Count();
        int l2 = tree.Where(n => (n.Lvl == 2)).Count();
        int l3 = tree.Where(n => (n.Lvl == 3)).Count();
        int l4 = tree.Where(n => (n.Lvl == 4)).Count();
        int l5 = tree.Where(n => (n.Lvl == 5)).Count();
    }

    private static void FillTree(ref List<Matrix> tree, ref Matrix node)
    {
        for (int hCounter = 0; hCounter < node.HorizontalCounts; hCounter++)
        {
            for (int vCounter = 0; vCounter < node.VerticalCounts; vCounter++)
            {
                if (!node.IsEmpty(hCounter, vCounter))
                {
                    continue;
                }

                int childId = (tree.Select(n => n.Id).Max() + 1);
                Matrix childNode = node.CreateChild(childId);
                childNode.Items[hCounter, vCounter] = 1;

                tree.Add(childNode);

                FillTree(ref tree, ref childNode);
            }
        }
    }
}

Latest Edition: I am very sorry, problem was not in amount of items into required collection. Performance problem was in this line: int childId = (tree.Select(n => n.Id).Max() + 1); Thank you very much for your answers and comments.

A: 

You want an array, if you know exactly how many beforehand. If you can allocate once, and then simply fill up, then a simple array is perfect. No wasted memory, fastest to fill, fastest to remove from.

Scott Stafford
+2  A: 

The answer to this is it depends. Are you going to be doing many inserts with no sorting? Linked List
Are you going to be doing a lot of lookups? HashMap/Dictionary
Are you going to just have an unordered group of things? List and/or Array
Do you not want duplicates? Set
Do you not want duplicates, but want a fast lookup? HashSet
Do you have an ordered List that is to be sorted by keys? TreeMap

Woot4Moo
Thank you. But I just want to fill my list so fast as it is possible :)
Maxim Polishchuk
LinkedList imo (15 chars)
Woot4Moo
@Maxim if you just want to fill a list as fast as possible, why bother doing anything at all? Presumably you want to get those items back out of the list in some way, and that will have a big impact on which data structure you use.
Jason Hall
@Jason I just wanted to create Tree\Graphs that would present as some custom linked list - but when my sample created list during a lot of time I was wondering and thought that performance problem was into many items (about 1 million), but I discovered that problem was into linq using inside recursive function. Thank you for your all feedbacks.
Maxim Polishchuk
A: 

When you're dealing with millions (or more) of items, its best to use an array. Even if you waste a few thousand slots by making your array larger than absolutely necessary, the time efficiency gained may make up for the loss of space efficiency.

Of course, if you're dealing with an amount of data that's too large to store entirely in memory, a disk-based data structure is advisable.

quanticle
"its best to use an array." I disagree. A List<T> initialized to an appropriate capacity would have similar space requirements, and is more flexible
Joe