views:

519

answers:

6

I have a large array with around 20k objects in it. Each object has child objects in a big complicated tree structure with arrays in there too. Right now the app is developed using just a simple myObjectType[] myArray and it takes 13 seconds just to get the count of items in the array.

Is there a better type or is there a better way that I should be managing the array? 99% of the usage of the array is reading from it, but it currently takes almost 3 minutes to populate it.

EDIT: added more info.

The app currently loads all of this data into the giant array and then uses the array as a database. It then filters the data down based on your selections from some drop-down boxes and returns a subset to a datagrid to display. I don't have the option to rewrite the whole thing to just pass the filters to the actual db...

EDIT: more info, sorry for the delay, was pulled into a meeting.

[Serializable]
public class PartsList : System.Collections.CollectionBase
{
  // Copies the underlying ArrayList into a typed array on every access.
  public virtual Part[] parts { get { return (Part[])InnerList.ToArray(typeof(Part)); } }
  public new virtual int Count { get { return this.List.Count; } }

  public virtual CountryList GetCountries()
  {
    CountryList countries = new CountryList();
    //Code removed - goes through every sub item and makes a list of unique countries...
    // Yes, could have been done better.
    return countries;
  }

}

/////////////////////////////////////

[Serializable]
public class Part
{
  private int id, blah, myvariable;
  private CountryList countries;  //formatted the same way as this class...
  private YearList  years; 
  private ModelList models;
  private PriceHistoryList priceHistoryList;
  // there are a couple more like these...
}

This is why it takes 3 minutes to load:

  • 20k parts
  • 1-2 countries per part
  • 1-10 years per part
  • 1-12 models per part
  • 1-10 price history entries per part

When I step the debugger through this code:

  PartsList mylist = new PartsList();
  //populate the list here
  if (mylist.Count != 0)

the debugger takes 13 seconds to get off the if line after hitting F10. Doing a Quick Watch on mylist just gives me an error for the Count value.

What I'm really looking for is: is there a better variable type to replace the arrays with, since they are nested internally...

UPDATE Jan 29 2010: Did some searching and it appears that, due to the object design, it is lazy loading one object at a time into memory, causing a TON of SQL calls to be fired. The Count also seems to take so long because of a combination of using CollectionBase and complex objects, where it retrieves each object, counts it, then goes on to the next. The plan now IS to move the app to 2008 (.NET 3.5 from 1.1) and rewrite the back end of the application so that it does not pre-load 350 MB into memory...
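
For anyone hitting the same thing, this is roughly the shape of the problem. The property below is only a sketch (the Countries property, LoadCountriesFromDb and the way the id is used are hypothetical, not the actual code):

[Serializable]
public class Part
{
  private int id;
  private CountryList countries;   // null until first touched

  // Lazy-loading property: the first read triggers a database round trip
  // for this single part. Touching it while counting/filtering 20k parts
  // fires roughly 20k separate SQL calls.
  public CountryList Countries
  {
    get
    {
      if (countries == null)
      {
        countries = LoadCountriesFromDb(id);  // one query per part
      }
      return countries;
    }
  }

  private CountryList LoadCountriesFromDb(int partId)
  {
    // placeholder for the real data-access call
    return new CountryList();
  }
}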

Thanks everyone for your input.

+6  A: 

20,000 objects (which are in fact just references) is peanuts. The count will return instantaneously. If you're having trouble, it's not because of the array class.

Which collection to use in the end depends on what you want to do with it.

Before optimizing, always make sure to find the bottleneck. Often this is not what one expects first, and therefore you should absolutely use a profiler to see what is actually taking up so much time.

Lucero
The count is stored as a variable in the header of the array object, and is thus constant time (unaffected by the length of the array, so "20,000 items" has no relevance to how long it takes).
280Z28
A crude way to find that bad-ass bottleneck is simply to pause your debug session a few times and see if any patterns emerge.
Martin
A: 

You could use something like a Dictionary<TKey, TValue> with an appropriate key for fast lookup. However, how are you performing your Count? Using the Count() extension method can be slow, but Length should be fast.
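
For example (a rough sketch, assuming .NET 2.0+ for Dictionary<TKey, TValue>, and assuming Part exposes a public Id property to key on — the class in the question doesn't, so adjust to whatever uniquely identifies a part; allParts stands for the loaded Part[]):

// Build the index once, right after loading.
Dictionary<int, Part> partsById = new Dictionary<int, Part>(allParts.Length);
foreach (Part p in allParts)
{
  partsById[p.Id] = p;     // Id is a hypothetical key property
}

// O(1) lookup by key instead of scanning the array.
Part match;
if (partsById.TryGetValue(12345, out match))
{
  // use match
}

// Length is a field stored on the array itself, so reading it is instant.
int total = allParts.Length;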

UPDATE Based on new info in question:

CollectionBase is known to be a bit slow because everything is stored as an object and casting operations are lodged everywhere, like ants on heat. If you're using VS 2003 with .NET 1.1 then you may be stuck with trying to optimise your way out of a bad situation. If you can use VS 2005 with .NET 2.0, then you can make use of generic collections, which (a) take away all the admin of managing a new type of collection and (b) are plenty faster.
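
A minimal sketch of what the PartsList from the question might look like on .NET 2.0 generics (List<Part> is just one option; Collection<Part> is the usual choice if you need to hook insert/remove):

using System.Collections.Generic;

[Serializable]
public class PartsList : List<Part>
{
  // Count, indexing and foreach come from List<Part>, strongly typed,
  // with none of the object-to-Part casts that CollectionBase needs.

  public CountryList GetCountries()
  {
    CountryList countries = new CountryList();
    // walk the parts and collect the unique countries, as in the original
    return countries;
  }
}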

These questions might be of interest:

Joel Goodwin
Joel, bang on with VS2003 and .NET 1.1. I'm pressing for a move to 2008 though, at which point we'll kill the entire middle tier and rewrite it so it doesn't load the entire db into memory.
dilbert789
A: 

It depends on your usage, of course. I recently optimized some code and went from a 2-3 minute loading time to 1.5 seconds. Maybe you can get some ideas from my blog post about it: http://blog.zoolutions.se/post/2010/01/04/An-even-better-way-of-handling-a-singleton-WURFL-in-aspnet.aspx

This was mostly down to getting rid of my crappy singleton instance and using an IoC container to do it for me.

mhenrixon
+5  A: 

The array type T[] in any .NET managed application stores the length of the array as a variable near the beginning of the object. It takes a trivial amount of time to get the count, so we'll need more information about the full structure (in particular what you mean by "the count") to tell you what's taking so long.

One recommendation is storing the "total number of child items under node X" as part of the node. This takes O(log n) time to maintain, which is the same complexity as the tree operations that affect the count, and thus does not impact the algorithmic complexity of your structure (though it does add a 4 byte variable to each node).
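
A rough illustration of that idea with a hypothetical node type (not the asker's classes):

using System.Collections.Generic;

public class Node
{
  private readonly List<Node> children = new List<Node>();
  private Node parent;
  private int subtreeCount;            // cached "total items under this node"

  public int SubtreeCount { get { return subtreeCount; } }

  public void Add(Node child)
  {
    child.parent = this;
    children.Add(child);

    // Bump the cached count on every ancestor. Insertion pays O(depth)
    // (O(log n) for a balanced tree), and reading the count stays O(1).
    int added = 1 + child.subtreeCount;
    for (Node n = this; n != null; n = n.parent)
    {
      n.subtreeCount += added;
    }
  }
}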

280Z28
+3  A: 

There's only one reason it could be so slow. It's an old-fashioned problem called thrashing. Keep an eye on your hard disk light while your program runs. Is it blinking furiously? Buy more RAM.

Hans Passant
A: 

It's impossible for us to give you a concrete answer to this question, because most likely there are things happening in your code that you think are irrelevant but that might be important clues as to why things are seemingly running slowly.

I say "seemingly", since we have no indication that the operations you try to perform could run faster.

What you need to do is to point a profiler towards your program and take a look and see where most of the time is used. Only then can you start looking at specific ways to speed up your program.

Note that random tinkering might give you performance gains, but if you happen to hit the right spot it will probably be more blind luck than any specific skill on your part (note that I'm not saying you don't have the right skills, but even performance experts will admit that their gut instinct is wrong most of the time).

There could be specific issues you need to look at though, judging by your question, but after looking at, and perhaps fixing, those issues, you still need that profiler.

  • Why does a simple "count" on an array take 13 seconds? Perhaps you left out that you're really counting objects which fit specific criteria? Or perhaps you're using the .Count() extension method instead of .Length (note that .Length returns the allocated size of the array, not how many elements you've actually put into it). See the small illustration below.
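
For illustration (the Count() extension only exists from .NET 3.5 onward and needs using System.Linq; LoadParts is a stand-in for however the array gets filled):

Part[] parts = LoadParts();
int n1 = parts.Length;    // reads the length stored in the array object itself
int n2 = parts.Count();   // System.Linq extension method on IEnumerable<T>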

Profilers you could look at:

Lasse V. Karlsen
The specific count code is shown above; it's not doing any criteria matching. However, it IS using the .Count() extension. I'm pretty sure half the app is going to be rewritten.
dilbert789