views:

58

answers:

4

I am somewhat new to C# programming and need some advice on how to tackle a problem. I need to process tens of thousands of records stored in a SQL Server database, and the processing should be as fast as possible.

To maximize performance, I fetch the rows from the database on a background thread when the application starts, because the application needs to wait for some user input before the processing can begin. This approach saves 20% of the overall processing time, but it is very resource hungry in terms of memory footprint: the process takes 200MB of RAM, while I estimate the database rows contain less than 10MB of raw data.

I'm using a class whose members store the data from the database columns, and an ArrayList to store the rows.

Is there another approach to store the data in memory to minimize the consumed memory?
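To illustrate, here is a simplified sketch of the setup (the class and field names are made up, loosely matching the columns I describe in the comments below). Note that ArrayList stores every element as an object reference, so a typed List&lt;T&gt; is usually a better starting point:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Illustrative record shape: a CHAR(6) code, three INTs, and a small IMAGE blob.
class Record
{
    public string Code;   // CHAR(6)
    public int A, B, C;   // the three INT columns
    public byte[] Image;  // IMAGE, at most 900 bytes
}

class Demo
{
    static void Main()
    {
        // ArrayList: every element goes in and out as 'object',
        // so each retrieval needs a cast.
        ArrayList untyped = new ArrayList();
        untyped.Add(new Record { Code = "ABC123", A = 1 });
        Record first = (Record)untyped[0];

        // List<T> keeps the element type: no casts, and for value
        // types it also avoids boxing.
        List<Record> typed = new List<Record>();
        typed.Add(new Record { Code = "ABC123", A = 1 });
        Console.WriteLine(typed[0].Code); // prints "ABC123"
    }
}
```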

+1  A: 

Be aware that the memory usage shown in Task Manager is not necessarily the memory used by your data. The runtime grabs more memory than it needs at the moment so it can scale well. If you want to find out exactly how much memory is used, use a memory profiler.

Femaref
.NET doesn't grab THAT much more than it needs.
Qwertie
Can you recommend a good memory profiler, please?
LrycXC
+1  A: 

Some basic things to check without knowing more details about your app:

  • Are you only storing stuff in memory that you need?
  • Are you creating objects on the Large Object Heap? The LOH is only collected during a full (Gen 2) collection and is never compacted, so it can fragment and hold on to memory.
  • Can you process data in batches, and reduce the results of each batch into another intermediate memory/disk store? Essentially, can you use some form of map-reduce?
  • Use WinDBG to look at your heap and see rooted objects. It'll give you a better idea of what's in the 200MB.
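A minimal sketch of the batching idea above (the row source and the "reduce" step are placeholders, not real APIs from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class BatchDemo
{
    // Process rows in fixed-size batches, reducing each batch to a small
    // intermediate result instead of keeping all rows resident in memory.
    public static long ProcessInBatches(IEnumerable<int> rows, int batchSize)
    {
        long total = 0;
        var batch = new List<int>(batchSize);
        foreach (int row in rows)
        {
            batch.Add(row);
            if (batch.Count == batchSize)
            {
                total += batch.Sum(); // "reduce" step: keep only the aggregate
                batch.Clear();        // release the rows themselves
            }
        }
        total += batch.Sum();         // leftover partial batch
        return total;
    }

    static void Main()
    {
        long result = ProcessInBatches(Enumerable.Range(1, 10000), 500);
        Console.WriteLine(result);    // prints 50005000
    }
}
```

At any moment only one batch of rows is alive, so peak memory is bounded by the batch size rather than the row count.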
psychotik
I'll take a look at WinDBG, thanks!
LrycXC
+1  A: 

What are the data types of the columns?

If there are a lot of strings then you may be suffering from string overhead. .NET strings are UTF-16 (2 bytes per character) and (I think) carry 16-18 bytes of overhead per string. If you really need to save memory and the data is ASCII, you could consider combining several string columns into a single byte array using Encoding.UTF8.

// Occupies 64 bytes of memory
string col1 = "Me", col2 = "You", col3 = "Us";

StringBuilder sb = new StringBuilder(col1);
// only works if you are sure the columns have no nulls
sb.Append('\0');
sb.Append(col2);
sb.Append('\0');
sb.Append(col3);

// Occupies 24 bytes of memory
byte[] array = Encoding.UTF8.GetBytes(sb.ToString());

Of course, this would slow down the program, and you'd have to write code to unpack the byte array whenever you need the strings back, but you might save a lot of memory.
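The unpacking side could look like this (a sketch that assumes, as the packing does, that none of the columns contains a '\0'):

```csharp
using System;
using System.Text;

class UnpackDemo
{
    static void Main()
    {
        // Pack as in the answer above.
        byte[] array = Encoding.UTF8.GetBytes("Me\0You\0Us");

        // Decode once, then split on the separator to recover the columns.
        string[] cols = Encoding.UTF8.GetString(array).Split('\0');

        Console.WriteLine(cols[0]); // prints "Me"
        Console.WriteLine(cols[2]); // prints "Us"
    }
}
```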

Qwertie
There is one CHAR(6) column, three INTs and an IMAGE column storing no more than 900 bytes.
LrycXC
I guess I didn't help then. You could use CLR Profiler to see where the memory is going: http://www.microsoft.com/downloads/details.aspx?FamilyId=A362781C-3870-43BE-8926-862B40AA0CD0 If you are using a .NET Image or Bitmap to hold the data, make sure it doesn't have a greater bit depth than the original image.
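As a rough back-of-the-envelope sketch (the dimensions here are made up; the question only says the IMAGE column holds at most 900 bytes): a decoded Bitmap costs roughly width × height × bytes-per-pixel no matter how small the compressed source was, so keeping the raw byte[] from the IMAGE column until the pixels are actually needed can be much cheaper.

```csharp
using System;

class BitmapCostDemo
{
    static void Main()
    {
        // Hypothetical dimensions for a small image stored compressed
        // in the <= 900-byte IMAGE column.
        int width = 100, height = 100, bytesPerPixel = 4; // 32bpp ARGB

        int decodedBytes = width * height * bytesPerPixel;
        Console.WriteLine(decodedBytes); // 40000 bytes once decoded,
                                         // vs. at most 900 bytes stored raw
    }
}
```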
Qwertie
A: 

"I'm using a class with members" could be your problem. Primitive data types like bool, int, etc. should require roughly the same space as they do in your DB. But when you create a new instance of a class, additional data has to be reserved on the heap. This shouldn't account for 200MB when only processing "tens of thousands" of rows, but you could try a value type instead (e.g. change your class to a struct).

Also, if your DB contains strings of roughly the same length, you could use char arrays to store them in order to "minimize the consumed memory".
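A sketch of the struct idea (field names are illustrative): an array of structs lays the records out in one contiguous block, whereas an array of class instances is an array of references to separately allocated objects, each with its own header.

```csharp
using System;

// Value type: stored inline in the array, no per-record object header.
struct RecordStruct
{
    public int A, B, C;
}

class StructDemo
{
    static void Main()
    {
        // 10,000 records in a single contiguous allocation of roughly
        // 10,000 * 12 bytes, plus one array header.
        var records = new RecordStruct[10000];
        records[0].A = 42; // written in place: no boxing, no extra object
        Console.WriteLine(records[0].A); // prints 42
    }
}
```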

Dave
I tried using a struct instead of a class to store the values, but it made no noticeable difference in process size.
LrycXC
Well, maybe the core of your application has a really small memory footprint, and most of the 200MB is accounted for by the runtime environment, as other answers suggest. The idea behind using structs was to store all of them in a contiguous block of memory, thus avoiding the per-object overhead of each instantiation.
Dave