I have a data structure, specifically a queue, that is growing so large that it causes an out of memory exception. This was unexpected, given the relatively simple object it holds (essentially a single string field).

Is there an easy way, or a built-in .NET way, to save this collection to a file on disk (there is no database to work with) and have it continue to function transparently as a queue?

+1  A: 

Maybe a queue is not an appropriate data structure for your problem. How many objects are in your queue? Do you really need to store all those strings for later? Could you replace the strings with something smaller, like enums, or with something more object-oriented, like the Flyweight design pattern?
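A minimal sketch of the enum idea, written in Python for brevity (the same idea applies to a .NET enum). It assumes the queued strings come from a small, fixed set of values; `FileState` and `enqueue_states` are hypothetical names. Each queue entry then refers to one of a few shared enum members instead of holding its own string.

```python
from collections import deque
from enum import Enum


class FileState(Enum):
    # Hypothetical: assumes the queued strings only ever take these few values.
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


def enqueue_states(raw_values):
    q = deque()
    for raw in raw_values:
        # FileState(raw) looks the member up by value; equal values map to the
        # same member object, so the queue stores shared references.
        q.append(FileState(raw))
    return q
```

If the strings are not drawn from a fixed set, an enum does not apply and a flyweight-style pool of shared instances is the closer fit.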

If you are processing lots of data, sometimes it's faster to recompute or reload the original data than to save a copy for later. Alternatively, you can process the data as you load it and avoid saving it for later processing.

cpeterso
+1  A: 

I would first investigate why you are getting the OOM.

When you add to the queue, keep a check on its size and take some action when a threshold is breached.
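A rough sketch of a size check on enqueue, in Python for brevity. `MonitoredQueue` and the `on_threshold` callback are hypothetical; what the callback does (log, alert, start spilling to disk) is left to the caller. It fires once per crossing rather than on every add past the limit, so a full queue does not flood the log.

```python
from collections import deque


class MonitoredQueue:
    """FIFO that calls on_threshold(size) once each time it grows past a limit."""

    def __init__(self, threshold, on_threshold):
        self._items = deque()
        self._threshold = threshold
        self._on_threshold = on_threshold
        self._alerted = False

    def enqueue(self, item):
        self._items.append(item)
        if len(self._items) >= self._threshold and not self._alerted:
            self._alerted = True  # suppress repeats until the size drops again
            self._on_threshold(len(self._items))

    def dequeue(self):
        item = self._items.popleft()
        if len(self._items) < self._threshold:
            self._alerted = False  # re-arm the check once below the limit
        return item

    def __len__(self):
        return len(self._items)
```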

Can you filter those items? Do the items have many duplicates? If so, you could replace each duplicate with a reference to a single pre-cached object.
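One way to sketch "replace duplicates with a pre-cached object" is a small canonicalizing pool, shown here in Python; `StringPool` is a hypothetical name. Equal strings passed through `canonical` come back as the same instance, so a queue full of duplicates holds many references to a few strings. Python's `sys.intern` and .NET's `String.Intern` do something similar process-wide; a private pool like this one can be discarded once the queue is drained.

```python
class StringPool:
    """Returns one shared instance for each distinct string value (flyweight-style)."""

    def __init__(self):
        self._pool = {}

    def canonical(self, s):
        # setdefault stores s the first time it is seen and returns the
        # stored instance for every later equal string.
        return self._pool.setdefault(s, s)

    def __len__(self):
        return len(self._pool)
```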

Fortyrunner
+1  A: 

I would use SQLite to save the data to disk.
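A minimal sketch of an SQLite-backed FIFO, using Python's built-in `sqlite3` module for brevity (from .NET you would need an SQLite provider). The table name `queue` and class `SqliteQueue` are hypothetical. FIFO order comes from the `AUTOINCREMENT` id; dequeue reads the oldest row and deletes it in one transaction. It assumes a single consumer.

```python
import sqlite3


class SqliteQueue:
    """FIFO of strings persisted in an SQLite table (single-consumer sketch)."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " item TEXT NOT NULL)"
        )
        self._conn.commit()

    def enqueue(self, item):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO queue (item) VALUES (?)", (item,))

    def dequeue(self):
        with self._conn:
            row = self._conn.execute(
                "SELECT id, item FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                raise IndexError("dequeue from empty SqliteQueue")
            self._conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]

    def close(self):
        self._conn.close()
```

Passing a file path instead of `":memory:"` keeps the items on disk, so only the rows currently being read occupy memory.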

tuinstoel
A: 

In response to your comment on the question, I guess you could split your file-collecting thread into two threads:

  • The first thread merely counts the number of files to be processed and increments a volatile int count. This thread only updates the count; it does not store anything in the queue.
  • The second thread is similar to the first, except that instead of updating the count it actually saves the data into the queue. When the size of the queue reaches a certain threshold, this thread should block for some time and then resume adding data to the queue. This ensures that your queue never grows beyond that threshold.
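The second bullet can be sketched with a bounded blocking queue, shown here in Python's `queue.Queue(maxsize=...)`. This swaps "block for some time, then resume" for a `put` that blocks exactly until the consumer frees a slot, which avoids tuning a sleep interval (.NET's `BlockingCollection<T>` with a bounded capacity behaves similarly). `MAX_QUEUED`, the `_DONE` sentinel and the per-item work are hypothetical placeholders; the counting thread is omitted.

```python
import queue
import threading

MAX_QUEUED = 1000  # hypothetical threshold; tune to your memory budget
_DONE = object()   # sentinel telling the consumer there is no more work


def producer(paths, q):
    for p in paths:
        q.put(p)  # blocks while the queue already holds maxsize items
    q.put(_DONE)


def consumer(q, results):
    while True:
        item = q.get()
        if item is _DONE:
            break
        results.append(len(item))  # stand-in for the real per-file processing


def run(paths, max_queued=MAX_QUEUED):
    q = queue.Queue(maxsize=max_queued)
    results = []
    t = threading.Thread(target=producer, args=(paths, q))
    t.start()
    consumer(q, results)  # consume on the calling thread
    t.join()
    return results
```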

I suspect you wouldn't even need the second thread. Since the first one gives you the count you need, your main thread can find and process the actual files one at a time. That way you don't need the queue at all, which reduces your memory requirements.
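A sequential sketch of the count-then-stream approach in Python, under the assumption that "finding files" means walking a directory tree; `iter_files`, `process_all` and the `handle` callback are hypothetical names. The generator yields paths lazily, so the full file list is never built; only the count survives the first pass.

```python
import os


def iter_files(root):
    """Yield file paths lazily; the full list of files is never built."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def process_all(root, handle):
    # First pass: keep only the number of files.
    total = sum(1 for _ in iter_files(root))
    # Second pass: handle each file as it is found, with no queue in between.
    for done, path in enumerate(iter_files(root), start=1):
        handle(path, done, total)
    return total
```

Walking the tree twice costs extra I/O; that is the trade made for not holding every path in memory.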

However, I doubt that your queue is the reason you're getting out of memory exceptions. Even if you add one million entries to the queue, at roughly 512 bytes per entry that would take only about 512 MB of memory. I suggest you check your processing logic.

Hosam Aly
A: 

The specific answer is no, there is no easy or built-in way to do this. You have to write it to disk "yourself".
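A minimal do-it-yourself sketch in Python, assuming plain string items and no database: each item is written to a file as a 4-byte length prefix plus UTF-8 bytes, and only two file offsets and a count stay in memory. `FileQueue` is a hypothetical name and the record layout is an arbitrary choice.

```python
import struct


class FileQueue:
    """FIFO of strings stored in a file; only offsets and a count live in memory."""

    _HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix per item

    def __init__(self, path):
        self._f = open(path, "w+b")  # "w+b" truncates, so every queue starts empty
        self._read_pos = 0
        self._write_pos = 0
        self._count = 0

    def enqueue(self, item):
        data = item.encode("utf-8")
        self._f.seek(self._write_pos)
        self._f.write(self._HEADER.pack(len(data)))
        self._f.write(data)
        self._write_pos = self._f.tell()
        self._count += 1

    def dequeue(self):
        if self._count == 0:
            raise IndexError("dequeue from empty FileQueue")
        self._f.seek(self._read_pos)  # seeking flushes any pending writes first
        (length,) = self._HEADER.unpack(self._f.read(self._HEADER.size))
        item = self._f.read(length).decode("utf-8")
        self._read_pos = self._f.tell()
        self._count -= 1
        return item

    def __len__(self):
        return self._count

    def close(self):
        self._f.close()
```

This sketch never reclaims the space of consumed items and is not crash-safe (the offsets exist only in memory); a real version would compact or roll over to a new file once the read offset passes some size.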

Figure out why you are getting the out of memory exception; the cause might surprise you. Maybe it's string interning, or maybe you are fragmenting the GC heap with all the small object allocations. The CLR Profiler from Microsoft is fantastic for this.

Mo Flanagan