I am working on an application with the potential for a large memory load (>5 GB), but it is required to run on 32-bit, .NET 2.0 desktops due to the customer deployment environment. My solution so far has been to use an app-wide data store for these large-volume objects: when an object is assigned to the store, the store checks the app's total memory usage, and if it is getting close to the limit it starts serialising some of the older objects in the store to the user's temp folder, retrieving them back into memory as and when they are needed. This is proving decidedly unreliable: if other objects within the app start using memory, the store gets no prompt to clean up and make space. I also looked at holding the in-memory data objects through weak references, serialising them to disk when they were collected, but the objects seemed to be released almost immediately (especially in debug builds), causing a massive performance hit as the app ended up serialising everything.
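
Roughly, the store works along these lines (a heavily simplified sketch rather than the real code; the names, the memory threshold and the use of BinaryFormatter are just illustrative):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    // Heavily simplified sketch of the store (no locking, no error handling).
    class SpillingStore
    {
        private readonly Dictionary<string, object> inMemory = new Dictionary<string, object>();
        private readonly Queue<string> age = new Queue<string>();    // oldest keys first
        private readonly long limitBytes = 1200L * 1024 * 1024;      // arbitrary "getting close" threshold

        public void Put(string key, object value)
        {
            inMemory[key] = value;
            age.Enqueue(key);
            // GC.GetTotalMemory(true) forces a collection so the figure is current,
            // but it only sees this process's managed heap, which is why the store
            // gets no warning when other parts of the app allocate memory.
            while (age.Count > 1 && GC.GetTotalMemory(true) > limitBytes)
                SpillOldest();
        }

        public object Get(string key)
        {
            object value;
            if (inMemory.TryGetValue(key, out value))
                return value;
            using (FileStream fs = File.OpenRead(PathFor(key)))      // pull it back from temp
                value = new BinaryFormatter().Deserialize(fs);
            Put(key, value);
            return value;
        }

        private void SpillOldest()
        {
            string key = age.Dequeue();
            object value;
            if (!inMemory.TryGetValue(key, out value))
                return;                                              // already spilled earlier
            using (FileStream fs = File.Create(PathFor(key)))
                new BinaryFormatter().Serialize(fs, value);
            inMemory.Remove(key);
        }

        private static string PathFor(string key)
        {
            return Path.Combine(Path.GetTempPath(), key + ".bin");
        }
    }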

Are there any useful patterns/paradigms I should be using to handle this? I have googled extensively but as yet haven't found anything useful.

+4  A: 

I thought virtual memory was supposed to have you covered in this situation?

Anyway, it seems suspect that you really need all 5 GB of data in memory at any given moment; you can't possibly be processing all of it at once, at least not on what sounds like a consumer PC. You didn't go into detail about your data, but it smells to me like the object model is poorly designed, in the sense that you need the entire set in memory to work with it. Have you thought about fragmenting your data into more sensible units, and then doing some preemptive loading from disk just before each piece needs to be processed? You'd essentially be paying a more constant performance cost this way, but you'd reduce your current thrashing issue.
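
To sketch that idea in code (purely illustrative; it assumes the data can be pre-split into independently loadable chunk files, which may not match your format): process one chunk at a time and prefetch the next on a worker thread, so only a chunk or two is ever resident.

    using System;
    using System.IO;
    using System.Threading;

    // Double-buffered chunk processing: work on the current chunk while a
    // worker thread reads the next one from disk.
    class ChunkedProcessor
    {
        static byte[] Load(string path) { return File.ReadAllBytes(path); }

        static void Run(string[] chunkPaths)
        {
            byte[] current = Load(chunkPaths[0]);
            for (int i = 0; i < chunkPaths.Length; i++)
            {
                byte[] next = null;
                ManualResetEvent ready = new ManualResetEvent(false);
                if (i + 1 < chunkPaths.Length)
                {
                    int nextIndex = i + 1;
                    ThreadPool.QueueUserWorkItem(delegate
                    {
                        next = Load(chunkPaths[nextIndex]);   // prefetch while we work
                        ready.Set();
                    });
                }
                else
                {
                    ready.Set();
                }

                Process(current);       // only the current chunk (plus the prefetch) is in memory
                ready.WaitOne();
                current = next;
            }
        }

        static void Process(byte[] chunk) { /* domain-specific work */ }
    }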

Mike Atlas
+1: Agreed. It sounds like you're trying to implement your own paging scheme, which I bet Microsoft have already solved.
Oli Charlesworth
32-bit allows only 2 GB (or 3 GB with a special Windows configuration) of user-addressable memory space per process, so paging doesn't really solve the question asked.
Lucero
@Lucero: That would explain why his app immediately starts "caching". I think my original suggestion of revisiting the structure of the data is really the ultimate answer here. I'll try to clarify this.
Mike Atlas
@Mike, I agree with you. I was just pointing out that paging/virtual memory is not going to solve the issue, because there's something else which needs to be done in order to reduce the amount of actual memory use.
Lucero
All - the initial data file is typically around 1 GB, with the resulting objects being processed averaging on the order of megabytes - I have added some more info further up which I will avoid repeating. I apologise that I am constrained in how much I can say, but think of it as needing to converge on a solution through an iterative process, with the potential to need to retreat to a previous step and take a different route. Thank you all for your help!
Matt Reeve
A: 

When you have to store huge amounts of data and keep it accessible, sometimes the most useful solution is a dedicated data store and management system, i.e. a database. A database (MySQL, for example) can store most typical data types, and of course binary data too. Maybe you can store your objects in a database (directly, or through a business object model) and fetch them only when you need them. This approach can solve many data-management problems (moving, backup, searching, updating...) as well as the storage itself (the data layer), and it is location independent; maybe this point of view can help you.
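
As a rough illustration of this (it assumes MySQL Connector/NET is available, and the table name and schema are made up), an object could be serialised into a key/blob table and pulled back only on demand:

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;
    using MySql.Data.MySqlClient;   // MySQL Connector/NET, assumed available

    // Illustrative key/blob store: serialise an object into a table row and
    // reload it only when it is actually needed.
    // Assumed schema: CREATE TABLE object_store (obj_key VARCHAR(64) PRIMARY KEY, payload LONGBLOB)
    class BlobStore
    {
        private readonly string connectionString;

        public BlobStore(string connectionString) { this.connectionString = connectionString; }

        public void Save(string key, object value)
        {
            byte[] payload;
            using (MemoryStream ms = new MemoryStream())
            {
                new BinaryFormatter().Serialize(ms, value);
                payload = ms.ToArray();
            }
            using (MySqlConnection conn = new MySqlConnection(connectionString))
            {
                conn.Open();
                MySqlCommand cmd = new MySqlCommand(
                    "REPLACE INTO object_store (obj_key, payload) VALUES (@k, @p)", conn);
                cmd.Parameters.AddWithValue("@k", key);
                cmd.Parameters.AddWithValue("@p", payload);
                cmd.ExecuteNonQuery();
            }
        }

        public object Load(string key)
        {
            using (MySqlConnection conn = new MySqlConnection(connectionString))
            {
                conn.Open();
                MySqlCommand cmd = new MySqlCommand(
                    "SELECT payload FROM object_store WHERE obj_key = @k", conn);
                cmd.Parameters.AddWithValue("@k", key);
                byte[] payload = (byte[])cmd.ExecuteScalar();
                using (MemoryStream ms = new MemoryStream(payload))
                    return new BinaryFormatter().Deserialize(ms);
            }
        }
    }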

UGEEN
A: 

Would something like memcached or Coherence do what you want?

Joe
Hi Joe - I'd looked at memcached and Coherence, however they seem to be aimed at caching data loaded from disk, as opposed to only writing to disk when memory comes under pressure. Thank you though!
Matt Reeve
+2  A: 

Maybe you should go with memory-mapped files (see "Managing Memory-Mapped Files" on MSDN). In .NET 2.0 you have to use P/Invoke to call those functions; since .NET 4.0 you get efficient built-in functionality with MemoryMappedFile.

Also take a look at: http://msdn.microsoft.com/en-us/library/dd997372.aspx

You can't store 5 GB of data in memory efficiently. You have a 2 GB limit per process on a 32-bit OS, and a 4 GB limit per 32-bit process under 64-bit Windows-on-Windows (WOW64).

So you have a choice:

  • Go the Google Chrome way (and Firefox 4) and spread portions of the data across multiple processes. This may be applicable if your application runs under a 64-bit OS and you have reasons to keep the app itself 32-bit, but it is not an easy route. And if you don't have a 64-bit OS, I wonder where you would get >5 GB of RAM?

  • If you have a 32-bit OS, then any solution will be file-based. When you try to keep the data in memory (though I wonder how you address it all under 32-bit with the 2 GB per-process limit), the OS just continuously swaps portions of the data (memory pages) to disk and restores them again and again as you access them. You incur a big performance penalty, and you have already noticed it (I'm guessing from the description of your problem). The main problem is that the OS can't predict when you will need one piece of data and when you will want another, so it just does its best by reading and writing memory pages to and from disk.

    So you are already using disk storage indirectly, in an inefficient way; MMFs just give you the same solution in an efficient and controlled manner.

You can re-architect your application to use MMFs, and the OS will help you with efficient caching. Do a quick test yourself; an MMF may be good enough for your needs.

Anyway, I don't see any solution other than a file-based one for working with a dataset greater than the available RAM. And it is usually better to have direct control over the data manipulation, especially when that amount of data comes in and needs to be processed.
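
To give a feel for the .NET 2.0 P/Invoke route, here is a minimal sketch (error handling omitted; the mapping name and size are illustrative) that creates a pagefile-backed mapping and reads/writes through a view:

    using System;
    using System.Runtime.InteropServices;

    static class MmfSketch
    {
        const uint PAGE_READWRITE = 0x04;
        const uint FILE_MAP_READ = 0x0004;
        const uint FILE_MAP_WRITE = 0x0002;
        static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
        static extern IntPtr CreateFileMapping(IntPtr hFile, IntPtr lpAttributes,
            uint flProtect, uint dwMaximumSizeHigh, uint dwMaximumSizeLow, string lpName);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr MapViewOfFile(IntPtr hFileMappingObject, uint dwDesiredAccess,
            uint dwFileOffsetHigh, uint dwFileOffsetLow, UIntPtr dwNumberOfBytesToMap);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool UnmapViewOfFile(IntPtr lpBaseAddress);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool CloseHandle(IntPtr hObject);

        static void Main()
        {
            // 64 MB pagefile-backed mapping; a real app would map views over its
            // data file, a window at a time, to stay within the 2 GB address space.
            uint size = 64 * 1024 * 1024;
            IntPtr mapping = CreateFileMapping(INVALID_HANDLE_VALUE, IntPtr.Zero,
                PAGE_READWRITE, 0, size, "Local\\MyAppScratch");
            IntPtr view = MapViewOfFile(mapping, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, UIntPtr.Zero);

            Marshal.WriteInt32(view, 0, 42);            // write through the view
            Console.WriteLine(Marshal.ReadInt32(view, 0));   // read it back

            UnmapViewOfFile(view);
            CloseHandle(mapping);
        }
    }

On .NET 4.0 the same thing collapses to MemoryMappedFile.CreateNew / CreateFromFile; on .NET 2.0 you would wrap calls like these (plus CreateFile for a file-backed mapping) in a small helper class.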

Nick Martyshchenko
Hi Sentinel - thanks for the suggestion, but that would rely on me writing the data out to disk to be able to memory map the file, I am trying to avoid the CPU/IO hit of serializing this much data to disk if it's not needed.
Matt Reeve
+1, this is the correct answer. There isn't any other place you can shove 5 gigabytes, MMFs make the disk I/O transparent. Albeit that they are quite incompatible with garbage collection. A 64-bit operating system is the 200 dollar solution.
Hans Passant
@Hans Passant, thank you. I agree with you about the 64-bit OS suggestion, though it may not be an option for Matt since it would require upgrading all his customers, and he asked about .NET 2.0 so I'm not sure he can upgrade. @Matt Reeve: I'll comment in my answer.
Nick Martyshchenko
@Hans - I would absolutely love my customers to be able to move to 64-bit. Indeed, we have prototyped this and the problem did go away, however we have a very constrained deployment environment which restricts us to x86 and .NET 2.0.
Matt Reeve
@Matt - it is simple economics. I'd guess this is going to take you at least a month to get it to not completely suck. Your boss counts your hours at, say, $150 per. That's 24K. You can buy a 64-bit box from Dell for about $600. Ship your code now, include 40 machines to break even. This is the kind of reasoning that appeals to the bean counters who put up these kind of artificial road blocks. YMMV.
Hans Passant
@Hans - While I would whole-heartedly agree with you if this were a purely commercial environment, the deployment environment with my customer is far more constrained; for example, the equipment would need to be fully ruggedised. The costs of the kit and the deployment would far exceed the £24k cost.
Matt Reeve
@Matt - do the bean counters carry guns? Well, worth a shot, good luck with it.
Hans Passant
I had a hunch the OP's customer was gov/mil related!
Mike Atlas
I couldn't possibly comment about the hardware carried by my customer......
Matt Reeve