I have a C# program that uses a production grammar to generate 3D models of trees, flowers, and similar organic entities (see the Wikipedia entry on L-systems for more info). When I'm generating a large tree with leaves, I (expectedly) get exponential growth in the string, which would run to hundreds of gigabytes if I let it (and I'd like to).

Constraints - I have to do this (sort of) in C# - the C++/native side is busy compiling and rendering the rather immense geometry that's produced.

So StringBuilder is right out --- even if it could handle it, I don't have enough memory!

I don't want to do a pure file based solution - waaaaaayyyyyyyy toooooooooooo sloooooooooooowwww!

I can't change the grammar - I realize I could compress the standard L-Systems notation, but it's a context sensitive grammar, so once you've got it working, you become positively superstitious about fiddling with it.

Things I've considered

Memory mapped files - I don't mind using P/Invoke to get to the native layer to support things; I just don't want to rewrite the whole production system in C++. But I haven't found much in the way of handy C# libraries for accessing this functionality.

Low level mucking about with memory management/page faulting, etc - but hey, if I did that I might as well sell it as a product - makes the slow pure file based solution not look like such a bad idea

Anybody got any ideas here? How do I efficiently traverse/manipulate/expand multi-gigabyte strings produced by a production grammar?

+4  A: 

If you can upgrade to .NET 4.0 then you can use memory mapped files without needing to P/Invoke.

http://msdn.microsoft.com/en-us/library/dd997372.aspx
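
As a rough illustration, here is a minimal sketch of mapping a small window of a large backing file through System.IO.MemoryMappedFiles. The file name, the capacity, the window size, and the one-byte-per-symbol encoding are all assumptions made up for this example, not anything from the question.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedSymbolsSketch
{
    static void Main()
    {
        // Hypothetical backing file and sizes, chosen only for illustration.
        string path = Path.Combine(Path.GetTempPath(), "lsystem-symbols.bin");
        const long capacity = 1L << 20;     // 1 MiB here; the real string would be far larger
        const long windowSize = 64 * 1024;  // map only a small window at a time

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, capacity))
        {
            // Write a few symbols (assumed one ASCII byte each) into the first window.
            using (var view = mmf.CreateViewAccessor(0, windowSize))
            {
                byte[] axiom = Encoding.ASCII.GetBytes("F[+F]F");
                view.WriteArray(0, axiom, 0, axiom.Length);
            }

            // Later, map the same region again and read the symbols back.
            using (var view = mmf.CreateViewAccessor(0, windowSize))
            {
                for (long i = 0; i < 6; i++)
                    Console.Write((char)view.ReadByte(i));
                Console.WriteLine();
            }
        }

        File.Delete(path);
    }
}
```

Only the mapped window occupies address space at any moment, so a production pass could in principle slide a read window over the previous generation and a write window over the next one.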

Daniel James Bryars
Thanks! - am using .net 4, didn't know that
Mark Mullin
If you're dealing with amounts of data in the 100G range, then -- like it or not -- you're dealing with memory mapping and paging. Since you're not willing to off-load the mess to a database, you might as well tackle this head-on at this level.
Steven Sudit
A: 

If this is only for your development machines then a "back to the future" solution might be a RAM Disk, aka RAM Drive.

A RAM disk or RAM drive is a block of RAM (primary storage or volatile memory) that a computer's software is treating as if the memory were a disk drive (secondary storage).

One product for example. Search for RAM Disk or RAM drive and you'll get a cornucopia of choices.

JustBoo
100 GB won't fit in a RAM drive.
Albin Sunnanbo
Then have you come full circle? Paging Systems, whether liked or not, may be in your future. :-)
JustBoo
+1  A: 

You're quite right that the typical approach to compression involves the notion of a pre-existing plaintext. What I'm talking about here is something like the idea of using a trie data structure as opposed to a dictionary. It's not just about passively compressing, but rather using an inherently more compact representation that encodes the redundancies implicitly. If you're at the 100G mark today, you're within an order of magnitude of bursting past the limits of affordable hard drives, so you might benefit from rethinking the solution.
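
To make the idea concrete, here is one possible sketch of such a representation: a node graph in which repeated substrings are stored once and referenced many times. It assumes a hypothetical context-free rule (F -> F[+F]F) purely for illustration; a context-sensitive grammar like the one in the question would allow much less sharing, and the Node type is invented for this example.

```csharp
using System;
using System.Collections.Generic;

// A node is either a literal run of symbols or a concatenation of child nodes.
// The same child object may appear several times, so identical substrings are stored once.
sealed class Node
{
    public readonly string Literal;   // non-null for leaf nodes
    public readonly Node[] Children;  // non-null for concatenation nodes
    public readonly long Length;      // logical length, computed without expanding

    public Node(string literal)
    {
        Literal = literal;
        Length = literal.Length;
    }

    public Node(params Node[] children)
    {
        Children = children;
        foreach (Node c in children) Length += c.Length;
    }

    // Stream the symbols lazily; the full string is never materialized.
    public IEnumerable<char> Symbols()
    {
        if (Literal != null)
        {
            foreach (char ch in Literal) yield return ch;
        }
        else
        {
            foreach (Node c in Children)
                foreach (char ch in c.Symbols()) yield return ch;
        }
    }
}

static class SharedExpansionSketch
{
    static void Main()
    {
        // Hypothetical rule F -> F[+F]F, applied for 30 generations.
        Node open = new Node("[+"), close = new Node("]");
        Node gen = new Node("F");
        for (int i = 0; i < 30; i++)
            gen = new Node(gen, open, gen, close, gen);  // three references, one stored copy

        Console.WriteLine(gen.Length);  // logical length; memory use grows only linearly
    }
}
```

Under these assumptions the logical string is on the order of 10^14 symbols, while the graph holds a few dozen nodes. A consumer such as a renderer would walk Symbols() instead of indexing into a flat buffer.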

Steven Sudit
Please note that Daniel's answer about memory mapped files is complementary to mine.
Steven Sudit