views: 459
answers: 7

I have an algorithm which currently allocates a very large array of doubles, which it updates and searches frequently. The size of the array is N^2/2, where N is the number of rows on which the algorithm is operating. I also have to keep a copy of the entire thing for purposes associated with the application surrounding the algorithm.

Of course this imposes a limit on the number of rows that my algorithm can handle as I have the heap limitation to contend with. Up to this point I have got away with asking the people using the algorithm to update the -Xmx setting to allocate more space, and that has worked fine. However, I now have a genuine problem where I need this array to be larger than I can fit into memory.

I already have plans to change my algorithm to remove the need for this large array, and I have some promising results in that direction. However, it is a fundamental alteration to the process and will require a lot more work before it reaches the highly polished state of my current code, which has been running successfully in production for several years.

So, while I am perfecting my new algorithm, I want to extend the life of the existing one, and that means tackling the heap limitation associated with allocating my huge array of doubles.

My question is: what is the best way of dealing with it? Should I use an NIO FileChannel and a MappedByteBuffer, or is there a better approach? If I do use the NIO approach, what sort of performance hit should I expect compared to an in-memory array of the same size?
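For concreteness, the sort of thing I have in mind is roughly this (just a sketch, not production code; a single mapping is limited to Integer.MAX_VALUE bytes, so for the full problem I would need several of them):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch: a double "array" backed by a memory-mapped file.
    // A single mapping is limited to Integer.MAX_VALUE bytes,
    // i.e. roughly 268 million doubles.
    public class MappedDoubleArray {
        private final MappedByteBuffer buffer;

        public MappedDoubleArray(String fileName, int size) throws IOException {
            RandomAccessFile file = new RandomAccessFile(fileName, "rw");
            FileChannel channel = file.getChannel();
            // The file is grown automatically to size * 8 bytes by the mapping.
            // No close/cleanup handling here - this is only a sketch.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) size * 8);
        }

        public double get(int index) {
            return buffer.getDouble(index * 8);
        }

        public void set(int index, double value) {
            buffer.putDouble(index * 8, value);
        }
    }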

Thanks

+5  A: 

If you are starting to run out of available memory, then you will probably also soon start to run out of available array indexes: an array is bounded in size to Integer.MAX_VALUE elements, which when the elements are doubles is "only" about 16GB of data (and since you keep a copy of the whole thing, roughly double that in total).

Getting a machine with 32GB of memory is expensive, but probably not as expensive as your time to modify the algorithm, and all of the associated testing.

However, if the client is running to the edges of memory, and their datasets are still growing, then it makes sense for you to bite the bullet now, and make the changes to be able to use less memory at any given time, since they will likely soon outgrow an array anyway.

The other option that you have, assuming that the array is somewhat sparsely filled, is to use one of the various sparse array data structures, although these tend to only be beneficial if your array is less than 20% full.
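For illustration, a minimal sketch of a map-based sparse matrix keyed by a packed (row, column) index (the names are arbitrary, and the boxing overhead means this only pays off when the matrix really is mostly empty):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: only non-zero entries are stored, keyed by a packed (row, col) index.
    public class SparseDoubleMatrix {
        private final Map<Long, Double> values = new HashMap<Long, Double>();

        private long key(int row, int col) {
            return ((long) row << 32) | (col & 0xFFFFFFFFL);
        }

        public double get(int row, int col) {
            Double v = values.get(key(row, col));
            return v == null ? 0.0 : v;
        }

        public void set(int row, int col, double value) {
            if (value == 0.0) {
                values.remove(key(row, col));   // keep the map sparse
            } else {
                values.put(key(row, col), value);
            }
        }
    }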

Edit: Since it seems that you have already investigated the alternatives, then the MappedByteBuffer may well be the way to go. Obviously this is going to have a performance impact, however if you do mostly sequential reads and writes from the array, then this should not be too bad. If you are doing random reads and writes, then this is going to get very slow very fast. Or very slow very slowly... depending on how you look at these things ;-)

Paul Wagland
+1 for highlighting the hardware vs development costs
Brian Agnew
It's probably true that it would be cheaper to get a bigger box; however, this is not a very palatable answer for the customer. The conversation so far has gone Customer: "it's crashed again" Me: "increase the memory allocated to the JVM". If my next answer is "buy a new computer with 32GB of RAM" they will push back. In fact, the code is well enough abstracted that I think I can inject a different array class at runtime. So my risk is limited to the quality of my implementation of an NIO-based array.
Simon
seems like the answer to the more memory pushback is "wait until the new algorithm is done". Is your time better spent working on the new algorithm or nursing the old one along? Do you have any performance metrics for the new algorithm you can present to persuade the customer?
basszero
@basszero the commercial situation is such that the answer is I have to do both. My time is best spent making them successful with their customers, which a relatively small amount of investment in extending the existing algorithm is likely to do. I reckon that if I can get it to about twice its current capacity I can buy myself a year to really nail the "proper" solution and test it thoroughly enough that I can plug it into their existing framework. They are very understanding and we agree that the long-term solution is the right one; it is just a question of helping them in the here-and-now.
Simon
Again and again we're coming back to how the data set can be partitioned... I'm guessing that's tricky for this app. Arbitrary partitioning seems to be in order. In the worst case, if the random-access-ness cannot be removed, you have a slow system. I'm guessing, but maybe the algorithm can be run in several phases, on data that is smaller than a double for some of the phases (classification of some double values using e.g. an 8-bit code), with the full 64 bits referenced when needed.
martinr
+1 for the edit, thanks
Simon
@martinr you're right, of course. Even at the extent I have right now (25k rows) I have an addressing problem as Paul alludes to, so I have to partition the array in any case. However, this is a less obtrusive solution to the one of a complete re-write of the algorithm, which is underway as I have said.
Simon
I think you need a poster with "Simon and the Algorithm" (where it would say "Jason and the Argonauts"!) ^:)
martinr
As for the debate about a bigger machine: if you start swapping pages of your array out to disk, your algorithm's performance is going to drop like a rock. Using the extreme case, if you have a 32GB array and, instead of keeping it in RAM, you're paging it in from disk, then each time your algorithm runs it must load 32GB from disk. Even more horrifying, it may WRITE 32GB to disk. Add to that the possible heart-stopping fact that you may do this more than once in a run. So, in truth, the paging option may well be unacceptable due to performance alone. Just a thought.
Will Hartung
A: 

You are moving into the realm of how to write software that makes the best use of the CPU's memory cache. This is hard to do right, and the "right" way to do it depends on how your algorithm is designed.

So, what does your program actually do algorithmically?

Thorbjørn Ravn Andersen
If I answer that, do you promise not to tell me to implement my algorithm in a different way? The problem I think I'll have by answering your question is that I have an inefficient algorithm to start with. I know that and I am well into fixing it. What I am looking for is a way of extending its life to allow me time to complete that re-implementation. Sorry to be a bit short, but what I need right now is an answer to my question about the pros and cons of NIO for my immediate problem.
Simon
@Simon, we can't tell you that unless we know the characteristics of the algorithm. For instance, if it does lots of binary searches over the array for disparate elements, then you are probably in trouble. If, however, it spends a lot of time in a smaller area and then moves to another area, then your caching and disk IO might work well.
tster
@Simon, I'm not interested in your choice of algorithm, but I am interested in the behaviour of the one you've chosen. Often the choice of which order to iterate the dimensions of a multidimensional array is important, e.g.
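a rough sketch of the kind of difference I mean, for a plain row-major double[][] (the sizes here are arbitrary):

    public class IterationOrderDemo {
        public static void main(String[] args) {
            double[][] data = new double[2000][2000];

            // Row-major traversal: walks each row's contiguous memory in order,
            // so the CPU cache (and, if paged, the OS page cache) is used well.
            double sum1 = 0;
            for (int row = 0; row < data.length; row++) {
                for (int col = 0; col < data[row].length; col++) {
                    sum1 += data[row][col];
                }
            }

            // Column-major traversal: touches one element per row before moving on,
            // so consecutive accesses land far apart in memory.
            double sum2 = 0;
            for (int col = 0; col < data[0].length; col++) {
                for (int row = 0; row < data.length; row++) {
                    sum2 += data[row][col];
                }
            }

            System.out.println(sum1 + " " + sum2);
        }
    }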
Thorbjørn Ravn Andersen
+2  A: 

If you're running on PCs, page sizes for mapped files are likely to be 4 kilobytes.

So the real question becomes: if I start swapping the data out to disk, "how random is my random access to the RAM-that-is-now-a-file"?

And (if I can at all) how can I order the doubles to maximise the cases where doubles within a 4K page are accessed together, rather than a few at a time in each page before the next 4K disk fetch?

If you use standard IO, you probably still want to read and write in chunks, but the chunks could be smaller. Sectors will be at least 512 bytes and disk clusters bigger, but what size of read is best, given that there is a kernel round-trip overhead for each IO?
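To make the chunking question concrete, a rough sketch of reading a whole 4K page of doubles in one call rather than one value at a time (the helper name is arbitrary and the 4K figure is just the page-size assumption from above):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class ChunkedDoubleReader {
        private static final int PAGE_BYTES = 4096;              // one OS page
        private static final int DOUBLES_PER_PAGE = PAGE_BYTES / 8;

        // Reads the 4K-aligned page containing 'index' in a single IO call,
        // so neighbouring doubles in the same page come along for free.
        // Assumes the file holds complete pages.
        public static double[] readPage(FileChannel channel, long index) throws IOException {
            long pageStart = (index / DOUBLES_PER_PAGE) * PAGE_BYTES;
            ByteBuffer buf = ByteBuffer.allocate(PAGE_BYTES);
            channel.read(buf, pageStart);
            buf.flip();
            double[] page = new double[DOUBLES_PER_PAGE];
            buf.asDoubleBuffer().get(page);
            return page;
        }
    }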

I'm sorry but I'm afraid your best next steps depend to a great extent on the algorithm and the data you are using.

martinr
+1 This is a very good steer, thank you. I realise that there is a lot more to my question and it will depend on how my algorithm works, but this gives me some parameters within which to make my judgement, which is what I was hoping for.
Simon
I concede that considering L2 cache size issues in design is also important.
martinr
Quoting ChrisW: "I'm surprised: Figure 3 in the middle of http://queue.acm.org/detail.cfm?id=1563874 says that memory is only about 6 times faster when you're doing sequential access (350 Mvalues/sec for memory compared with 58 Mvalues/sec for disk); but it's about 100,000 times faster when you're doing random access." http://stackoverflow.com/questions/1371400/how-much-faster-is-the-memory-usually-than-the-disk
martinr
A: 

You can try storing the array as rows in a database table and use stored procs to do updates and searches on it.
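A very rough sketch of the table route, assuming an embedded database (H2 here, purely as an example) and a simple (row_idx, col_idx, val) schema, with plain prepared statements standing in for the stored procs:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class DbBackedMatrix {
        private final Connection conn;

        public DbBackedMatrix(String jdbcUrl) throws SQLException {
            conn = DriverManager.getConnection(jdbcUrl);   // e.g. "jdbc:h2:./matrix"
            Statement st = conn.createStatement();
            st.execute("CREATE TABLE IF NOT EXISTS cells ("
                     + "row_idx INT, col_idx INT, val DOUBLE, "
                     + "PRIMARY KEY (row_idx, col_idx))");
            st.close();
        }

        // Insert-or-update one cell (MERGE is H2 syntax; adjust for your database).
        public void set(int row, int col, double value) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "MERGE INTO cells (row_idx, col_idx, val) VALUES (?, ?, ?)");
            ps.setInt(1, row);
            ps.setInt(2, col);
            ps.setDouble(3, value);
            ps.executeUpdate();
            ps.close();
        }

        // Missing cells read back as 0.0.
        public double get(int row, int col) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT val FROM cells WHERE row_idx = ? AND col_idx = ?");
            ps.setInt(1, row);
            ps.setInt(2, col);
            ResultSet rs = ps.executeQuery();
            double v = rs.next() ? rs.getDouble(1) : 0.0;
            rs.close();
            ps.close();
            return v;
        }
    }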

Another Idea:

Use a B-Tree as your array and keep some of the leaves on disk. Make sure the B-Tree nodes are the size of a page, or a multiple of the page size.

tster
I have considered this and have a lot of experience with databases. In this instance the thing that stops me is that it greatly explodes the complexity of the solution I put in front of my customer. At some point I may move towards using a database for some persistence, but right now I judge that it is a bit too much overhead for my current issue, which is just about extending the lifespan of the existing algorithm.
Simon
A: 

If the problem is that you are running out of memory, the simple solution is to upgrade your hardware with more memory, increase the Java heap size and/or switch to a 64-bit JVM.

On the other hand, if you are running up against the Java limit on the size of arrays, you could go down the ByteBuffer route, or you could switch to using an array of arrays. The latter is Sun's suggested workaround.

With the array of arrays approach you could (in theory) cope with values of N close to 2**31. In practice your limit will be determined by the amount of physical memory you have, and the amount that can be addressed using your combination of OS / JVM.
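A minimal sketch of the array-of-arrays approach, splitting a long index across fixed-size blocks (the class name and block size are arbitrary):

    // Sketch: a long-indexed double array built from fixed-size blocks,
    // sidestepping the Integer.MAX_VALUE limit of a single array.
    public class BigDoubleArray {
        private static final int BLOCK_SIZE = 1 << 27;   // 128M doubles (1GB) per block
        private final double[][] blocks;

        public BigDoubleArray(long size) {
            int blockCount = (int) ((size + BLOCK_SIZE - 1) / BLOCK_SIZE);
            blocks = new double[blockCount][];
            long remaining = size;
            for (int i = 0; i < blockCount; i++) {
                blocks[i] = new double[(int) Math.min(BLOCK_SIZE, remaining)];
                remaining -= blocks[i].length;
            }
        }

        public double get(long index) {
            return blocks[(int) (index / BLOCK_SIZE)][(int) (index % BLOCK_SIZE)];
        }

        public void set(long index, double value) {
            blocks[(int) (index / BLOCK_SIZE)][(int) (index % BLOCK_SIZE)] = value;
        }
    }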

Stephen C
+1  A: 

I've had generally good experiences with Java's MappedByteBuffers, and encourage you to have a deeper look at it. It very well may allow you to not deal with the -Xmx changes again. Be aware that if you need more than 2-4GB of addressable space then a 64-bit CPU, OS and JVM are required.

To get beyond the Integer.MAX_VALUE indices issue you could write a paging algorithm, as I have done here in a related answer to Binary search in a sorted (memory-mapped ?) file in Java.
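As a rough illustration of the paging idea (not the code from the linked answer, just a sketch; the names and chunk size are arbitrary): one MappedByteBuffer per fixed-size chunk of the backing file, with the index arithmetic choosing which buffer to use.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch: a long-indexed double array backed by several MappedByteBuffers,
    // since a single mapping is limited to Integer.MAX_VALUE bytes.
    public class PagedMappedDoubleArray {
        private static final long DOUBLES_PER_PAGE = 1L << 27;   // 128M doubles = 1GB per mapping
        private final MappedByteBuffer[] pages;

        public PagedMappedDoubleArray(String fileName, long size) throws IOException {
            FileChannel channel = new RandomAccessFile(fileName, "rw").getChannel();
            int pageCount = (int) ((size + DOUBLES_PER_PAGE - 1) / DOUBLES_PER_PAGE);
            pages = new MappedByteBuffer[pageCount];
            for (int i = 0; i < pageCount; i++) {
                long firstDouble = (long) i * DOUBLES_PER_PAGE;
                long doublesInPage = Math.min(DOUBLES_PER_PAGE, size - firstDouble);
                pages[i] = channel.map(FileChannel.MapMode.READ_WRITE,
                                       firstDouble * 8, doublesInPage * 8);
            }
        }

        public double get(long index) {
            return pages[(int) (index / DOUBLES_PER_PAGE)]
                    .getDouble((int) (index % DOUBLES_PER_PAGE) * 8);
        }

        public void set(long index, double value) {
            pages[(int) (index / DOUBLES_PER_PAGE)]
                    .putDouble((int) (index % DOUBLES_PER_PAGE) * 8, value);
        }
    }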

Stu Thompson
Ha! I had just finished writing a paging List of MappedByteBuffers, so it's nice to see someone else's solution. In my case I don't have a pre-existing file to map to, so I am going to have a separate page file for each block of Integer.MAX_VALUE doubles. This has a further advantage for me in that I can spawn a thread to do quite a lot of the searching and thereby take advantage of the other processors I know are knocking around unused. I have a sneaking suspicion I may be able to extend the size of my algorithm and make it faster at the same time. We'll see. Thanks for the post +1
Simon
Great to hear that :) I've always been very pleased with that answer. *As for your performance ideas,* be sure to check whether the CPU is the bottleneck and not the disk. If the latter, multiple processes/threads accessing different parts of the disk at once may well impede your overall performance. Even if it is the former, a measured approach to adding threads is a good idea. (Lesson learned the hard way for me ;)
Stu Thompson
A: 

Be aware that some operating systems have better support for memory mapping than others.

I would be tempted to do this:

  1. Put all your array gets/puts behind an object interface (if they aren't already), thus freeing you up to change the implementation easily.
  2. Use an array of SoftReferences where each SoftReference points to the array of doubles for that row. Persist each row to disk before the GC can clear it (for example, write rows through as they are modified; a ReferenceQueue only tells you a reference has been cleared after the data is gone). When get() returns null, retrieve the row from disk.

You might find you have more control over performance that way - the -Xmx can be tweaked as desired.
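A minimal sketch of point 2 along those lines (the class name, the fixed row length and the use of a RandomAccessFile are arbitrary choices):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.lang.ref.SoftReference;

    // Sketch: each row is a double[] held via a SoftReference; rows are
    // written through to a backing file on update and reloaded on demand
    // after the GC has cleared them.
    public class SoftRowStore {
        private final RandomAccessFile file;
        private final int rowLength;
        private final SoftReference<double[]>[] rows;

        @SuppressWarnings("unchecked")
        public SoftRowStore(String fileName, int rowCount, int rowLength) throws IOException {
            this.file = new RandomAccessFile(fileName, "rw");
            this.rowLength = rowLength;
            this.rows = new SoftReference[rowCount];
            file.setLength((long) rowCount * rowLength * 8);
        }

        public synchronized double get(int row, int col) throws IOException {
            return loadRow(row)[col];
        }

        public synchronized void set(int row, int col, double value) throws IOException {
            double[] data = loadRow(row);
            data[col] = value;
            file.seek(rowOffset(row) + (long) col * 8);   // write through immediately
            file.writeDouble(value);
        }

        private double[] loadRow(int row) throws IOException {
            SoftReference<double[]> ref = rows[row];
            double[] data = (ref == null) ? null : ref.get();
            if (data == null) {                           // cleared by GC, or never loaded
                data = new double[rowLength];
                file.seek(rowOffset(row));
                for (int i = 0; i < rowLength; i++) {     // value-by-value for brevity
                    data[i] = file.readDouble();
                }
                rows[row] = new SoftReference<double[]>(data);
            }
            return data;
        }

        private long rowOffset(int row) {
            return (long) row * rowLength * 8;
        }
    }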

Dave Griffiths