The right way to manage a big matrix in Java

I'm working with a big matrix (not sparse) that contains about 10^10 doubles. Of course I cannot keep it all in memory, and I only need one row at a time.

I thought of splitting it into files, one row per file (which requires a lot of files), and reading a file every time I need a row. Do you know a more efficient way?

+1 A:

Why do you want to store it in different files? Can't you use a single file?

You could use the methods of the RandomAccessFile class to seek to a row and read it from that file.
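As a rough illustration of the single-file idea, here is a minimal sketch. It assumes the matrix is stored row by row as raw 8-byte big-endian doubles (the layout that writeDouble produces) and that the number of columns is known in advance. The class name MatrixRowReader and its methods are made up for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

/** Hypothetical helper: reads single rows from a dense matrix stored as raw doubles. */
final class MatrixRowReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final int columns;

    MatrixRowReader(String path, int columns) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.columns = columns;
    }

    /** Seeks to the start of the requested row and reads it in one call. */
    double[] readRow(long row) throws IOException {
        byte[] raw = new byte[columns * Double.BYTES];
        // Offset is computed in long arithmetic, so files far larger than 2 GB work.
        file.seek(row * raw.length);
        file.readFully(raw); // throws EOFException if the row lies past the end of the file
        double[] values = new double[columns];
        // ByteBuffer is big-endian by default, matching writeDouble.
        ByteBuffer.wrap(raw).asDoubleBuffer().get(values);
        return values;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Reading the whole row into a byte array first avoids one readDouble call per element, which tends to be slow on an unbuffered RandomAccessFile.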

Aviator 2009-09-29 15:14:06

You are right, RandomAccessFile may be a better solution.

BigG 2009-09-29 15:26:17

Thanks. :) Do give it a try.

Aviator 2009-09-29 15:41:55

So, about 800 KB per file (10^5 doubles at 8 bytes each, assuming a roughly square matrix), which sounds like a reasonable division. Nothing really stops you from using one giant file, of course. A dense matrix like yours can be treated as a file of fixed-length records, which makes random access trivial.

If you do store it one file per row, I would suggest a directory tree keyed on the decimal digits of the row number, so 0/0/0/0 through 9/9/9/9, to keep any single directory from holding too many files.
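One possible way to map a row number onto such a tree is sketched below. The choice of the four lowest decimal digits (which spreads consecutive rows evenly) and the file name pattern row-N.bin are assumptions made for illustration, as is the class name RowPaths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: maps a row index to a file inside a 0/0/0/0 .. 9/9/9/9 tree. */
final class RowPaths {
    /** Builds root/d1/d2/d3/d4/row-N.bin from the four lowest decimal digits of N (N >= 0). */
    static Path pathForRow(Path root, long row) {
        Path dir = root;
        long rest = row;
        for (int level = 0; level < 4; level++) {
            dir = dir.resolve(Long.toString(rest % 10)); // lowest remaining digit
            rest /= 10;
        }
        return dir.resolve("row-" + row + ".bin");
    }

    public static void main(String[] args) {
        System.out.println(pathForRow(Paths.get("matrix"), 12345));
    }
}
```

With 10^5 rows this gives 10^4 leaf directories of about ten files each.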

Considerations one way or the other:

Is it being backed up? Do you have high-capacity backup media or something ordinary?
Does this file ever change?
If it does change and it is backed up, does it change all at once or are the changes localized?

DigitalRoss 2009-09-29 15:15:33

It doesn't change, and I have plenty of free space on my hard drive.

BigG 2009-09-29 15:34:14

If it doesn't change, I'm guessing it doesn't need to be backed up either. I think I agree with Aviator, it's looking like one big file is the way to go.

DigitalRoss 2009-09-29 15:49:24

If you are going to be saving it in a file, I believe serializing it will save space/time over storing it as text.

Serializing the doubles will store them as 8 bytes each (plus serialization overhead) and means that you will not have to convert them to and from Strings when saving or loading the file.
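To make the binary option concrete, here is a minimal sketch that appends rows with DataOutputStream, which writes each double as exactly 8 bytes. Note that this is plain binary output rather than Java object serialization (ObjectOutputStream), so it adds no header; that substitution is my assumption, and the class name MatrixRowWriter is invented for the example.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: appends matrix rows to a file as raw binary doubles. */
final class MatrixRowWriter {
    /** Appends one row as big-endian doubles, 8 bytes per value, with no text conversion. */
    static void appendRow(Path file, double[] row) throws IOException {
        try (OutputStream raw = Files.newOutputStream(
                     file, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(raw))) {
            for (double value : row) {
                out.writeDouble(value);
            }
        }
    }
}
```

Because every value has the same size, row r of a matrix with n columns starts at byte offset r * n * 8, which is what makes seeking to a single row possible.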

Matt Boehm 2009-09-29 15:18:46

Right, I forgot to mention that in my question, sorry!

BigG 2009-09-30 14:48:02

It depends on the algorithms you want to execute, but I guess that in most cases a representation where each file contains some square or rectangular region would be better.

For example, matrix multiplication can be done recursively by breaking a matrix into submatrices.
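A small in-memory sketch of the submatrix idea follows. It uses a simple iterative tiling instead of true recursion, and it assumes square matrices that fit in memory, so it only illustrates the access pattern; on disk, each tile would be loaded from its own file. The class name BlockedMultiply and the block parameter are hypothetical.

```java
/** Hypothetical sketch: multiplies square matrices one tile at a time. */
final class BlockedMultiply {
    /** Computes C = A * B for n x n matrices, working on block x block tiles. */
    static double[][] multiply(double[][] a, double[][] b, int block) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i0 = 0; i0 < n; i0 += block) {
            for (int k0 = 0; k0 < n; k0 += block) {
                for (int j0 = 0; j0 < n; j0 += block) {
                    int iMax = Math.min(i0 + block, n);
                    int kMax = Math.min(k0 + block, n);
                    int jMax = Math.min(j0 + block, n);
                    // Multiply tile A[i0.., k0..] by tile B[k0.., j0..] and add into C[i0.., j0..].
                    for (int i = i0; i < iMax; i++) {
                        for (int k = k0; k < kMax; k++) {
                            double aik = a[i][k];
                            for (int j = j0; j < jMax; j++) {
                                c[i][j] += aik * b[k][j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }
}
```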

starblue 2009-09-29 15:35:12

No, I just need one row at a time.

BigG 2009-09-30 14:47:14

I'd suggest using a disk-persistent cache like Ehcache. Just configure it to keep as many fragments of your matrix in memory as you like, and it will take care of the serialization. All you have to do is decide how to fragment the matrix.

Another approach that comes to mind is Terracotta (which recently bought Ehcache, by the way). It gives you a large network-attached heap that can easily manage your 10^10 double values without your code having to care about it at all.

sfussenegger 2009-09-29 16:00:20
