Hi all,

I was trying to load a large number of Affymetrix CEL files with the standard Bioconductor command (R 2.8.1 on 64-bit Linux, 72 GB of RAM):

library(affy)
abatch <- ReadAffy()

But I keep getting this message:

Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  : 
  allocMatrix: too many elements specified

What does this allocMatrix error mean in general? Is there some way to raise the maximum matrix size?

Thank you

+1  A: 

If you're trying to work with huge Affymetrix datasets, you might have better luck with the aroma.affymetrix package.

Also, Bioconductor is a particularly fast-moving project, and you'll typically be asked to upgrade to the latest version of R to get continued "support" (help on the BioC mailing list). I see that Thrawn also mentions a similar problem with R 2.10, but upgrading is still worth considering.

Steve Lianoglou
Yep, this is exactly what aroma.affymetrix was made for: gigantic Affy datasets.
Aaron Statham
Most functions in the affy and affyPLM packages already include memory-saving optimizations (e.g. justRMA(destructive=TRUE)), so I've always found aroma rather redundant. Unfortunately, the issue remains with aroma, since R's internal allocMatrix limit is not raised.
Thrawn
I agree: most affy functions already do what aroma.affymetrix does for large datasets, and they are definitely faster. However, aroma.affymetrix is useful when you don't have a monster machine, since it caps RAM consumption.
Tonio
+2  A: 

The problem is that R's core functions use C ints rather than longs for the sizes of R objects. For example, your error message comes from array.c in src/main:

if ((double)nr * (double)nc > INT_MAX)
    error(_("too many elements specified"));

where nr and nc are ints obtained earlier, holding the number of rows and columns of your matrix:

nr = asInteger(snr);
nc = asInteger(snc);

So, in short, the int types would have to be changed to long throughout the source, probably not only in array.c but in most core functions, which would require substantial rewriting. Sorry I can't be more helpful, but I think this is the only solution. Alternatively, you could wait for R 3.x next year and hope it implements this...

Tonio
I guess this is it... Time to learn C. Thanks!
Thrawn
+1  A: 

Hi, I bumped into this thread by chance. No, the aroma.* framework is not limited by the int-versus-long allocMatrix() restriction, because it does not address data through the regular address space alone; it also subsets via the file system. It never holds or loads the complete data set in memory at any time. Basically, the file system sets the limit, not the RAM or the address space of your OS.

/Henrik (author of aroma.*)

HenrikB