
Hi all,

I'm trying to normalize a large number of Affymetrix CEL files using R. However, some of them appear to be truncated, so when reading them I get the error

Cel file xxx does not seem to have the correct dimensions

and the normalization stops. Manually removing the corrupted files and restarting every time would take a very long time. Do you know if there is a fast way (in R or with an external tool) to detect corrupted files?

P.S. I'm 99.99% sure I'm only normalizing CEL files from the same platform together. It really is just truncated files :-)
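A quick pre-check that needs no parsing is to compare file sizes, because a truncated file should be noticeably smaller than its siblings. This is a heuristic sketch, not something from the thread. `flag_small_files` and the 0.9 cutoff are hypothetical. It also assumes every file comes from the same platform and uses the same uncompressed CEL format, since gzipped or mixed-format files vary in size for unrelated reasons.

```r
# Heuristic: flag files much smaller than the typical file in the batch.
# Assumption: all files share one platform and one uncompressed CEL format,
# so healthy files should be roughly the same size. Hypothetical helper name.
flag_small_files <- function(files, ratio = 0.9) {
  sizes <- file.size(files)                 # bytes; NA if a file is missing
  typical <- median(sizes, na.rm = TRUE)    # size of a "normal" file
  suspect <- is.na(sizes) | sizes < ratio * typical
  files[suspect]
}
```

Anything it returns still deserves a real read attempt. A size check only catches truncation, not other kinds of corruption.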

Answer:

One simple suggestion:

Can you just wrap your read.table call (or whichever read function you're using) in a tryCatch block? Then you can skip a file whenever you get that error message. You can also build a list of corrupted files inside the error handler. I recommend doing that, so you have a record of the corrupted files for future reference when running a big batch process like this. Here's a sketch:

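files <- list.files(pattern = "\\.CEL$", ignore.case = TRUE)  # assumption: CEL files in the working directory
corrupted.files <- character(0)
for (f in files) {
    x <- tryCatch(read.table(file = f),
        error = function(e) {
            # Only skip the truncation error; re-raise anything else
            if (grepl("correct dimensions", conditionMessage(e), fixed = TRUE)) {
                corrupted.files <<- c(corrupted.files, f)  # <<- updates the vector outside the handler
                NULL
            } else stop(e)
        },
        finally = message("finished with ", f, " at ", Sys.time()))
    if (!is.null(x)) {
        # do something with the uncorrupted data
    }
}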
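The same idea can be packaged as a reusable helper that works with any reader and keeps the data it loads. `split_readable` is a hypothetical name. Unlike the loop above, it treats every error as a failure instead of matching one specific message, and it stores the error object as a sentinel so a reader that legitimately returns `NULL` is not mistaken for a failure.

```r
# Try `reader` on each file; keep results for files that load and collect
# the names of files that raise an error instead of stopping the batch.
# `reader` is any one-argument function (e.g. read.table). Hypothetical helper.
split_readable <- function(files, reader) {
  ok <- list()
  failed <- character(0)
  for (f in files) {
    res <- tryCatch(reader(f), error = function(e) e)  # return the error as a value
    if (inherits(res, "error")) {
      message("skipping ", f, ": ", conditionMessage(res))
      failed <- c(failed, f)
    } else {
      ok[f] <- list(res)  # list() wrapper keeps NULL results too
    }
  }
  list(data = ok, failed = failed)
}
```

For example, with Bioconductor's affy loaded, `split_readable(files, function(f) ReadAffy(filenames = f))` would try each CEL file on its own. Whether every truncated file makes ReadAffy fail is an assumption worth checking on a known-bad file.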
Shane
Not bad, thanks :-) It works for removing corrupted files. (To read them I use Bioconductor's ReadAffy function, but that's OK.) I still need something to check the name of the platform, but that's probably a question for a Bioconductor forum.
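For the platform check, one option is to read only each file's header and group the files by array type. This sketch assumes the Bioconductor package affyio is installed and that `read.celfile.header()` reports the array type in its `cdfName` element. `group_by_platform` is a hypothetical helper. The lookup is a parameter so it can be swapped out, or tested without real CEL files.

```r
# Group CEL files by the array type recorded in their headers.
# Assumption: affyio::read.celfile.header() returns a list with a cdfName element.
# Hypothetical helper; files whose header cannot be read get their own group.
group_by_platform <- function(files, get_platform = NULL) {
  if (is.null(get_platform)) {
    get_platform <- function(f) affyio::read.celfile.header(f)$cdfName
  }
  platform <- vapply(files, function(f) {
    tryCatch(get_platform(f), error = function(e) NA_character_)
  }, character(1))
  split(files, ifelse(is.na(platform), "<unreadable header>", platform))
}
```

Each group could then be passed separately to `ReadAffy(filenames = ...)`. A readable header does not prove the rest of the file is intact, so this complements the truncation check rather than replacing it.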
Thrawn