Hello,
I am not sure there are any R users out there, but just in case:
I am a novice at R and was kindly "handed down" the following R code snippet:
Beta <- exp(as.matrix(read.table('beta.transpose')))
WordFreq <- read.table('freq-matrix')
WordProbs <- WordFreq$V1 / sum(WordFreq)
infile <- file('freq-matrix')
outfile <- file('doc_topic_prob_matrix', 'w')
open(infile)
open(outfile)
for (i in 1:93049) {
vec <- t(scan(infile, nlines=1))
topics <- (vec/WordProbs) %*% Beta
write.table(topics, outfile, append=T, row.names=F, col.names=F)
}
When I tried running this on my dataset, the system thrashed and swapped like crazy. Now I realize that has a simple reason: the file freq-matrix holds a large (22GB) matrix and I was trying to read it into memory.
I have been told to use the Matrix package, because freq-matrix has many, many zeros all over the place and it handles such cases well. Will that help? If so, any hints on how to change this code would be most welcome. I have no R experience and just started reading through the introduction PDF available on the site.
Many thanks
~l