ansaurus

Question

Performing a SVD on tweets. Memory problem

Answer 1

+1 A:

Stored as a dense matrix, this is big. Stored as a sparse matrix, however, it is only small.

Using a sparse matrix SVD algorithm is enough, e.g. here.

Yin Zhu 2010-05-15 01:07:40

Is it big and dense or small and sparse?

el chief 2010-05-19 00:26:09

@el. I mean the matrix is stored in a dense format, i.e. a 2D array, which costs a lot of memory. However, I think the non-zero entries in the matrix can safely be stored in memory, so a sparse SVD algorithm could be applied to it.

Yin Zhu 2010-05-19 00:48:48

Yeah, he could definitely exploit the sparse SVD algorithms if he only needs a few singular value/vector pairs. However, I really don’t understand why this is necessary. The described matrix is not very big at all.

SplittingField 2010-05-19 23:20:50
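
To illustrate the point about needing only a few singular value/vector pairs: the sketch below finds just the leading triplet by power iteration on A^T A, touching only the stored non-zero entries. Note that power iteration is a simple stand-in chosen for brevity; it is not necessarily what the sparse SVD libraries mentioned in this thread do internally. The row-of-dicts storage and the names `matvec`, `rmatvec` and `top_singular_triplet` are assumptions made for this example.

```python
import math
import random


def matvec(rows, x):
    """Compute A @ x for A stored as one {column: value} dict per row."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def rmatvec(rows, y, n_cols):
    """Compute A.T @ y, again visiting only the stored non-zeros."""
    out = [0.0] * n_cols
    for yi, row in zip(y, rows):
        for j, v in row.items():
            out[j] += v * yi
    return out


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def top_singular_triplet(rows, n_cols, iters=200, seed=0):
    """Approximate the largest singular value and its vectors (sigma, u, v)."""
    rng = random.Random(seed)
    # Strictly positive start vector, normalised to unit length.
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    nv = norm(v)
    v = [t / nv for t in v]
    for _ in range(iters):
        w = rmatvec(rows, matvec(rows, v), n_cols)  # w = A^T A v
        nw = norm(w)
        if nw == 0.0:
            break  # A maps v to zero (e.g. the zero matrix); nothing to refine
        v = [t / nw for t in w]
    u = matvec(rows, v)
    sigma = norm(u)
    if sigma > 0.0:
        u = [t / sigma for t in u]
    return sigma, u, v


# Tiny 2 x 3 example: only two non-zero entries are stored.
rows = [{0: 3.0}, {1: 2.0}]
sigma, u, v = top_singular_triplet(rows, n_cols=3)
print(f"largest singular value is about {sigma:.6f}")
```

Each iteration costs time proportional to the number of non-zeros, and the dense 2D array is never formed, which is the saving the sparse approach relies on.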

@SF. You are right! This matrix is not big in its dense format either.

Yin Zhu 2010-05-20 00:29:19

Answer 2

A:

SVD is constrained by your memory size. See:

Folding In: a paper on partial matrix updates.

Apache Mahout is a distributed data mining library that runs on Hadoop and has a parallel SVD.

Steve 2010-05-15 01:15:04

As described though, this really is not a big matrix so I don’t fully understand why the poster is getting into trouble...

SplittingField 2010-05-19 23:21:07

I have checked my matrix again: the dimensions are 650 * 150.000, so I forgot a 0 :). The implementation I am using is a Ruby wrapper around LAPACK, so maybe the error arises somewhere along the way. I also tried working on it with RapidMiner, which allows an SVD on a matrix, but it seems to hit memory errors too. In terms of word count, I was wondering about simply dropping all the words that occur only once. That would greatly reduce the dimension of the matrix. Anyway, thank you for your help; I will have a look into SVDLIBC.

plotti 2010-05-21 09:44:38
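
The idea of dropping words that occur only once could look roughly like the sketch below. It assumes the tweets are already tokenised into lists of words and that "occur only once" means a total count of one across the whole collection; the function name `prune_rare_words` and the sample tweets are made up for illustration.

```python
from collections import Counter


def prune_rare_words(docs, min_count=2):
    """Keep only words seen at least min_count times across all documents.

    Returns the surviving vocabulary and, per document, a sparse row
    mapping column index -> count.
    """
    totals = Counter(word for doc in docs for word in doc)
    vocab = sorted(w for w, c in totals.items() if c >= min_count)
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for doc in docs:
        counts = Counter(w for w in doc if w in index)
        rows.append({index[w]: c for w, c in counts.items()})
    return vocab, rows


# Hypothetical toy data standing in for real tweets.
tweets = [
    "svd on tweets".split(),
    "memory error with svd".split(),
    "tweets about memory".split(),
]
vocab, rows = prune_rare_words(tweets)
print(vocab)  # ['memory', 'svd', 'tweets']
```

Here seven distinct words shrink to three columns. How much a real collection shrinks depends on its word distribution.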

@plotti 650 by 150,000 is still not very big. As a single array of doubles, this requires around 650*150,000*8*(1/1024)*(1/1024) = 744 MB. This should still fit into memory (it does on my laptop). LAPACK can easily handle matrices of this size directly; however, I am not certain how the Ruby wrapper works. If you provide some more information above, I can better help determine which algorithms you should be looking at.

SplittingField 2010-05-26 02:03:09
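
The arithmetic above can be reproduced with a few lines, alongside a rough estimate for sparse coordinate storage. The non-zero count passed to `sparse_coo_mib` is a purely hypothetical figure, since the thread never states how many entries are non-zero; the function names and the 4-byte index size are likewise assumptions for this sketch.

```python
def dense_matrix_mib(rows, cols, bytes_per_entry=8):
    """Memory for a dense rows x cols array, in MiB (1 MiB = 1024 * 1024 bytes)."""
    return rows * cols * bytes_per_entry / (1024 * 1024)


def sparse_coo_mib(nonzeros, bytes_per_value=8, bytes_per_index=4):
    """Memory for coordinate storage: one value plus a row and a column index per non-zero."""
    return nonzeros * (bytes_per_value + 2 * bytes_per_index) / (1024 * 1024)


dense = dense_matrix_mib(650, 150_000)
print(f"dense doubles: {dense:.0f} MiB")  # dense doubles: 744 MiB

# Hypothetical non-zero count, for comparison only.
sparse = sparse_coo_mib(2_000_000)
print(f"sparse (2,000,000 non-zeros assumed): {sparse:.1f} MiB")
```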
