views:

1420

answers:

5

Hello!

I would like to implement Singular Value Decomposition (SVD) in PHP. I know that there are several external libraries which could do this for me. But I have two questions concerning PHP, though: 1) Do you think it's possible and/or reasonable to code the SVD in PHP? 2) If (1) is yes: Can you help me to code it in PHP?

I've already coded some parts of SVD by myself. Here's the code which I made comments to the course of action in. Some parts of this code aren't completely correct.

It would be great if you could help me. Thank you very much in advance!

+2  A: 

Regarding question 1: It definitely is possible. Whether it's reasonable depends on your scenario: How big are your matrices? How often do you intend to run the code? Is it run in a web site or from the command line? If you do care about speed, I would suggest writing a simple extension that wraps calls to the GNU Scientific Library.

rodion
Thank you for this answer. I would like to run it on a website and the script should be called by a cronjob. I don't care much about speed. It would be enough if the script would always do the SVD for 1 single text. The huge matrix which I always need could also be cached.Writing an extension for GNU Scientific Library is a problem because I can't install libraries on my webspace via the command line.
If you have shell access and can install binaries and use cron, you should probably think of writing standalone (probably, statically-linked) binary, not a PHP script. Even for the same algorithm this will be more CPU and memory-efficient.
drdaeman
A: 

1) Yes. this is perfectly possible to be implemented in PHP. I don't know what the reasonable time frame for execution and how large it can compute. I would probably have to implement the algorithm to get a rought idea.

2) Yes I can help you code it. Buy why do you need help? Doesn't the code you wrote work?

Just as an aside question. What version of PHP do you use?

kyungeui
Thank you very much! Lots of people who I've spoken with told me that PHP is completely unsuitable for SVD. I don't care what the time limits are, I would just like to implement it.My code doesn't work since it doesn't get the eigenvalues. I've tried out some approximation procedures but they didn't work well. It would be great if you could help me. I use PHP 5.
+4  A: 

Be careful when you say "I don't care what the time limits are". SVD is an O(N^3) operation (or O(MN^2) if it's a rectangular m*n matrix) which means that you could very easily be in a situation where your problem can take a very long time. If the 100*100 case takes one minute, the 1000*1000 case would 10^3 minutes, or nearly 17 hours (and probably worse, realistically, as you're likely to be out of cache). With something like PHP, the prefactor -- the number multiplying the N^3 in order to calculate the required FLOP count, could be very, very large.

Having said that, of course it's possible to code it in PHP -- the language has the required data structures and operations.

Andrew Jaffe
Thank you very much for this! So you think that it's possible to code it with PHP but PHP is not very suitable, right?
Well, PHP isn't ideal for numerical linear algebra, but whether you can make it work in your case depends on the details. How big are the matrices you will be running it on? What, exactly, do you need to do? You may want to consult a book like Numerical Recipes <http://www.nr.com/> for information on implementations.
Andrew Jaffe
I would like to use SVD for Latent Semantic Analysis. So 100x100 won't be enough for the matrices, they will be huge ...
Marco, I've done LSA, and I can tell you assuredly that PHP will provide unacceptable response times for matrices of *any* size, let alone huge. Save yourself the time trying to figure it out and find a way to interface with C. Svdlibc worked great for me: http://tedlab.mit.edu/~dr/svdlibc/
llimllib
Thank you, llimllib. As soon as I have C on my webserver I will use it. But at the moment I don't have it and I can't install anything using the console.
+1  A: 

Yes it's posible, but implementing SVD in php ins't the optimal approach. As you can see here PHP is slower than C and also slower than C++, so maybe it was better if you could do it in one of this languages and call them as a function to get your results. You can find an implementation of the algorithm here, so you can guide yourself trough it.

About the function calling can use:

  • The exec() Function

The system function is quite useful and powerful, but one of the biggest problems with it is that all resulting text from the program goes directly to the output stream. There will be situations where you might like to format the resulting text and display it in some different way, or not display it at all.

  • The system() Function

The system function in PHP takes a string argument with the command to execute as well as any arguments you wish passed to that command. This function executes the specified command, and dumps any resulting text to the output stream (either the HTTP output in a web server situation, or the console if you are running PHP as a command line tool). The return of this function is the last line of output from the program, if it emits text output.

  • The passthru() Function

One fascinating function that PHP provides similar to those we have seen so far is the passthru function. This function, like the others, executes the program you tell it to. However, it then proceeds to immediately send the raw output from this program to the output stream with which PHP is currently working (i.e. either HTTP in a web server scenario, or the shell in a command line version of PHP).

Luis
Thank you very much! :)
+5  A: 

SVD-python Is a very clear, parsimonious implementation of the SVD. It's practically psuedocode and should be fairly easy to understand and compare/draw on for your php implementation, even if you don't know much python.

SVD-python

That said, as others have mentioned I wouldn't expect to be able to do very heavy-duty LSA with php implementation what sounds like a pretty limited web-host.

Cheers

Edit: The module above doesn't do anything all by itself, but there is an example included in the opening comments. Assuming you downloaded the python module, and it was accessible (e.g. in the same folder), you could implement a trivial example as follow,

#!/usr/bin/python
import svd
import math

a = [[22.,10., 2.,  3., 7.],
     [14., 7.,10.,  0., 8.],
     [-1.,13.,-1.,-11., 3.],
     [-3.,-2.,13., -2., 4.],
     [ 9., 8., 1., -2., 4.],
     [ 9., 1.,-7.,  5.,-1.],
     [ 2.,-6., 6.,  5., 1.],
     [ 4., 5., 0., -2., 2.]]

u,w,vt = svd.svd(a)
print w

Here 'w' contains your list of singular values.
Of course this only gets you part of the way to latent semantic analysis and its relatives. You usually want to reduce the number of singular values, then employ some appropriate distance metric to measure the similarity between your documents, or words, or documents and words, etc. The cosine of the angle between your resultant vectors is pretty popular.

Latent Semantic Mapping (pdf)

is by far the clearest, most concise and informative paper I've read on the remaining steps you need to work out following the SVD.

Edit2: also note that if you're working with very large term-document matrices (I'm assuming this is what you are doing) it is almost certainly going to be far more efficient to perform the decomposition in an offline mode, and then perform only the comparisons in a live fashion in response to requests. while svd-python is great for learning, the svdlibc is more what you would want for such heavy computation.

finally as mentioned in the bellegarda paper above, remember that you don't have to recompute the svd every single time you get a new document or request. depending on what you are trying to do you could probably get away with performing the svd once every week or so, in an offline mode, a local machine, and then uploading the results (size/bandwidth concerns notwithstanding).

anyway good luck!

blackkettle
Thank you very very much!!! :) It would be great if this would work in combination with PHP's passthru() function (thx ljyanes). But this script doesn't give any output. What must I do? I've commented out this: http://paste.bradleygill.com/index.php?paste_id=10389
I've added some further info, including a working example from the comments in the python module.
blackkettle
Thanks again for the edit. Your code is exactly what I did, isn't it? ;) Look at the codepaste site where I wrote the code. I think I just took the wrong package. I only downloaded the svd.py file which you linked above. Do I have to load something else, too?
yes that looks fine. note however, that python is sensitive to whitespace/indenting. when i downloaded your example i needed to remove the leading whitespace from all the lines, and add the, #!/usr/bin/python line to the top. but assuming you have the svd.py file in your execution directory or your python path (for debian this defaults to /usr/lib/python2.5/site-packages/ )
blackkettle
it should work fine. cheers.
blackkettle
I've just seen: It wasn't your fault, it was my fault! :) I had to CHMOD the file to 755 and to write the following code at the beginning of the script, now it works fine: print "Content-type: text/html\n\n";