Kernel methods for large scale dataset

views:

148

answers:

+4 Q:

Kernel methods for large scale dataset

Kernel-based classifier usually requires O(n^3) training time because of the inner-product computation between two instances. To speed up the training, inner-product values can be pre-computed and stored in a two-dimensional array. However when the no. of instances is very large, say over 100,000, there will not be sufficient memory to do so.

So any better idea for this?

The Relevance Vector Machine has a sequential training mode in which you do not need to keep the entire kernel matrix in memory. You can basically calculate a column at a time, determine if it appears relevant, and throw it away otherwise. I have not had much luck with it myself, though, and the RVM has some other issues. There is most likely a better solution in the realm of Gaussian Processes. I haven't really sat down much with those, but I have seen mention of an online algorithm for it.

kwatford 2009-06-12 04:09:02

I am not a numerical analyst, but isn't the QR decomposition which you need to do ordinary least-squares linear regression also O(n^3)?

Anyways, you'll probably want to search the literature (since this is fairly new stuff) for online learning or active learning versions of the algorithm you're using. The general idea is to either discard data far from your decision boundary or to not include them in the first place. The danger is that you might get locked into a bad local maximum and then your online/active algorithm will ignore data that would help you get out.

othercriteria 2009-06-12 05:00:49

+1 A:

For modern implementations of support vector machines, the scaling of the training algorithm is dependent on lots of factors, such as the nature of the training data and kernel that you are using. The scaling factor of O(n^3) is an analytical result and isn't particularly useful in predicting how SVM training will scale in real-world situations. For example, empirical estimates of the training algorithm used by SVMLight put the scaling against training set size to be approximately O(n^2).

I would suggest you ask this question in the kernel machines forum. I think you're more likely to get a better answer than on Stack Overflow, which is more of a general-purpose programming site.

StompChicken 2009-06-12 08:54:02

ansaurus

tags:

views:

answers:

Kernel methods for large scale dataset

related questions