First of all, some background to kernels and SVMs...
If you want to pre-compute a kernel for n vectors (of any dimension), what you need to do is calculate the kernel function between each pair of examples. The kernel function takes two vectors and gives a scalar, so you can think of a precomputed kernel as an n x n matrix of scalars. It's usually called the kernel matrix, or sometimes the Gram matrix.
There are many different kernels; the simplest is the linear kernel (also known as the dot product):

K(x, y) = sum(x_i * y_i) for i in [1..N], where x = (x_1, ..., x_N) and y = (y_1, ..., y_N) are vectors
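
To make that concrete, here is a minimal sketch in Python with numpy (the function name is my own, not a libsvm API) that builds the kernel matrix for the linear kernel:

import numpy as np

def linear_kernel_matrix(X):
    # Gram matrix for the linear kernel: K[i, j] = dot(X[i], X[j]).
    # X is an (n, d) array of n vectors with d features each;
    # the result is the n x n kernel matrix described above.
    return X @ X.T

# Three 2-feature vectors -> a 3 x 3 matrix of scalars.
X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])
print(linear_kernel_matrix(X))  # symmetric, since dot(x, y) == dot(y, x)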
Secondly, trying to answer your problem...
The documentation about precomputed kernels in libsvm is actually pretty good...
Assume the original training data has three four-feature instances
and testing data has one instance:
15 1:1 2:1 3:1 4:1
45 2:3 4:3
25 3:1
15 1:1 3:1
If the linear kernel is used, we have the following
new training/testing sets:
15 0:1 1:4 2:6 3:1
45 0:2 1:6 2:18 3:0
25 0:3 1:1 2:0 3:1
15 0:? 1:2 2:0 3:1
Each vector here in the second example is a row of the kernel matrix. The value at index zero is an ID, which just seems to be a sequential count (1, 2, 3, ...). The value at index 1 of the first vector is the kernel function of the first vector from the first example with itself (i.e. (1x1)+(1x1)+(1x1)+(1x1) = 4), the value at index 2 is the kernel function of the first vector with the second (i.e. (1x3)+(1x3) = 6), and it follows on like that for the rest of the example. You can see that the kernel matrix is symmetric, as it should be, because K(x, y) = K(y, x).
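
To make the arithmetic explicit, here is a small Python sketch (my own illustrative code, not part of libsvm) that rebuilds those rows from the dense form of the four instances above:

import numpy as np

# Dense form of the sparse instances above (missing indices are zero).
train = np.array([[1, 1, 1, 1],    # 15 1:1 2:1 3:1 4:1
                  [0, 3, 0, 3],    # 45 2:3 4:3
                  [0, 0, 1, 0]])   # 25 3:1
test = np.array([[1, 0, 1, 0]])    # 15 1:1 3:1

def precomputed_rows(vectors, labels):
    # Format rows the way the precomputed format expects:
    # label 0:ID 1:K(x, x_1) 2:K(x, x_2) ... against the training set.
    for row_id, (x, label) in enumerate(zip(vectors, labels), start=1):
        ks = " ".join(f"{j}:{int(x @ t)}" for j, t in enumerate(train, start=1))
        print(f"{label} 0:{row_id} {ks}")

precomputed_rows(train, [15, 45, 25])  # prints the three training rows above
precomputed_rows(test, [15])           # the testing row (0:? can be any value)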
It's worth pointing out that the first set of vectors is represented in a sparse format (i.e. missing values are taken as zero), but the kernel matrix isn't and shouldn't be sparse: libsvm's documentation requires all kernel values, including zeros, to be written out explicitly, and in practice kernel matrices are dense anyway.
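
For completeness, precomputed kernels are selected in libsvm with -t 4 (kernel_type 4), and its Python bindings take rows in exactly the format above. A hedged sketch, assuming the libsvm-official package (the import path varies between libsvm distributions):

from libsvm.svmutil import svm_problem, svm_parameter, svm_train, svm_predict

# Kernel rows as dicts: index 0 is the instance ID, indices 1..n are K(x, x_j).
train_x = [{0: 1, 1: 4, 2: 6,  3: 1},
           {0: 2, 1: 6, 2: 18, 3: 0},
           {0: 3, 1: 1, 2: 0,  3: 1}]
train_y = [15, 45, 25]

prob = svm_problem(train_y, train_x, isKernel=True)  # flag required for -t 4
param = svm_parameter('-t 4')                        # 4 = precomputed kernel
model = svm_train(prob, param)

test_x = [{0: 1, 1: 2, 2: 0, 3: 1}]  # the ID slot can hold any value at test time
labels, acc, vals = svm_predict([15], test_x, model)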