I have a huge dataset of words word_i and weights weight[i,j], where weight[i,j] is the "connection strength" between words word_i and word_j.

I'd like to binarize this data, but I want to know whether there is an existing algorithm that assigns a binary code to each word in such a way that the Hamming distance between two words' codes correlates with their weight.
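For concreteness, here is a toy sketch of the goal (all codes and weights below are made up for illustration): high weight should correspond to small Hamming distance, so a good coding shows a strong negative rank correlation between the two.

    import numpy as np
    from scipy.stats import spearmanr

    def hamming(a: int, b: int) -> int:
        """Number of differing bits between two integer-encoded codes."""
        return bin(a ^ b).count("1")

    # Hypothetical 8-bit codes for 4 words and their pairwise weights.
    codes = [0b00000000, 0b00000011, 0b00001111, 0b11111111]
    weights = np.array([[0, 5, 3, 1],
                        [5, 0, 4, 2],
                        [3, 4, 0, 1],
                        [1, 2, 1, 0]])

    pairs = [(i, j) for i in range(len(codes)) for j in range(i + 1, len(codes))]
    dists = [hamming(codes[i], codes[j]) for i, j in pairs]
    ws = [weights[i, j] for i, j in pairs]

    # High weight should mean small Hamming distance,
    # so we expect a clearly negative correlation.
    rho, _ = spearmanr(dists, ws)
    print(f"Spearman correlation (distance vs. weight): {rho:.2f}")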

Added:

The problem I'm working on: I want to teach a neural net or an SVM to make associations between words, and that's why I've decided to binarize the data. Don't ask why I don't want to use Markov models or just graphs; I've tried them and want to compare them with neural nets.

So,

  1. I want my NN, given a word "a", to return its closest association, or a set of words with their probabilities,

  2. I've tried simply binarizing and feeding the concatenation "ab" as input with the weight as the target output (sketched after this list), but this worked badly,

  3. I was thinking of choosing a weight threshold at which one more bit changes; the smaller this threshold, the more bits you need,

  4. I have the situation a->b with weight w1 and b->a with weight w2, where w1 >> w2, so direction is significant.
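A minimal sketch of what attempt 2 above might look like (the post gives no code, so the code length and encoding details here are assumptions):

    import numpy as np

    N_BITS = 8  # hypothetical code length

    def to_bits(code: int, n_bits: int = N_BITS) -> np.ndarray:
        """Unpack an integer code into a 0/1 feature vector (LSB first)."""
        return np.array([(code >> k) & 1 for k in range(n_bits)], dtype=float)

    def make_example(code_a: int, code_b: int, weight: float):
        """Input is the concatenation "ab", target is the weight.
        This representation is direction-aware: (a, b) != (b, a),
        which matters since w(a->b) can differ from w(b->a)."""
        x = np.concatenate([to_bits(code_a), to_bits(code_b)])
        return x, weight

    x, y = make_example(0b00001111, 0b11110000, 0.7)
    print(x.shape, y)  # (16,) 0.7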

A:

Hi,

what you can do is use a self-organizing map (SOM) whose topology is the set of fixed-length, say N-bit, binary words, so that e.g. for N=8 every cell in the SOM has exactly 8 neighbors (the words obtained by flipping one bit). Now, if you have K [dictionary] words, you can encode every [dictionary] word as a vector of K real numbers between 0 and 1, where the ith word has its ith element set to 1 and all others to 0. You can then calculate the "distance" between two arbitrary vectors a1...aK and b1...bK by summing over

  i, j :  a_i * b_j * d(word_i, word_j)

where d(word_i, word_j) is a per-pair distance between words (e.g. derived from weight[i,j], with high weight meaning small distance). This gives you the distance metric for running the SOM algorithm. When the SOM has stabilized, [dictionary] words near each other in your metric are near each other in the topology of the map, from which you read off the encoding trivially as [binary] words.
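For concreteness, a minimal sketch of that metric as I read it (my code, not the answerer's; word_dist is a hypothetical precomputed per-pair distance table derived from the weights):

    import numpy as np

    def vector_distance(a: np.ndarray, b: np.ndarray, word_dist: np.ndarray) -> float:
        """distance(a, b) = sum over i, j of a[i] * b[j] * word_dist[i, j].

        For one-hot a and b this is just word_dist between the two words;
        for SOM prototype vectors it blends word distances by activation."""
        return float(a @ word_dist @ b)

    K = 4  # number of dictionary words
    word_dist = np.array([[0.0, 0.2, 0.7, 0.9],
                          [0.2, 0.0, 0.5, 0.8],
                          [0.7, 0.5, 0.0, 0.3],
                          [0.9, 0.8, 0.3, 0.0]])

    a = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot: word 0
    b = np.array([0.0, 0.0, 1.0, 0.0])  # one-hot: word 2
    print(vector_distance(a, b, word_dist))  # 0.7, i.e. word_dist[0, 2]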

Note that the map must have more cells than there are words, i.e. 2**N > K.

This answer of course assumes some background with self-organizing maps; see http://en.wikipedia.org/wiki/Self-organizing_map

antti.huima
>> Now if you have K [dictionary] words you can encode every [dictionary] word as a vector of real numbers between 0..1 so that the ith word has the ith element set to 1 and others to 0.

Does this mean I'll have number of bits == number of words? Wouldn't it be better to use ordinary binary coding, with number of bits = log2(N), where N is the number of dictionary words? Or do you mean to encode it as an N*N matrix, where w_ij is weight(a_i, b_j)? In that case, with every row as a training example for the SOM, the number of examples == the number of variables. Thank you!
Ivri
I've solved the problem using MDS (multi-dimensional scaling).
Ivri
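A hedged sketch of what that MDS route could look like (my reconstruction, not Ivri's code; the sizes and the weight-to-dissimilarity conversion are assumptions): embed the words with sklearn's MDS from a precomputed dissimilarity matrix, then threshold each embedding coordinate at its median to get binary codes.

    import numpy as np
    from sklearn.manifold import MDS

    K, N_BITS = 4, 3  # hypothetical sizes
    # Assumed conversion: high weight -> small dissimilarity. MDS needs a
    # symmetric matrix, so the a->b vs b->a asymmetry mentioned in the
    # question has to be symmetrized away here.
    weights = np.array([[0, 5, 3, 1],
                        [5, 0, 4, 2],
                        [3, 4, 0, 1],
                        [1, 2, 1, 0]], dtype=float)
    dissim = weights.max() - weights
    np.fill_diagonal(dissim, 0.0)

    mds = MDS(n_components=N_BITS, dissimilarity="precomputed", random_state=0)
    emb = mds.fit_transform(dissim)  # shape (K, N_BITS)

    # Threshold each coordinate at its median so bits come out roughly balanced.
    codes = (emb > np.median(emb, axis=0)).astype(int)
    print(codes)  # one N_BITS-bit code per word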