According to "Introduction to Neural Networks with Java By Jeff Heaton", the input to the Kohonen neural network must be the values between -1 and 1.

It is possible to normalize inputs where the range is known beforehand. For instance, RGB (125, 125, 125), where the range is known to be values between 0 and 255 (see the sketch after the steps):
1. Divide by 255: 125/255 ≈ 0.49 >> (0.49, 0.49, 0.49)
2. Multiply by two and subtract one: (0.49 * 2) - 1 ≈ -0.02 >> (-0.02, -0.02, -0.02)
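
A minimal Java sketch of these two steps (the class and method names are my own, not taken from Heaton's book):

    import java.util.Arrays;

    public class RgbNormalizer {
        // Map an RGB triple with known range [0, 255] into [-1, 1].
        public static double[] normalize(int[] rgb) {
            double[] out = new double[rgb.length];
            for (int i = 0; i < rgb.length; i++) {
                double scaled = rgb[i] / 255.0; // step 1: [0, 255] -> [0, 1]
                out[i] = scaled * 2.0 - 1.0;    // step 2: [0, 1] -> [-1, 1]
            }
            return out;
        }

        public static void main(String[] args) {
            // (125, 125, 125) ends up near (-0.02, -0.02, -0.02)
            System.out.println(Arrays.toString(normalize(new int[] {125, 125, 125})));
        }
    }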

The question is: how can we normalize inputs whose range is unknown beforehand, such as height or weight?

Also, some other papers mention that the input must be normalized to values between 0 and 1. Which is the proper way, "-1 and 1" or "0 and 1"?

A: 

From what I know about Kohonen SOMs, the specific normalization does not really matter.

Well, it might matter through specific choices for the values of the learning algorithm's parameters, but the most important thing is that the different dimensions of your input points are of the same magnitude.

Imagine that each data point is not a pixel with the three RGB components but a vector with statistical data for a country, e.g. area, population, .... It is important for the convergence of the learning part that all these numbers are of the same magnitude.

Therefore, it does not really matter if you don't know the exact range; you just have to know approximately the characteristic amplitude of your data.

For weight and height, I'm sure that if you divide them by 200 kg and 3 meters respectively, all your data points will fall in the ]0, 1] interval. You could even use 50 kg and 1 meter; the important thing is that all coordinates are of order 1.
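
A small sketch of that idea, assuming 200 kg and 3 m as rough characteristic scales (the constants and names are illustrative, not prescribed):

    public class ScaleNormalizer {
        static final double WEIGHT_SCALE_KG = 200.0; // rough upper bound on weight
        static final double HEIGHT_SCALE_M = 3.0;    // rough upper bound on height

        // Bring both coordinates to order 1 so no dimension dominates the SOM.
        public static double[] normalize(double weightKg, double heightM) {
            return new double[] { weightKg / WEIGHT_SCALE_KG, heightM / HEIGHT_SCALE_M };
        }

        public static void main(String[] args) {
            // 80 kg, 1.8 m -> roughly (0.4, 0.6), both in ]0, 1]
            System.out.println(java.util.Arrays.toString(normalize(80.0, 1.8)));
        }
    }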

Finally, you could consider running some linear analysis tool like POD (proper orthogonal decomposition) on the data; that would automatically give you a way to normalize your data and a subspace for the initialization of your map.

Hope this helps.

Adrien
+1  A: 

You can always use a squashing function to map an infinite interval onto a finite one; for example, you can use tanh.

You might want to use tanh(x * l) with a manually chosen l though, in order not to put too many objects in the same region. So if you have a good guess that the maximal values of your data are +/- 500, you might want to use tanh(x / 1000) as a mapping, where x is the value of your object. It might even make sense to subtract your guess of the mean from x, yielding tanh((x - mean) / max).
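
A minimal sketch of that squashing, assuming a guessed mean and a hand-picked divisor (all names and constants here are illustrative):

    public class TanhSquasher {
        // Squash an unbounded value into (-1, 1), centering on a guessed mean
        // and dividing by a chosen scale so typical values do not saturate.
        public static double squash(double x, double mean, double max) {
            return Math.tanh((x - mean) / max);
        }

        public static void main(String[] args) {
            // Data guessed to lie roughly within +/- 500 of mean 0, divisor 1000
            System.out.println(squash(250.0, 0.0, 1000.0));  // ~0.245
            System.out.println(squash(-800.0, 0.0, 1000.0)); // ~-0.664
        }
    }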

bayer