I have a neural network written in Java which uses a sigmoid transfer function defined as follows:

private static double sigmoid(double x)
{
    return 1 / (1 + Math.exp(-x));
}

and this is called many times during training and computation using the network. Is there any way of speeding this up? It's not that it's slow, it's just that it is used a lot, so a small optimisation here would be a big overall gain.

A: 

From a mathematical point of view, I don't see any way to optimize it.

Femaref
+12  A: 
tangens
Maybe something more than 100 if you want a little bit more precision. A lookup table of 5000 (but probably even 1000) values will be absolutely sufficient IMHO.
nico
For more precision, it is probably better to do linear interpolation between the nearest two values.
Jouni K. Seppänen
The function is symmetric, so you only need to store half the values. Calculating the other side is trivial.
Peter Lawrey
This is a plot of a completely different function. erf(x) is hard to calculate; exp(x) is not.
Ha
@Ha : Nice catch. This looks like the bipolar sigmoid function. The sigmoid function in the OP has horizontal asymptotes 0 and 1.
Zaid
Is it really worth replacing the existing function with an interpolation scheme? I would imagine it's slower.
Zaid
I've linked to the correct graph.
tangens
@Zaid: You just have a lookup table: take the value for the first x greater than yours and the value for the first x smaller than yours, then take their mean. So it's just a sum and a division by 2, definitely faster.
nico
+1  A: 

It's a pretty smooth function, so a lookup and interpolation scheme is likely to be more than sufficient.

When I plot the function over the range -10 <= x <= 10, I get five-place accuracy at the extremes. Is that good enough for your application?

duffymo
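For concreteness, here is one way the lookup-and-interpolation scheme discussed above might look. The class name, table size, and range are illustrative choices, not anything prescribed by the answers:

```java
public final class SigmoidTable {
    private static final double MIN_X = -10.0;
    private static final double MAX_X = 10.0;
    private static final int SIZE = 1000;
    private static final double STEP = (MAX_X - MIN_X) / (SIZE - 1);
    private static final double[] TABLE = new double[SIZE];

    static {
        // Precompute the sigmoid at evenly spaced sample points, once.
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = 1.0 / (1.0 + Math.exp(-(MIN_X + i * STEP)));
        }
    }

    public static double sigmoid(double x) {
        if (x <= MIN_X) return 0.0; // saturated low
        if (x >= MAX_X) return 1.0; // saturated high
        double pos = (x - MIN_X) / STEP;
        int i = Math.min((int) pos, SIZE - 2); // index of the lower sample
        double frac = pos - i;                 // position within the interval
        // Linear interpolation between the two nearest table entries.
        return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i]);
    }
}
```

This trades a few kilobytes of memory for the per-call exp(); with ~1000 samples over [-10, 10] the interpolation error is well below the five-place accuracy mentioned above.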
+7  A: 

There's a trick to speeding up floating point:

http://users.computerweekly.net/robmorton/projects/neural/sigmoid.htm

Bottom line: do the math necessary to avoid using floating point.

S.Lott
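A related trick (my own addition, not taken from the linked page) is to avoid exp() entirely with the algebraic "fast sigmoid" x / (1 + |x|), rescaled from (-1, 1) to (0, 1). It shares the S shape and asymptotes of the logistic function but is not numerically identical to it, so the network must be trained with the same function it runs with:

```java
public final class FastSigmoid {
    // exp()-free S-curve: 0.5 * (x / (1 + |x|)) + 0.5.
    // Approximates the shape of the logistic sigmoid, not its exact values.
    public static double fastSigmoid(double x) {
        return 0.5 * (x / (1.0 + Math.abs(x))) + 0.5;
    }
}
```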
A: 

If you have a lot of nodes where the value of x lies outside the -10..+10 box, you can simply skip the calculation for those values, e.g. like so:

if( x < -10 )
    y = 0;
else if( x > 10 )
    y = 1;
else
    y = 1 / (1 + Math.exp(-x));
return y;

Of course, this incurs the overhead of the conditional checks for EVERY calculation, so it's only worthwhile if you have lots of saturated nodes.

Another thing worth mentioning: if you are using backpropagation and need the slope of the function, it's better to compute it in pieces rather than 'as written'.

I can't recall the slope at the moment, but here's what I'm talking about using a bipolar sigmoid as an example. Rather than compute this way

y = (1 - exp(-x)) / (1 + exp(-x));

which calls exp() twice, you can cache the costly calculation in a temporary variable, like so

temp = exp(-x);
y = (1 - temp) / (1 + temp);

There are lots of places to put this sort of thing to use in BP nets.

JustJeff
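The slope mentioned above has a convenient closed form: for the standard sigmoid in the question, the derivative can be written in terms of the output itself, σ'(x) = σ(x) · (1 − σ(x)), so backpropagation needs no extra call to Math.exp() at all. A minimal sketch (class and method names are illustrative):

```java
public final class SigmoidDeriv {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Derivative expressed via the already-computed output y = sigmoid(x),
    // so backprop reuses the forward-pass value instead of calling exp() again.
    static double sigmoidDerivFromOutput(double y) {
        return y * (1.0 - y);
    }
}
```

(The bipolar sigmoid in the example above has an analogous form: its derivative is 0.5 · (1 − y²).)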