On some hardware, arithmetic on double values can be slower than on single values, but many FPUs have a single native format (e.g., the 80-bit extended format of the x86's x87 unit) that is used internally for calculations regardless of the in-memory data type. In other words, "FPU calculations will be faster with single precision" is generally not a reason to use single precision on most modern hardware.
That said, beyond the "uses less memory" reasons elaborated on in the other answers, there is a very practical reason when it comes to SIMD vector instruction sets like SSE and AltiVec: single precision is likely to be about twice as fast as double precision, because the instructions operate on fixed-size vectors, and you can fit twice as many single-precision values into one vector while the processing time per vector typically stays the same.
For example, with a 128-bit vector unit that completes a vector multiplication in 2 clock cycles, you get a throughput of 2 single-precision multiplications per clock versus 1 double-precision, since a vector holds 4 singles but only 2 doubles.
A similar effect occurs with memory bandwidth, and it is not specific to vector processing: large arrays of doubles not only take twice the space, but may take up to twice as long to process when your algorithm is bandwidth-constrained (which is increasingly likely given the increasing width and decreasing latency of vector processing units relative to memory speed).