If all your threads are CPU-bound and not doing I/O, you will usually get the best results with exactly as many threads as CPUs. Running more than the number of CPUs causes frequent context switching (which slows things down), and running fewer leaves some CPUs unused.
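For illustration, here is a minimal C++ sketch that sizes a worker pool to exactly the number of logical CPUs reported by the standard library (the worker body is just a placeholder):

```cpp
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    // hardware_concurrency() may return 0 if the count is unknown,
    // so fall back to a single thread in that case.
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i)
        workers.emplace_back([] { /* CPU-bound work would go here */ });
    for (auto &t : workers)
        t.join();

    std::printf("ran %u worker threads\n", n);
    return 0;
}
```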
If the threads use a lot of RAM, you may need fewer than the number of CPUs to avoid going into swap. This often happens when compiling C++ code in parallel, as GCC (which is otherwise heavily CPU-bound) uses a lot of RAM for template instantiation.
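A rough way to account for memory is to cap the thread count by an estimate of per-worker RAM use. The figures below (16 GiB available, ~3 GiB per parallel compile) are made-up assumptions to be replaced with measurements of your own workload:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <thread>

int main() {
    unsigned n_cpus = std::max(1u, std::thread::hardware_concurrency());

    // Assumed figures -- measure these for your own workload.
    const std::uint64_t available_ram  = 16ull << 30; // 16 GiB free for the job
    const std::uint64_t ram_per_worker =  3ull << 30; // ~3 GiB per parallel compile

    // Cap the worker count so the combined footprint stays out of swap.
    std::uint64_t ram_cap = std::max<std::uint64_t>(1, available_ram / ram_per_worker);
    unsigned n_workers = static_cast<unsigned>(std::min<std::uint64_t>(n_cpus, ram_cap));

    std::printf("using %u workers (CPUs: %u, RAM cap: %llu)\n",
                n_workers, n_cpus, static_cast<unsigned long long>(ram_cap));
    return 0;
}
```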
If the threads do blocking I/O (usually to disk, but it could be network or another external resource as well), you may need more threads than CPUs to keep the CPUs fully occupied while some threads wait; on the other hand, you may want fewer than that to avoid choking the I/O device (mechanical HDDs in particular slow down when they need to read/write multiple locations at the same time, because of seek overhead).
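One common sizing heuristic for blocking-I/O workloads (a general rule of thumb, not specific to this answer) is to oversubscribe in proportion to how long tasks wait versus compute. The wait/compute ratio below is an assumed example; a single mechanical HDD may still call for the one-thread-per-disk clamp mentioned at the end:

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>

int main() {
    unsigned n_cpus = std::max(1u, std::thread::hardware_concurrency());

    // Assumed profile: each task blocks ~4 ms on I/O per 1 ms of CPU work.
    // Measure these numbers for your real workload.
    double wait_ms = 4.0, compute_ms = 1.0;

    // Oversubscribe so the CPUs stay busy while other threads are blocked.
    unsigned n_threads =
        static_cast<unsigned>(n_cpus * (1.0 + wait_ms / compute_ms));

    // Caveat: if everything hits one mechanical HDD, the disk (not the CPU)
    // is the bottleneck, and one thread per disk is usually better.
    std::printf("CPUs: %u, suggested I/O-bound threads: %u\n", n_cpus, n_threads);
    return 0;
}
```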
If there are any real-time requirements on the system (e.g. games or video players), you should keep one CPU mostly idle, so that the interactive work can always be scheduled on it promptly.
So, as others said, there is no simple answer. In all the examples above I assumed that no other programs on the system are using any significant amount of CPU (which, perhaps surprisingly, is most often the case). If there are other CPU users, take them into account as well, e.g. by reducing the available CPU count accordingly.
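As a rough, Unix-specific sketch of that adjustment, the 1-minute load average can stand in for "CPUs already kept busy by other programs" (getloadavg() exists on Linux/BSD/macOS but not on Windows, and it is only an approximation):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cstdlib>   // getloadavg() -- a BSD/glibc extension, not standard C++
#include <thread>

int main() {
    unsigned n_cpus = std::max(1u, std::thread::hardware_concurrency());

    // Estimate how many CPUs other processes already keep busy
    // from the 1-minute load average.
    double load[1] = {0.0};
    unsigned busy = 0;
    if (getloadavg(load, 1) == 1)
        busy = static_cast<unsigned>(std::lround(load[0]));

    // Reduce the available CPU count, keeping at least one thread.
    unsigned available = (busy >= n_cpus) ? 1u : n_cpus - busy;
    std::printf("CPUs: %u, ~busy: %u, threads to use: %u\n", n_cpus, busy, available);
    return 0;
}
```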
The only useful "quick rules" for largely CPU-bound situations are then N-1, N and N+1 (where N is the number of available CPUs), depending on the factors mentioned. For heavily disk I/O bound situations only one thread (per HDD) should be used.