As a Gentoo user I have tried quite a few optimizations on the complete OS and there have been endless discussions on the Gentoo forums about it. Some good flags for GCC can be found in the wiki.
In short, optimizing for size worked best on an old Pentium3 laptop with limited ram, but on my main desktop machine with a Core2Duo, -O2 gave better results over all.
There's also a small script if you are interested in the x86 (32 bit) specific flags that are the most optimized.
If you use gcc and really want to optimize a specific application, try ACOVEA. It runs a set of benchmarks, then recompile them with all possible combinations of compile flags. There's an example using Huffman encoding on the site (lower is better):
A relative graph of fitnesses:
Acovea Best-of-the-Best: ************************************** (2.55366)
Acovea Common Options: ******************************************* (2.86788)
-O1: ********************************************** (3.0752)
-O2: *********************************************** (3.12343)
-O3: *********************************************** (3.1277)
-O3 -ffast-math: ************************************************** (3.31539)
-Os: ************************************************* (3.30573)
(Note that it found -Os to be the slowest on this Opteron system.)