You should assume GCC won't auto-vectorize your code, since that is very unlikely to happen!
As Paul said, to get the most performance out of your iPhone you should write your own ARM assembly code, using NEON SIMD instructions for as much of it as you can. But that assumes you understand ARM assembly language as well as NEON, timing delays, etc. If you don't want to learn ARM assembly language, Apple's Accelerate framework and ARM's OpenMAX libraries both provide numerous functions that are already written in ARM assembly language with NEON SIMD instructions.
So either Accelerate or OpenMAX should be a very good choice if you can use them. I haven't compared the two to see which one is actually faster, but I'd guess ARM's OpenMAX is slightly faster than Apple's implementation, since ARM designed the NEON specs! Either way, both should run extremely fast.