Short answer: because hotspots - the parts of your program that use the most time - are much easier to identify and analyze at runtime.
Long answer:
If you start by running the code in interpreted mode, the virtual machine can count how often and for how long the different parts of the code are executed. Those parts can then be optimized more effectively.
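As a rough illustration of what that counting looks like, here is a minimal sketch of an invocation counter (the class name and threshold are invented for this example; a real VM like HotSpot tracks much more, such as loop back-edges and branch frequencies):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of hotspot detection (hypothetical names): the
// interpreter bumps a counter on every method entry and marks a
// method "hot" once it crosses a threshold - the point at which a
// JIT compiler would take over.
class HotspotProfiler {
    private static final int COMPILE_THRESHOLD = 10_000;
    private final Map<String, Integer> invocationCounts = new HashMap<>();

    /** Called by the interpreter on every method entry. */
    boolean recordEntryAndCheckHot(String methodName) {
        int count = invocationCounts.merge(methodName, 1, Integer::sum);
        return count >= COMPILE_THRESHOLD; // time to compile this method
    }
}
```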
Take nested if-then-else clauses: fewer boolean checks mean less runtime. If you optimize the path for the branch that is executed most often, you get a better overall runtime.
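To make that concrete, here is a hand-written sketch (the names and the 95% figure are made up for the example) of the kind of reordering a profile-guided compiler can do automatically:

```java
// Hypothetical dispatch over message kinds. Suppose profiling shows
// that DATA messages make up ~95% of the traffic: checking that case
// first means the common path pays for one comparison instead of three.
class Dispatcher {
    static final int KIND_DATA = 0, KIND_PING = 1, KIND_AUTH = 2;

    static String handle(int kind) {
        if (kind == KIND_DATA) return "data";   // hot path checked first
        if (kind == KIND_PING) return "ping";
        if (kind == KIND_AUTH) return "auth";
        return "unknown";
    }
}
```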
Another point is that at runtime you can make assumptions that would be impossible at compile time. The Java VM in server mode, for instance, inlines virtual methods - as long as only one loaded class implements the method. Doing that at compile time would be unsafe. The JVM deoptimizes the code again if another implementing class is loaded later, but often that never happens.
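Here is a sketch of the situation (class names invented for the example):

```java
// As long as Circle is the ONLY loaded class implementing Shape, the
// JVM can treat the virtual call s.area() as a direct call to
// Circle.area() and inline it. If a second implementation is loaded
// later, the JVM deoptimizes and goes back to real virtual dispatch.
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class Client {
    // After inlining, this can effectively become:
    //   return Math.PI * radius * radius + 1.0;
    static double areaPlusOne(Shape s) {
        return s.area() + 1.0;
    }
}
```

An ahead-of-time compiler cannot do this safely, because it can never prove that no other Shape implementation will ever exist.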
Also, more is known at runtime about the machine the program is running on. If the machine has more registers, the compiler can use them. Again, that is not safe to bake in at compile time, because the binary might end up running on different hardware.
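You cannot see register allocation from Java code, but here is an application-level analogy of the same idea, using only standard library calls: decide at run time, based on the actual machine, which strategy to use. The JIT does the equivalent at the instruction level.

```java
import java.util.Arrays;

// Analogy only: pick a strategy based on the hardware we actually
// run on, instead of committing to one ahead of time.
class SortChooser {
    static void sort(int[] data) {
        int cores = Runtime.getRuntime().availableProcessors();
        if (cores > 1 && data.length > 1_000_000) {
            Arrays.parallelSort(data); // use the extra cores we found
        } else {
            Arrays.sort(data);
        }
    }
}
```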
One caveat: optimizing at runtime also has disadvantages. Most importantly, the time spent on optimization is added to the runtime of the program. It is also more complicated, because the VM has to compile parts of the program while executing them. And bugs in the virtual machine are critical: think of a compiler that sometimes crashes - you just compile again and everything is fine. If a VM sometimes crashes, that means your program sometimes crashes. Not good.
As a conclusion: every optimization that is possible at compile time can also be done at runtime ... and some more, because you have more information about the program, its execution paths, and the machine it is running on. But you have to factor in the time needed to run the optimizations, it is more complicated to do at runtime, and faults matter more than they do at compile time.