To get at the heart of this question, it helps to look at what the JVM does with local variables. (Note that non-JVM implementations of Java may do things differently, as far as the JLS allows.)
First, keep in mind that the local variable "o" in the example is a reference (a pointer), not the object itself.
All local variables live in the method's local variable array, which is part of its stack frame and is divided into slots. Values of type long and double take two slots; every other primitive and every reference takes one. (Even a boolean takes a full slot.)
The number of slots a method needs (max_locals in the class file) is fixed at compile time, so every invocation gets a frame of that size. The number is the maximum count of slots in use at any single point in the method, not the total number of variables declared in it.
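To make the sizing rule concrete, here is a toy model, not javac itself, of how a compiler might hand out slots: each declaration takes the next free slot(s), leaving a block frees that block's slots for reuse, and the frame size is the high-water mark. The class `SlotModel` and its methods are hypothetical names for illustration, and the model ignores details such as `this` and method parameters, which occupy the first slots of a real frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (an assumption, not javac's actual code) of local-variable slot allocation.
class SlotModel {
    private int next = 0;  // index of the next free slot
    private int max = 0;   // high-water mark = slots the frame must provide
    private final Deque<Integer> scopes = new ArrayDeque<>();

    // long and double occupy two slots; every other type, including
    // boolean and object references, occupies one.
    static int width(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Declare a local of the given type and return its first slot index.
    int declare(Class<?> type) {
        int slot = next;
        next += width(type);
        max = Math.max(max, next);
        return slot;
    }

    // Remember where the block started so its slots can be released later.
    void enterBlock() { scopes.push(next); }

    // Leaving a block makes every slot declared inside it available again.
    void exitBlock() { next = scopes.pop(); }

    int maxSlots() { return max; }
}
```

For example, declaring `int a` and `long b`, then `double c` inside a nested block, then `boolean d` after that block yields slots 0, 1, 3 and 3 (d reuses c's slot), and `maxSlots()` returns 5, because the peak occurs while `c` is live.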
In the example in the question, both versions of the code need the same maximum number of local variable slots.
In both cases, the same bytecode will be generated, storing into the same slot of the local variable array on each pass through the loop.
In other words, there is no performance penalty either way.
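The question's exact code isn't reproduced here, so the class `LoopDeclaration` and its two methods below are hypothetical stand-ins for the two versions. Both declare `totalLength` before `o`, so `o` should land in the same slot in each; compiling with javac and running `javap -c LoopDeclaration` should show the same instruction sequence for both methods. Only the debug-only LocalVariableTable (emitted with `-g`) would record different scope ranges for `o`.

```java
import java.util.Iterator;

// Hypothetical stand-ins for the two versions discussed in the question.
class LoopDeclaration {
    // Version 1: "o" is declared once, before the loop, and reassigned each pass.
    static int declaredOutside(Iterator<?> it) {
        int totalLength = 0;
        Object o;
        while (it.hasNext()) {
            o = it.next();
            totalLength += o.toString().length();
        }
        return totalLength;
    }

    // Version 2: "o" is declared inside the loop body.
    static int declaredInside(Iterator<?> it) {
        int totalLength = 0;
        while (it.hasNext()) {
            Object o = it.next();
            totalLength += o.toString().length();
        }
        return totalLength;
    }
}
```

Both methods compute the same result; the point of the comparison is the `javap` output, not the arithmetic.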
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger frame (more local variable slots). For example, compare
```java
for (...) { Object o = ...; }
for (...) { Object o = ...; }
```
with
```java
Object o;
for (...) { o = ...; /* loop 1 */ }
for (...) { Object x = ...; }
```
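In the first example, both loops need the same number of slots, and the two "o" variables can share one: the first "o" goes out of scope when the first loop ends, so its slot is free again for the second loop.
In the second example, because "o" is still in scope after the first loop, "x" needs an additional slot of its own.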
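A runnable version of the two shapes, with hypothetical class and method names and loop bodies filled in only so the code compiles, might look like the sketch below. Running `javap -v TwoLoops` prints a `locals=` value for each method; with javac's usual slot reuse you would expect `separateScopes` to report one fewer local slot than `outerVariable` (4 versus 5 here, counting the parameter `n` and the `count` accumulator).

```java
// Hypothetical methods mirroring the two shapes above; compare "locals=" in `javap -v TwoLoops`.
class TwoLoops {
    // Each "o" is scoped to its own loop, so the second loop can reuse the first loop's slots.
    static int separateScopes(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            Object o = Integer.valueOf(i);
            count += o.hashCode();  // Integer.hashCode() is the int value itself
        }
        for (int j = 0; j < n; j++) {
            Object o = Integer.valueOf(j);
            count += o.hashCode();
        }
        return count;
    }

    // "o" is still in scope during the second loop, so "x" needs a slot of its own.
    static int outerVariable(int n) {
        int count = 0;
        Object o = null;
        for (int i = 0; i < n; i++) {
            o = Integer.valueOf(i);
            count += o.hashCode();
        }
        for (int j = 0; j < n; j++) {
            Object x = Integer.valueOf(j);
            count += x.hashCode();
        }
        return count;
    }
}
```

One extra slot is rarely a measurable cost; the sketch only shows that scope, not the loop itself, determines how many slots the frame needs.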
Hope this helps,
-- Scott