If you are writing code for a PC, there is unlikely to be any meaningful speed advantage either way. On some embedded systems, it may be advantageous to avoid all local variables. On some other systems, local variables may be faster.
An example of the former: on the Z80, the code to set up the stack frame for a function with any local variables was pretty long. Further, the code to access local variables was limited to the (IX+d) addressing mode, which was only available for 8-bit loads and stores, so a 16-bit variable had to be moved one byte at a time. If X and Y were 16-bit variables that were both global/static or both local, the statement "X=Y" could assemble as either:
; If both are static or global: 6 bytes; 32 cycles
ld HL,(_Y) ; 16 cycles
ld (_X),HL ; 16 cycles
; If both are local: 12 bytes; 56 cycles
ld E,(IX+_Y) ; 14 cycles
ld D,(IX+_Y+1) ; 14 cycles
ld (IX+_X),E ; 14 cycles
ld (IX+_X+1),D ; 14 cycles
A 100% code space penalty and 75% time penalty in addition to the code and time to set up the stack frame!
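On the ARM processor, a single instruction can load a variable located within a limited offset of a base register (roughly +/-4K in the classic 32-bit ARM encoding, which has a 12-bit offset field; the exact range depends on the instruction set in use). If a function's local variables all fit within that range of the stack pointer, each may be accessed with a single instruction. Global variables will generally require two or more instructions to load, depending upon where they are stored, because the variable's address usually has to be placed in a register first.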
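To see which case applies on a particular target, the "X=Y" comparison can be reproduced in C and the generated assembly inspected. The following is only a minimal sketch: the names copy_static, copy_local, g_x and g_y are made up for illustration, and volatile is used purely so that an optimizing compiler does not remove the copies.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.
   volatile forces real loads and stores so the copies survive optimization. */
static volatile int16_t g_x, g_y;

/* X = Y where both variables have static storage. */
int16_t copy_static(int16_t value)
{
    g_y = value;
    g_x = g_y;
    return g_x;
}

/* X = Y where both variables are locals in the stack frame. */
int16_t copy_local(int16_t value)
{
    volatile int16_t y = value;
    volatile int16_t x;

    x = y;
    return x;
}
```

Compiling this with the target's own compiler and asking for an assembly listing (for example the -S option of GCC or Clang) shows how many instructions each version of the copy needs on that machine. The two functions return the same value; only the addressing used to reach the variables differs.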