Hi,

I was debugging some C++ code (WinCE 6 on an ARM platform), and I found some strange behavior:

    4277220C    mov         r3, #0x93, 30
    42772210    str         r3, [sp]
    42772214    ldr         r3, [pc, #0x69C]
    42772218    ldr         r2, [pc, #0x694]
    4277221C    mov         r1, #0
    42772220    ldr         r0, [pc, #0x688]

The instruction at 42772214, ldr r3, [pc, #0x69C], is used to get some constant from the .data section, or at least I think so.

What is strange is that, according to the code, r3 should be loaded from address pc + 0x69C = 0x42772214 + 0x69C = 0x427728B0, but according to the memory contents it is loaded from 0x427728B8 (8 bytes further). The same happens for the other ldr instructions.

Is this a fault of the debugger or of my understanding of ldr/pc? Another thing I don't get: why is access to the .data section relative to the executing code? I find it a little strange.

And one more issue: I cannot find the syntax of the first mov instruction. Could anyone point me to an opcode specification for Thumb (1C2)?

Sorry for the layman's description, but I'm just getting familiar with assembly.

+8  A: 

The debugger is correct. When pc is read, there is an 8-byte offset in ARM mode and a 4-byte offset in Thumb mode.

From the ARM-ARM:

When an instruction reads the PC, the value read depends on which instruction set it comes from:

  • For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this value are always zero, because ARM instructions are always word-aligned.
  • For a Thumb instruction, the value read is the address of the instruction plus 4 bytes. Bit [0] of this value is always zero, because Thumb instructions are always halfword-aligned.

This way of reading the PC is primarily used for quick, position-independent addressing of nearby instructions and data, including position-independent branching within a program.
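Applied to the disassembly in the question, the quoted rule can be checked with a small Python sketch. The helper name pc_relative_load_address is made up for this illustration and is not part of any toolchain. The Thumb branch also rounds the PC value down to a word boundary, as Thumb literal loads do.

```python
def pc_relative_load_address(instr_addr, offset, thumb=False):
    """Address accessed by `ldr rX, [pc, #offset]` (hypothetical helper)."""
    if thumb:
        # Thumb: PC reads as instruction address + 4; literal loads
        # additionally round that value down to a word boundary.
        pc = (instr_addr + 4) & ~3
    else:
        # ARM: PC reads as instruction address + 8.
        pc = instr_addr + 8
    return (pc + offset) & 0xFFFFFFFF


# The three ldr instructions from the question (ARM mode).
for addr, off in [(0x42772214, 0x69C), (0x42772218, 0x694), (0x42772220, 0x688)]:
    print(f"{addr:08X}: loads from {pc_relative_load_address(addr, off):08X}")
```

The first load resolves to 0x427728B8, which is the address the debugger reports. The three loads land on three consecutive words, which is what a block of constants placed after the function would look like.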

There are 2 reasons for pc-relative addressing.

  1. Position-independent code, which is your case.
  2. Loading constants that cannot be encoded in a single instruction, e.g. mov r3, #0x12345678 is impossible in one instruction, so the compiler may put the constant at the end of the function and use e.g. ldr r3, [pc, #0x50] to load it instead.
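To see why some constants need a pc-relative load, here is a rough sketch of the classic ARM (32-bit instruction) immediate rule: a data-processing immediate is an 8-bit value rotated right by an even amount. The function names are invented for this example, and the sketch ignores alternatives a real assembler may try (such as mvn or Thumb-2 encodings).

```python
def rol32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value):
    """Return (imm8, rotation) if `value` fits an ARM immediate, else None.

    Hypothetical helper: looks for an even rotation such that
    value == imm8 rotated right by that amount.
    """
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        imm8 = rol32(value, rot)  # undo a rotate-right by `rot`
        if imm8 <= 0xFF:
            return imm8, rot
    return None


print(encode_arm_immediate(0xFF000000))  # fits: 0xFF rotated right by 8
print(encode_arm_immediate(0x12345678))  # does not fit in one mov
```

A constant for which this returns None is the kind the compiler would typically place near the function and fetch with ldr.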

I don't know what mov r3, #0x93, 30 means. Probably it is mov r3, #0x93, rol 30 (which gives 0xC0000024)?
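Both readings of the rotation are easy to evaluate. The following is a minimal sketch with made-up helper names. In the ARM immediate encoding the rotation is a rotate right, which would make the first result the value moved into r3.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    return ror32(value, 32 - (amount % 32))


print(hex(ror32(0x93, 30)))  # reading the 30 as a rotate right
print(hex(rol32(0x93, 30)))  # reading the 30 as a rotate left
```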

KennyTM
`mov r3, #0x93, 30` actually means `mov r3, #0x93, ror 30`, giving `0x24c`.
Mike Seymour
@Mike - good explanation, and good for citing the ARM ARM. In the ARM 3-stage pipeline, the PC always points to the instruction being fetched, and PC-4 points to the instruction being decoded, and PC-8 is the "current instruction", i.e. the instruction being executed. This is also why exceptions must adjust the LR value before returning. As you noted, this applies to ARM (32-bit) instructions, hence the 4-byte adjustment per pipeline stage.
Dan