Hi,

For my work, I need to reverse-engineer what this piece of ARM9 code is doing. I'm a Java developer and I really don't understand this code, which belongs to a single function.

I'm asking for help because the original source code is no longer available. Can anyone help me understand what this code does, ideally with a small algorithm in any high-level language? I have tried for many hours without results.

sub_FFFF7B38
    PUSH    {LR}
    ADDS    R2, R0, #0
    LDRB    R3, [R2]
    CMP     R3, #0
    BEQ     loc_FFFF7B52
    SUBS    R1, #1
    BCC     loc_FFFF7B52

loc_FFFF7B46:
    ADDS    R0, #1
    LDRB    R3, [R0]
    CMP     R3, #0
    BEQ     loc_FFFF7B52
    SUBS    R1, #1
    BCS     loc_FFFF7B46

loc_FFFF7B52:
    SUBS    R0, R0, R2
    POP     {R1}
+1  A: 

How about this: Instruction set for ARM

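Some hints / simplified asm

  • PUSH - puts something on the "stack" (memory)
  • ADD - usually "add" as in +
  • POP - retrieves something from the "stack" (memory)
  • CMP - short for "compare", which compares one value with another

A name followed by a colon, such as X: or Whatever:, is a label: a named position in the code that a branch instruction can jump to. It is similar to the target of a "goto" in languages that have one (Java reserves the keyword but does not implement it; labeled break/continue is the closest equivalent).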

If you have the following (ignore whether it is correct ARM asm, it's just pseudocode):

PUSH 1
x:     
    POP %eax

First it would put 1 on the stack and then pop it back into eax (eax is short for "extended AX", an x86 register that holds 32 bits of data).

Now, what does the x: do then? Well, let's assume that there are 100 lines of asm before it as well; then you could use a "jump" instruction to navigate to x:.
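To make the label-and-jump idea concrete, here is a minimal sketch in C, which does have goto. The function name `sum_with_goto` and its labels are made up for illustration; the point is only that a label marks a position and a jump transfers control to it, the way BEQ, BCC and BCS do in the listing from the question.

```c
#include <assert.h>

/* Hypothetical example: adds up 1..limit using only labels and jumps. */
int sum_with_goto(int limit)
{
    assert(limit >= 0);
    int total = 0;
    int i = 1;

again:                  /* a label, like x: or loc_FFFF7B46: */
    if (i > limit)
        goto done;      /* conditional jump, comparable to BEQ or BCC */
    total += i;
    i++;
    goto again;         /* unconditional jump back to the label */

done:
    return total;
}
```

Calling `sum_with_goto(4)` walks the loop four times and returns 10. A compiler turns ordinary for and while loops into exactly this kind of label-plus-branch pattern.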

That's a little bit of introduction to asm. Simplified.

Try to understand the above code and examine the instruction-set.

Filip Ekberg
I understand push, add, cmp and jmp, but I still don't understand the purpose of the code.
mada
A: 

Filip has provided some pointers; you also need to read up on the ARM calling convention (that is, which registers contain the function arguments on entry and which one holds the return value).

From a quick reading I think this code is strnlen or something closely related to it.
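As a rough illustration of that guess, here is a minimal C sketch of a length-bounded strlen. The name `bounded_len` and its parameter names are invented for this example. Under the ARM procedure call standard the first two arguments arrive in R0 and R1 and the result is returned in R0, which matches how the listing uses those registers. This shows what such a function does; it is not the original source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: counts the bytes before the first NUL, looking at
 * no more than n bytes (the same contract as POSIX strnlen). */
size_t bounded_len(const char *s, size_t n)
{
    assert(s != NULL);
    size_t len = 0;
    while (len < n && s[len] != '\0')
        len++;
    return len;
}
```

For example, `bounded_len("hello", 10)` gives 5, while `bounded_len("hello", 3)` stops at the bound and gives 3.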

crazyscot
+6  A: 

Except for the last two lines, it could be something like the following.
Please don't hit me if I am not 100% correct.

Assuming:
R0 is p0 or p,
R1 is n,
R2 is a temporary value (edited; at first I thought: i, or the address of p0[i]),
R3 is a temporary value.

sub_FFFF7B38
          PUSH {LR}           ; save return address
          ADDS R2, R0, #0     ; move R0 to R2
          LDRB R3, [R2]       ; load *p0
          CMP R3, #0          ; if *p0==0 
          BEQ loc_FFFF7B52    ; then jump to loc_FFFF7B52 
          SUBS R1, #1         ; decrement n
          BCC loc_FFFF7B52    ; if there was a borrow (i.e. n was 0): jump to loc_FFFF7B52


loc_FFFF7B46:
          ADDS R0, #1         ; increment p
          LDRB R3, [R0]       ; load *p
          CMP R3, #0          ; if *p==0
          BEQ loc_FFFF7B52    ; jump to loc_FFFF7B52
          SUBS R1, #1         ; decrement n
          BCS loc_FFFF7B46    ; if there was no borrow (i.e. n was not 0): jump to loc_FFFF7B46


loc_FFFF7B52:
          SUBS R0, R0, R2     ; calculate p - p0
          POP {R1}            ; ??? I don't understand the purpose of this
                              ; isn't there missing something?

or in C:

int f(char *p0, unsigned int n)
{
  char *p;

  if (*p0==0 || n--==0)
    return 0;

  for(p=p0; *++p && n>0; n--)
  {
  }
  return p - p0;
}
Curd
@Curd - I think you got a lot closer than what I was thinking and I don't think I can top your answer. +1
Heather
R2 is a temporary register too. `ADDS R2, R0, #0` overwrites `R2` before it is read. `R2` just saves the original value of `R0`; it isn't an index, so shouldn't it be `p0`? Also, `CC` is the "carry clear" condition, not carry set (see `CS` later on).
Charles Bailey
@Charles Bailey: (1) Concerning R2 you are right. I will edit my answer accordingly. (2) Yes, CC is "Carry Clear" and CS is "Carry Set", but ARM uses the carry flag for subtraction/comparison the other way round from what one would expect (and from what many other architectures do). So when I talk about "carry" in the comments, I don't mean the actual ARM flag bit, but whether there was a borrow or not. Thanks for your corrections.
Curd
@Curd: Yes, you're right. I haven't ever needed to manually use the carry bit with subtraction so I wasn't aware of this. I should have known though, as I've used SBC before and that needs it to work correctly.
Charles Bailey
I presume that after the `POP` there must be a `MOV PC, R1` or similar. TBH I wasn't aware that `PUSH` and `POP` were ARM instructions but I presume they do something similar to `STMFD R13!, {R14}` and the `LDM` equivalent.
Charles Bailey
@Charles Bailey: yes, I think that too. But why was LR saved in the first place? (There are no subroutines called within the function). `MOV PC,R14` at the end would do it.
Curd
At a guess, it was just standard prologue and epilogue generated by a compiler that doesn't (or at least didn't) optimize these things out when not needed.
Charles Bailey
+1  A: 

My ASM is a bit rusty, so no rotten tomatoes please. Assuming this starts at sub_FFFF7B38:

The command PUSH {LR} preserves the link register, which is a special register which holds the return address during a subroutine call.

ADDS sets the flags (like CMN would). Also ADDS R2, R0, #0 adds R0 to 0 and stores in R2. (Correction from Charles in comments)

LDRB R3, [R2] loads a single byte from the memory address held in R2 into register R3. The upper three bytes of R3 are zeroed on loading. In other words, it fetches the first byte that R2 (a copy of R0) points to.

CMP R3, #0 performs a subtraction between the two operands and sets the register flags, but does not store a result. Those flags lead to...

BEQ loc_FFFF7B52, which means "If the previous comparison was equal, go to loc_FFFF7B52" or if(R3 == 0) {goto loc_FFFF7B52;}

So if R3 isn't zero, then the SUBS R1, #1 command subtracts one from R1 and sets a flag.

BCC loc_FFFF7B52 will cause execution to jump to loc_FFFF7B52 if the carry flag is clear. After a SUBS on ARM, carry clear means a borrow occurred, i.e. R1 was 0 before the subtraction.
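The carry behaviour after a subtraction is easy to get backwards, so here is a small C sketch of it. The names `subs_result`, `subs`, `bcc_taken` and `bcs_taken` are invented for illustration, and only the result and the C flag are modelled (N, Z and V are left out). On ARM, a subtraction sets the carry flag when no borrow was needed and clears it when a borrow occurred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value and carry flag produced by an emulated SUBS (other flags omitted). */
struct subs_result {
    uint32_t value;
    bool carry;             /* true: no borrow, false: borrow */
};

/* Models SUBS Rd, Rn, #imm for the result and the C flag only. */
struct subs_result subs(uint32_t rn, uint32_t imm)
{
    struct subs_result r;
    r.value = rn - imm;     /* unsigned subtraction wraps modulo 2^32 */
    r.carry = (rn >= imm);  /* ARM sets C when no borrow was needed */
    /* Sanity check: a wrapped result is always larger than rn. */
    assert(r.carry == (r.value <= rn));
    return r;
}

/* BCC branches when C is clear, BCS when C is set. */
bool bcc_taken(struct subs_result r) { return !r.carry; }
bool bcs_taken(struct subs_result r) { return r.carry; }
```

So `subs(0, 1)` wraps to 0xFFFFFFFF with the carry clear, and BCC is taken. `subs(1, 1)` gives 0 with the carry set, and BCS is taken.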

( snip )

Finally, POP {R1} pops the return address that PUSH {LR} saved at entry into R1; presumably a following instruction (not shown in the listing) branches to it to return.

Edit - While I was in the car, Curd spelled out just about what I was thinking when I was trying to write out my answer and ran out of time.

Heather
`ADDS` sets the flags (like `CMN` would). Also `ADDS R2, R0, #0` adds `R0` to 0 and stores in `R2`, it doesn't add `R0` and `R2`.
Charles Bailey
@Charles - I couldn't find the `ADDS` command in any of my own documentation (all x86), nor did I see it on the website Filip provided. Thanks for the correction. :)
Heather
`S` is just a suffix; `ADDS` isn't a separate instruction. You can suffix any arithmetic instruction with `S` to get the flags set.
Charles Bailey
+3  A: 

Here are the instructions commented line by line

sub_FFFF7B38
    PUSH    {LR}          ; save LR (link register) on the stack
    ADDS    R2, R0, #0    ; R2 = R0 + 0 and set flags (could just have been MOV?)
    LDRB    R3, [R2]      ; Load R3 with a single byte from the address at R2
    CMP     R3, #0        ; Compare R3 against 0...
    BEQ     loc_FFFF7B52  ; ...branch to end if equal
    SUBS    R1, #1        ; R1 = R1 - 1 and set flags
    BCC     loc_FFFF7B52  ; branch to end if carry is clear, which for subtraction
                          ; means a borrow occurred (R1 was 0 before the SUBS)

loc_FFFF7B46:
    ADDS    R0, #1        ; R0 = R0 + 1 and set flags
    LDRB    R3, [R0]      ; Load R3 with byte from address at R0
    CMP     R3, #0        ; Compare R3 against 0...
    BEQ     loc_FFFF7B52  ; ...branch to end if equal
    SUBS    R1, #1        ; R1 = R1 - 1 and set flags
    BCS     loc_FFFF7B46  ; loop if carry is set, which for subtraction means
                          ; no borrow occurred (the result is not negative)

loc_FFFF7B52:
    SUBS    R0, R0, R2    ; R0 = R0 - R2
    POP     {R1}          ; Load the previously saved value of LR into R1
                          ; Presumably the missing next line is MOV PC, R1 to
                          ; return from the function.

So in very basic C code:

int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    char r3 = *r2;
    if (r3 == '\0')
        goto end;
    if (--r1 < 0)           /* BCC: borrow, r1 was 0 */
        goto end;

loop:
    r3 = *++r0;
    if (r3 == '\0')
        goto end;
    if (--r1 >= 0)          /* BCS: no borrow */
        goto loop;

end:
    return r0 - r2;
}

Adding some control structures to get rid of the gotos:

int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    char r3 = *r2;

    if (r3 != '\0')
    {
        if (--r1 >= 0)
        {
            do
            {
                if (*++r0 == '\0')
                    break;
            } while (--r1 >= 0);
        }
    }

    return r0 - r2;
}

Edit: Now that my confusion about the carry bit and SUBS has been cleared up this makes more sense.

Simplifying:

int unknown(const char* r0, int r1)
{
    const char* r2 = r0;

    while (*r0 != '\0' && --r1 >= 0)
        r0++;

    return r0 - r2;
}

In words: find the index of the first NUL within the first r1 chars of the string pointed to by r0, or return r1 if there is none.
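BCC and BCS test the carry flag, so the counter in R1 really behaves as an unsigned value. The following register-by-register transliteration is a sketch that keeps that detail. The name `sub_FFFF7B38_model` is made up, the PUSH/POP pair and the presumed return are not modelled, and like the listing it reads the byte at the current position before it checks the counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the listing; r1 is unsigned because the code
 * branches on the carry (borrow) flag rather than on the sign. */
size_t sub_FFFF7B38_model(const char *r0, uint32_t r1)
{
    assert(r0 != NULL);
    const char *r2 = r0;        /* ADDS R2, R0, #0 */

    if (*r0 == '\0')            /* LDRB / CMP / BEQ */
        goto end;
    if (r1-- == 0)              /* SUBS R1, #1 / BCC: borrow when R1 was 0 */
        goto end;

loop:
    r0++;                       /* ADDS R0, #1 */
    if (*r0 == '\0')            /* LDRB / CMP / BEQ */
        goto end;
    if (r1-- != 0)              /* SUBS R1, #1 / BCS: no borrow */
        goto loop;

end:
    return (size_t)(r0 - r2);   /* SUBS R0, R0, R2 */
}
```

For "abc" this returns 0, 1 and 2 for bounds 0, 1 and 2, and 3 for any larger bound, which agrees with the description above.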

Charles Bailey
In your second example, it's not `while (r1-- ==0)` but `while (r1-- != 0)`. So your final example is in fact what it really does, in other words it's a length-bounded `strlen`.
Andrew McGregor
@Andrew McGregor: I switched if !=, goto end to while ==. I'm fairly sure this is a correct transformation. I did spend some time puzzling over this as it seemed so wrong. Are you completely sure I messed it up?
Charles Bailey
ARM's carry flag is inverted; it's a not-carry bit, so subs followed by bcc means branch if the subtraction didn't give zero
Andrew McGregor
@Andrew McGregor: That would explain it. Did this change at some point because I certainly don't remember hitting this on the ARM2 - although that was some time ago.
Charles Bailey
@Andrew McGregor: OK, I found some documentation; carry set is as I expected for addition, but for subtraction it's set when the result of a subtraction is positive, which makes sense considering how SBC works.
Charles Bailey
Ah, yes... it's normal for add and inverted for subtract... I missed that detail.
Andrew McGregor