I was told by c-faq that compiler do different things to deal with a[i] while a is an array or a pointer. Here's an example from c-faq:
char a[] = "hello"; char *p = "world";
Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location ``a'', move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location ``p'', fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to.
But I was told that when dealing with a[i], the compiler tends to convert a (which is an array) to a pointer-to-array. So I want to see assembly codes to find out which is right.
EDIT:
Here's the source of this statement. c-faq And note this sentence:
an expression of the form a[i] causes the array to decay into a pointer, following the rule above, and then to be subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different, "
I'm pretty confused of this: since a has decayed to pointer, then why does he mean about "memory accesses will be different?"
Here's my code:
// array.cpp
#include <cstdio>
using namespace std;
int main()
{
char a[6] = "hello";
char *p = "world";
printf("%c\n", a[3]);
printf("%c\n", p[3]);
}
And here's part of the assembly code I got using g++ -S array.cpp
.file "array.cpp"
.section .rodata
.LC0:
.string "world"
.LC1:
.string "%c\n"
.text
.globl main
.type main, @function
main:
.LFB2:
leal 4(%esp), %ecx
.LCFI0:
andl $-16, %esp
pushl -4(%ecx)
.LCFI1:
pushl %ebp
.LCFI2:
movl %esp, %ebp
.LCFI3:
pushl %ecx
.LCFI4:
subl $36, %esp
.LCFI5:
movl $1819043176, -14(%ebp)
movw $111, -10(%ebp)
movl $.LC0, -8(%ebp)
movzbl -11(%ebp), %eax
movsbl %al,%eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
movl -8(%ebp), %eax
addl $3, %eax
movzbl (%eax), %eax
movsbl %al,%eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
movl $0, %eax
addl $36, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
I can not figure out the mechanism of a[3] and p[3] from codes above. Such as:
- where was "hello" initialized?
- what does $1819043176 mean? maybe it's the memory address of "hello" (address of a)?
- I'm sure that "-11(%ebp)" means a[3], but why?
- In "movl -8(%ebp), %eax", content of poniter p is stored in EAX, right? So $.LC0 means content of pointer p?
- What does "movsbl %al,%eax" mean?
And, note these 3 lines of codes:
movl $1819043176, -14(%ebp)
movw $111, -10(%ebp)
movl $.LC0, -8(%ebp)The last one use "movl" but why did not it overwrite the content of -10(%ebp)? (I know the anser now :), the address is incremental and "movl $.LC0 -8(%ebp) will only overwrite {-8, -7, -6, -5}(%ebp))
I'm sorry but I'm totally confused of the mechanism, as well as assembly code...
Thank you very much for your help.