views: 547
answers: 5

I'm building an interpreter, and since I'm aiming for raw speed this time, every clock cycle matters.

Do you have any experience or information on which of the two is faster: vector or array? All that matters is the speed at which I can access an element (opcode fetching); I don't care about inserting, allocation, sorting, etc.

I'm going to go out on a limb now and say:

  • Arrays are at least a bit faster than vectors in terms of accessing an element i.

It seems really logical to me. With vectors you have all that safety and bookkeeping overhead which doesn't exist for arrays.

(Why) Am I wrong?

No, I can't ignore the performance difference - however small it is - I have already optimized and minimized every other part of the VM which executes the opcodes :)

+3  A: 

No. Under the hood, both std::vector and C++0x std::array find the pointer to element n by adding n to the pointer to the first element.

vector::at may be slower than array::at because the former must compare against a variable while the latter compares against a constant. Those are the functions that provide bounds checking, not operator[].

If you mean C-style arrays instead of C++0x std::array, then there is no at member, but the point remains.
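To make the difference concrete, here is a minimal sketch (function names are mine, chosen for illustration) of the two checked accesses being compared: the vector's at must test the index against a size stored in the object, while the array's at tests against a compile-time constant.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// at() on a vector compares i against v.size(), a runtime value.
int checked_vec(const std::vector<int>& v, std::size_t i) {
    return v.at(i);
}

// at() on a std::array compares i against the constant 256,
// which the compiler can fold or even eliminate for constant i.
int checked_arr(const std::array<int, 256>& a, std::size_t i) {
    return a.at(i);
}
```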

EDIT: If you have an opcode table, a global array (such as with extern or static linkage) may be faster. Elements of a global array would be addressable individually as global variables when a constant is put inside the brackets, and opcodes are often constants.

Anyway, this is all premature optimization. If you don't use any of vector's resizing features, it looks enough like an array that you should be able to easily convert between the two.
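To illustrate how interchangeable they are: if the dispatch loop only ever reads elements, it can be written against a raw pointer and length, and fed from either a vector or a plain array without changing the hot path. This is a sketch with hypothetical opcodes (0x01 = increment, 0x02 = decrement), not the OP's actual instruction set.

```cpp
#include <cstddef>
#include <cstdint>

// Dispatch loop that only reads code[pc]; the caller decides whether
// the storage behind the pointer is a std::vector or a plain array.
int run(const std::uint8_t* code, std::size_t len) {
    int acc = 0;
    for (std::size_t pc = 0; pc < len; ++pc) {
        switch (code[pc]) {
            case 0x01: ++acc; break;  // hypothetical INC opcode
            case 0x02: --acc; break;  // hypothetical DEC opcode
            default: break;           // ignore unknown opcodes
        }
    }
    return acc;
}
```

Calling it with `run(my_vector.data(), my_vector.size())` or `run(my_array, sizeof my_array)` compiles to the same element accesses.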

Potatoswatter
What's `array<>`? The OP's question seems to be about built-in arrays, not about some `array<>`. Built-in arrays are not pointers under the hood.
AndreyT
@Andrey: built-in arrays are implicitly converted to pointers by the built-in subscript operator, so I would say they are.
Potatoswatter
@Potatoswatter: The conversion to pointer is purely conceptual. It exists only at the language level (just to define the semantics of `[]` operator in the standard document). The resultant conceptual pointer is a compile-time value, which means that once we get "under the hood" there is no real pointer involved there.
AndreyT
@Andrey: it is invariably a runtime value for non-constant values inside the brackets. There is no way to do it besides base+index addressing.
Potatoswatter
@Potatoswatter: No, I'm talking about *base* value alone. In case of a real array, the base is a compile time constant, while in case of a vector, the base is a run-time value.
AndreyT
@Andrey: the only way the base can be a compile-time constant is to be a global. I think you just mean it's directly offset from the stack pointer. Anyway, I think we've both conveyed our points.
Potatoswatter
@Potatoswatter: Yes, you are right. But still, performance-wise, a constant stack offset is closer to a compile-time constant than to an explicit memory read, because offset addressing is normally supported by a single instruction (as in the example above) and requires no memory read.
AndreyT
+2  A: 

You're comparing apples to oranges. Arrays have a constant-size and are automatically allocated, while vectors have a dynamic size and are dynamically allocated. Which you use depends on what you need.

Generally, arrays are "faster" to allocate (in quotes because comparison is meaningless) because dynamic allocation is slower. However, accessing an element should be the same. (Granted an array is probably more likely to be in cache, though that doesn't matter after the first access.)

Also, I don't know what "security" you're talking about - vectors have plenty of ways to get undefined behavior, just like arrays. They do have at(), though, which you don't need to use if you know the index is valid.
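A minimal sketch of that last point: at() is the only checked access, and it reports a bad index by throwing std::out_of_range, whereas operator[] with the same bad index is simply undefined behavior. (The helper function name here is mine, for illustration.)

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if at() rejects the index. operator[] would perform
// no such check; an out-of-range subscript is undefined behavior.
bool throws_on_bad_index(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```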

Lastly, profile and look at the generated assembly. Nobody's guess is gonna solve anything.

GMan
+14  A: 

Element access time in a typical implementation of a std::vector is the same as element access time in an ordinary array available through a pointer object (i.e. a run-time pointer value).

std::vector<int> v;
int *pa;
...
v[i];
pa[i]; 
// Both have the same access time

However, the access time to an element of an array available as an array object is better than both of the above accesses (equivalent to access through a compile-time pointer value).

int a[100];
...
a[i];
// Faster than both of the above

For example, a typical read access to an int array available through a run-time pointer value will look as follows in the compiled code on the x86 platform:

// pa[i]
mov ecx, pa // read pointer value from memory
mov eax, i
mov <result>, dword ptr [ecx + eax * 4]

Access to vector element will look pretty much the same.

A typical access to a local int array available as an array object will look as follows

// a[i]
mov eax, i
mov <result>, dword ptr [esp + <offset constant> + eax * 4]

A typical access to a global int array available as an array object will look as follows

// a[i]
mov eax, i
mov <result>, dword ptr [<absolute address constant> + eax * 4]

The difference in performance arises from the extra mov instruction in the first variant, which has to perform an extra memory access.

However, the difference is negligible. And it is easily optimized to the point of being exactly the same in multiple-access context (by loading the target address in a register).

So the statement about "arrays being a bit faster" is correct in the narrow case when the array is accessible directly through the array object, not through a pointer object. But the practical value of that difference is virtually nothing.
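The "load the base address into a register" optimization mentioned above can be sketched as follows (the function name and the use of a byte-opcode buffer are mine, for illustration): once the pointer is hoisted into a local, every access in the loop uses the same base+index addressing as a plain array.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The base pointer is loaded from the vector once, outside the loop;
// each body iteration then does plain base + index addressing,
// exactly as it would for an ordinary array.
long sum_opcodes(const std::vector<std::uint8_t>& code) {
    const std::uint8_t* base = code.data();  // single load of the base
    long total = 0;
    for (std::size_t i = 0, n = code.size(); i < n; ++i)
        total += base[i];                    // same addressing as a[i]
    return total;
}
```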

AndreyT
What are you talking about? You should really illustrate this with some kind of machine-level description of what would be different.
Potatoswatter
@Potatoswatter: I just added a machine level description.
AndreyT
@Andrey: OK. But that presumes the array is on the stack, not in a data structure, passed by reference, or in namespace scope. It also presumes the `vector` base pointer isn't promoted to a register, which is to be expected inside a loop.
Potatoswatter
@Potatoswatter: Yes, but firstly, when you pass a `vector` somewhere, you usually pass it by reference as well, so in the case of a `vector` you have to make *two* dereferences: to get to the `vector` body itself and then to get to the array itself. For this reason we can take the first dereference out of consideration (in both cases) and focus on the array access alone. Secondly, as I said myself, in a multiple-access context (like a loop) the difference is easily optimized away by pre-loading the base address into a register.
AndreyT
Now that's the kind of detailed answer I come on SO for!
Matthieu M.
+5  A: 

You may be barking up the wrong tree. Cache misses can be much more important than the number of instructions that get executed.

Jive Dadson
A: 

For decent results, use std::vector as the backing storage and take a pointer to its first element before your main loop or whatever:

std::vector<uint8_t> mem_buf;
// stuff
uint8_t *mem = &mem_buf[0];
for (;;) {
    switch (mem[pc]) {
    // stuff
    }
}

This avoids any issues with over-helpful implementations that perform bounds checking in operator[], and makes single-stepping easier when stepping into expressions such as mem_buf[pc] later in the code.

If each instruction does enough work, and the code is varied enough, this should be quicker than using a global array by some negligible amount. (If the difference is noticeable, the opcodes need to be made more complicated.)

Compared to using a global array, on x86 the instructions for this sort of dispatch should be more concise (no 32-bit displacement fields anywhere), and for more RISC-like targets there should be fewer instructions generated (no TOC lookups or awkward 32-bit constants), as the commonly-used values are all in the stack frame.

I'm not really convinced that optimizing an interpreter's dispatch loop in this way will produce a good return on time invested -- the instructions should really be made to do more, if it's an issue -- but I suppose it shouldn't take long to try out a few different approaches and measure the difference. As always in the event of unexpected behaviour the generated assembly language (and, on x86, the machine code, as instruction length can be a factor) should be consulted to check for obvious inefficiencies.

brone