I am trying to read from main memory in MASM32 assembly. As recommended in an answer to another of my questions here, I created an array that is meant to hold widely separated memory locations, so that the reads do not come from the cache. The array works and I can read from it, but so far I have only tested it with plain data (i.e. numbers). What I need instead are memory locations, and I can't find a map or reference for those anywhere. I mean, I need something like:

my_arr db 5, 2, 8, 9, 1, 7, 3, 0, 4, 6

but instead of numbers I should be using the reserved words for the corresponding memory locations. I can't find them =( and have no idea what else to look for.


Edit

Let me just check that I understood correctly: you are telling me that, instead of using an array, I could use variables with all that space in between, so as to force reads from main memory, right?

A: 

This is an array of (contiguous) bytes, as you said:

my_arr db 5, 2, 8, 9, 1, 7, 3, 0, 4, 6

This is a variable that occupies 10 MB (which is large relative to the CPU cache):

wasted_space BYTE 10485760 DUP(?)

Here are several variables with a lot of wasted space in between:

my_var_1 db 5
spacer_1 BYTE 10485760 DUP(?)
my_var_2 db 2
spacer_2 BYTE 10485760 DUP(?)
my_var_3 db 8
spacer_3 BYTE 10485760 DUP(?)
my_var_4 db 9
spacer_4 BYTE 10485760 DUP(?)
my_var_5 db 1
spacer_5 BYTE 10485760 DUP(?)
my_var_6 db 7
spacer_6 BYTE 10485760 DUP(?)
my_var_7 db 3
spacer_7 BYTE 10485760 DUP(?)
my_var_8 db 0
spacer_8 BYTE 10485760 DUP(?)
my_var_9 db 4
spacer_9 BYTE 10485760 DUP(?)
my_var_10 db 6

This (creating variables in your data segment) is one way to get some data memory addresses (the variables don't contain the address ... rather, the variables are at addresses).

Another way to get memory addresses is to invoke O/S APIs, which allocate memory from the heap and return the address of that allocated memory, for example maybe the HeapAlloc or VirtualAlloc APIs.
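As a rough illustration of the allocation approach, here is a minimal C sketch. It uses the portable `malloc` as a stand-in for `HeapAlloc` or `VirtualAlloc`, and the names `make_spaced_slots`, `SLOT_COUNT` and `SLOT_STRIDE` are made up for this example, as is the 1 MiB spacing. The idea is only that one large block yields as many widely spaced addresses as you like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOT_COUNT 10               /* hypothetical: ten widely spaced locations */
#define SLOT_STRIDE (1024u * 1024u) /* hypothetical: 1 MiB between locations */

/* Fills slots[] with addresses spaced SLOT_STRIDE bytes apart inside one
   heap block. Returns the block (free() it when done) or NULL on failure. */
unsigned char *make_spaced_slots(unsigned char *slots[SLOT_COUNT])
{
    assert(slots != NULL);
    unsigned char *block = malloc((size_t)SLOT_COUNT * SLOT_STRIDE);
    if (block == NULL)
        return NULL;
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        slots[i] = block + i * SLOT_STRIDE; /* address of slot i */
        *slots[i] = (unsigned char)i;       /* store a known value there */
    }
    return block;
}
```

Each `slots[i]` is then an address you can read through, much like the labels in the data segment above.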


I don't know why you're doing this in ASM (except to learn assembly). If it's to learn about caching, I'd have thought you could do it just as well (and more easily) using C.

Anyway, I got curious about caching: how much space is enough to cause a cache miss? How many different variables are needed before misses begin (given that the cache is split and so can hold several, but only a few, widely spaced regions of memory)?

It (caching) has, over the years, become a complicated subject, apparently. http://lwn.net/Articles/252125/ is an article linked from Wikipedia. This article includes some graphs, e.g. Figure 3.11: Sequential Read for Several Sizes.

ChrisW
I'm not familiar with how MASM builds its object files, but those spacers could make your obj file pretty big. You could do the same using the 'org' statement to put each variable at a different address.
Nathan Fellman
Thanks (I've forgotten most of MASM). The `DUP(?)` means don't initialize it to anything in particular, so I don't know if it will take space in the executable.
ChrisW
A: 

Indirect memory access in Assembly

To treat the bytes in the array as memory addresses, you will need to load them into a register that can serve as a base address, and then access the memory pointed by the register:

MOVZX EAX, BYTE PTR [MY_ARR+3]  ; Element 3 in array, that is 9 (zero-extended to 32 bits)
MOV BL, [EAX]                   ; Read a byte from that address (AX cannot be used as a base register)

About caches

Note that your cache is likely much bigger than the span of memory addresses covered by this array, so all of it would fit in the cache.

Also, consider that your cache is probably associative, meaning that addresses very far apart can sit together in the cache as long as they do not happen to map to the same (full) cache set.
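To make the mapping concrete, here is a small C sketch of how a set-associative cache might pick a set for an address. The function name `cache_set_index` and the parameters used below (64-byte lines, 64 sets, as in a 32 KiB 8-way cache) are assumptions for illustration. Real hardware may index by physical address or hash the index bits, so treat this as a simplified model:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set index for an address in a simple set-associative cache model.
   line_size and num_sets are typically powers of two. */
size_t cache_set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    assert(line_size > 0 && num_sets > 0);
    /* Drop the offset within the line, then wrap around the number of sets. */
    return (size_t)((addr / line_size) % num_sets);
}
```

With these assumed parameters, `cache_set_index(0, 64, 64)` and `cache_set_index(10485760, 64, 64)` both give 0. Addresses that are 10 MB apart therefore compete for the same set, and that set can hold only as many lines as the cache has ways.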

To actually overrun the cache and force accesses to go to main memory, you should access (in a loop) a set of consecutive memory locations bigger than your cache, i.e. create an array larger than your cache. Also take into account that there are probably multiple levels of cache (L1, L2, possibly L3 and further), so how big the array needs to be depends on which cache you want to overrun.
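A minimal C sketch of such a loop follows. The name `touch_buffer` and its `stride` parameter are made up for this example. The sizes you pass in (say a 64 MiB buffer and a 64-byte stride) are guesses that you would adjust to your own cache:

```c
#include <assert.h>
#include <stddef.h>

/* Reads one byte every `stride` bytes across buf and returns the sum.
   The volatile qualifier keeps the compiler from discarding the loads. */
unsigned long touch_buffer(const volatile unsigned char *buf,
                           size_t size, size_t stride)
{
    assert(stride > 0);
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i += stride)
        sum += buf[i];
    return sum;
}
```

Calling it repeatedly on a buffer several times larger than the last-level cache should evict most of what was cached before. Hardware prefetchers can hide part of the latency for such a regular access pattern, so timings taken this way are only an approximation.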


I once wrote a program in C to time memory and cache accesses like that. With some statistical calculation and compensation for the time measurement overhead (which is non-negligible at such short time scales), I got really accurate results (which could be made as accurate as needed by running the test for longer and waiting for the standard deviation to go down).
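This is not that program, just a rough C sketch of the two ingredients mentioned: estimating the timer's own overhead and summarizing repeated measurements. The helper names are hypothetical, and the C11 `timespec_get` clock is used only because it is portable. It is wall-clock time, so a monotonic or cycle-counter source would probably be a better choice for real measurements:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Current time in nanoseconds from the C11 timespec_get() clock. */
static long long now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Mean cost of a back-to-back pair of now_ns() calls: the overhead to
   subtract from each timed interval. */
double timer_overhead_ns(size_t samples)
{
    assert(samples > 0);
    long long total = 0;
    for (size_t i = 0; i < samples; i++) {
        long long start = now_ns();
        total += now_ns() - start;
    }
    return (double)total / (double)samples;
}

/* Sample mean and variance of n values (variance uses the n - 1 divisor).
   The standard deviation is the square root of the variance. */
void mean_variance(const double *x, size_t n, double *mean, double *variance)
{
    assert(n > 1);
    double sum = 0.0, sq = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    *mean = sum / (double)n;
    for (size_t i = 0; i < n; i++)
        sq += (x[i] - *mean) * (x[i] - *mean);
    *variance = sq / (double)(n - 1);
}
```

You would time many runs of the access loop, subtract `timer_overhead_ns()` from each, and keep collecting samples until the variance is small enough for your purposes.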

My program was, however, not the most efficient way of doing that, and it did not reveal much about the associativity of the cache (I'd have to measure that separately, with knowledge of the coloring scheme). However, both were done rather efficiently, in a relatively architecture-independent manner and with a few well-thought-out tricks, in the SIGMETRICS 2005 work of Larry McVoy and Carl Staelin.

Tom Alsberg