I'm procedurally generating 128-byte blocks whose first n bytes hold a small machine-language function that I call directly via inline assembly. These functions aren't defined anywhere in the source; they're generated at run time into pages mapped with execute permission. I want to reserve the remaining (128 - n) bytes of each block for data used by that function, partly because the displacements in its memory operands can then shrink from 32 bits to 8 bits, and partly because I suspect it might help with caching. Caching, though, is exactly what I'm worried about.
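For concreteness, here's a minimal sketch of the kind of block I mean (assuming x86-64 Linux and the System V calling convention; the opcode bytes, the 0x40 data offset, and the RWX mmap are illustrative stand-ins for my actual generator, not the real thing):

```c
/* Sketch: one 128-byte block, code at the front, data at the tail,
 * reached through an 8-bit displacement off the block base in RDI. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

enum { BLOCK_SIZE = 128, DATA_OFFSET = 0x40 }; /* data lives in the back half */

int main(void) {
    /* One RWX page for simplicity; a real generator would likely keep W^X
     * and flip protections with mprotect after emitting the code. */
    uint8_t *block = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (block == MAP_FAILED) return 1;

    /* Header: mov rax, [rdi+0x40] ; ret  -- note the disp8 instead of disp32 */
    const uint8_t code[] = { 0x48, 0x8B, 0x47, DATA_OFFSET, 0xC3 };
    memcpy(block, code, sizeof code);

    /* Tail: the data the generated function reads. */
    uint64_t value = 0xDEADBEEF;
    memcpy(block + DATA_OFFSET, &value, sizeof value);

    /* Call with the block base in RDI so the 8-bit offset resolves in-block. */
    uint64_t (*fn)(void *) = (uint64_t (*)(void *))block;
    printf("%#llx\n", (unsigned long long)fn(block));

    munmap(block, 4096);
    return 0;
}
```

In the real code there are many such blocks packed back to back in the same pages, so the data bytes of one block sit immediately before the code bytes of the next.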
Assuming a processor with separate data and instruction caches, how well does a typical processor of this kind handle this layout? Will it pull the data following my instructions into the instruction cache as if it were code? Could that cause a significant performance penalty as the processor tries to decode these junk (possibly invalid) "instructions", given that they sit right next to the real code on essentially every call? And will the data be loaded into the normal L1/L2 data caches the first time I access the head of the data segment, or does mixing code and data in the same lines just confuse things at that point?
Edit: I should add that throughput is what matters here. How confusing or difficult the optimization is doesn't matter in this case; I just want to minimize the execution time of the code.