I'm compiling a lookup table that needs to have 133,784,560 entries, with values ranging from 0 to 7,462.

The maximum value of 7,462 can be contained within 13 bits. This gives us a lookup table of around 207 MB.

Storing 16-bit values instead would add around 50 MB to the lookup table.
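For reference, the arithmetic behind those figures:

    133,784,560 entries × 13 bits = 1,739,199,280 bits ≈ 207 MB
    133,784,560 entries × 16 bits = 2,140,552,960 bits ≈ 255 MB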

The extra size is not significant these days, but it would be nice to keep the table as small as possible.

When the LUT is loaded into memory, how much overhead is there in extracting a 13-bit value compared to reading a 16-bit one? I'm assuming there would be some intermediate bitwise operations to convert it into a format the CPU can work with directly, or am I wrong?

Every clock cycle counts, as this will be used in a brute-force analysis program that will run billions of comparisons. Should I just stick with the slightly larger LUT?

+3  A: 

I would say try it both ways and see which one is faster. Also, I think this is a good candidate to drop into C++. You can encapsulate this in a managed C++ project which you can reference directly from C#. This will allow you to do all the low-level optimizations that you want while still being directly accessible to the rest of your app.
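If it helps, here's the sort of timing harness I'd start with (a minimal sketch only; the class name LutTiming, the empty placeholder table and the index count are my own assumptions, and you'd run the same loop again against your packed 13-bit version):

    using System;
    using System.Diagnostics;

    static class LutTiming
    {
        const int Entries = 133_784_560;

        static void Main()
        {
            var lut16 = new ushort[Entries];           // placeholder table data

            // Pre-generate random indices so RNG cost stays outside the timed loop.
            var rng = new Random(1);
            var indices = new int[10_000_000];
            for (int i = 0; i < indices.Length; i++)
                indices[i] = rng.Next(Entries);

            long sum = 0;                              // checksum stops the JIT from
            var sw = Stopwatch.StartNew();             // eliminating the loop
            foreach (int i in indices)
                sum += lut16[i];
            sw.Stop();
            Console.WriteLine($"16-bit lookup: {sw.ElapsedMilliseconds} ms (checksum {sum})");
        }
    }

Run each variant a few times and discard the first pass so JIT and page-fault costs don't skew the numbers.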

tster
+4  A: 

I would stick with 16-bit values rather than 13-bit. Since you're doing brute force analysis and billions of comparisons, the extra 50MB seems a small price to pay. Also keep in mind that the code managing the 13-bit case will be significantly more complex, as you'll usually have to read across multiple 16-bit (or 32-bit, or whatever) values and shift and combine in order to get the actual value you need. In other words, extracting value #n is going to be much more complex than simply "retrieve it from the table".
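To give a rough idea, here's a sketch of that shift-and-combine versus the plain 16-bit read (C#; the names are made up, and it assumes the packed table sits in a uint[] allocated with one spare guard word at the end):

    static class PackedLut
    {
        // 16-bit case: a single array read.
        public static int Lookup16(ushort[] lut, int n) => lut[n];

        // 13-bit case: locate the bit position, read two adjacent 32-bit words so a
        // value straddling a word boundary is still covered, then shift and mask.
        public static int Lookup13(uint[] lut13, int n)
        {
            long bit = (long)n * 13;
            int word = (int)(bit >> 5);                 // which 32-bit word
            int offset = (int)(bit & 31);               // bit offset within that word
            ulong chunk = lut13[word] | ((ulong)lut13[word + 1] << 32);
            return (int)((chunk >> offset) & 0x1FFF);   // 0x1FFF = thirteen one-bits
        }
    }

Compared to the one-line 16-bit read, that's two memory reads plus a handful of extra shift/mask/or operations on every single lookup.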

The only real way to know for sure, however, would be to try both and see... but unless you've got the time to implement the 13-bit value retrieval code that you might not end up using, I probably wouldn't bother.

JaredReisinger
+3  A: 

My guess would be that this is a case of premature optimisation. Bit-fiddling is quite expensive, and will probably dwarf the extra memory-access cost, unless by sheer coincidence your cache performance hits an elbow somewhere between those two sizes.

Ultimately, there's no substitute for just trying it out.

Marcelo Cantos
I keep hearing the term premature optimisation, but don't quite understand its meaning. Could you elaborate?
Tom Gullen
Also, this is my second attempt at the same problem (I first attempted this about 3 years ago, very successfully, at around 50 million evaluations/s). I'm following mostly the same methodology but am trying to eke out even more performance if I can, so I think it's a reasonable question to ask.
Tom Gullen
@Tom: By "premature optimization" I mean "optimization before the bottleneck has been empirically determined through testing". Don't waste a lot of your valuable time optimizing something that is not the slowest thing. When every clock cycle counts *you have to concentrate on the thing that is eating the most clocks*.
Eric Lippert
http://en.wikipedia.org/wiki/Program_optimization
Simon_Weaver
http://www.google.com/search?q=codinghorror+premature+optimization
Simon_Weaver
@Tom: Programs of any significant size tend to have performance hot-spots, places in the code where the CPU spends an inordinate proportion of its time. One of the most important optimisation techniques is to first identify these hot-spots with profiling tools, and then focus all your tuning efforts on them. The term "premature optimisation" generally refers to the practice of trying to optimise code that you haven't profiled yet (which, before you profile, is all of your code), which means you end up doing a poor job of it, even on the parts of your code that really need it (the hot-spots).
Marcelo Cantos
Thank you for the explanations!
Tom Gullen
+2  A: 
dan04