What's my best bet for computing the dot product of a vector x with a large number of vectors y_i, where x and y_i are of length 10k or so.
- Shove the y's in a matrix and use an optimized
s/dgemv
routine? - Or maybe try handcoding an SSE2 solution (I don't have SSE3, according to cpuinfo).
I'm just looking for general guidance here, so any suggestions will be useful.
And yes, I do need the performance.
Thanks for any light.