I have a two-dimensional array of bytes which looks like this:
int n = 100000;
int d = 128;
byte[][] samples = new byte[n][d];
/* proceed to fill samples with some delicious data */
byte[] mean = new byte[d];
findMean(mean, samples);
My findMean function fills mean such that:
mean[k] = mean(samples[:][k])
i.e. mean[k] is the mean of column k, samples[i][k] taken over all i, rounded back to a byte.
Simple enough so far. The issue is that, because of overflow concerns, findMean cannot simply sum each column and divide. So my current attempt is to maintain a running mean, the workhorse of which looks something like this:
for (int i = 0; i < samples.length; i++) {
    // nudge the current mean towards the new sample by diff / (i + 1)
    int diff = samples[i][k] - mean[k];
    mean[k] = (byte) (mean[k] + Math.round((double) diff / (i + 1)));
}
This doesn't work at all: the rounding loss on each iteration leaves the mean quite far from the correct value, which I have verified on small (and therefore directly calculable) sets of 1000 random samples.
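For reference, the check I'm running on those small sets looks roughly like this (class and variable names are just illustrative; the reference mean is computed in double, which is only viable because the test set is tiny):

import java.util.Random;

public class RunningMeanCheck {
    public static void main(String[] args) {
        int n = 1000;
        Random rnd = new Random(42);
        byte[] column = new byte[n];
        rnd.nextBytes(column); // one column of test data

        // reference mean in double -- fine here, but not an option at full scale
        double reference = 0;
        for (byte b : column) reference += b;
        reference /= n;

        // the byte running mean, same update as in the loop above
        byte mean = 0;
        for (int i = 0; i < n; i++) {
            int diff = column[i] - mean;
            mean = (byte) (mean + Math.round((double) diff / (i + 1)));
        }

        System.out.printf("reference=%.3f running=%d%n", reference, mean);
    }
}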
Also, because of the very memory constraints that pushed me towards byte arrays in the first place, I cannot allocate a large proxy float array to calculate the true mean and cast it to bytes afterwards.
Loading the data in chunks is possible, but I'm keeping that as my final alternative, and anyway, doesn't that just displace the problem to the chunk size?
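For concreteness, the chunked fallback I have in mind would look roughly like this for one column (the method name, the int chunk sums and the weighted merge are all just my assumptions about how it would be written; the comment marks where the chunk-size limit bites):

// Sketch of the chunked fallback for column k: sum each chunk in an int,
// then fold the chunk mean into a single running scalar mean, weighted by chunk size.
static byte chunkedColumnMean(byte[][] samples, int k, int chunkSize) {
    double mean = 0; // running combined mean, one scalar per column
    int seen = 0;    // samples folded in so far
    for (int start = 0; start < samples.length; start += chunkSize) {
        int end = Math.min(start + chunkSize, samples.length);
        int chunkSum = 0; // only safe while chunkSize * 127 stays within int range
        for (int i = start; i < end; i++) {
            chunkSum += samples[i][k];
        }
        int chunkCount = end - start;
        double chunkMean = (double) chunkSum / chunkCount;
        mean = (mean * seen + chunkMean * chunkCount) / (seen + chunkCount);
        seen += chunkCount;
    }
    return (byte) Math.round(mean);
}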
Anyway: accurately calculating the mean of an array of bytes with a running algorithm that avoids overflow. Is there a good solution here?
Cheers