First of all, I don't have multiplication or division operations, so I could use shifting/adding, overflow multiplication, precalculation, etc. I'm just comparing one N-bit binary number to another, but according to the algorithm the number of such comparisons seems to be huge. Here it is:
- A sequence of 0's and 1's is given, divided into blocks. Let the length of the sequence be S and the length of a block be N, where N is some power of two (4, 8, 16, 32, etc.). The number of blocks is n = S/N, no rocket science here.
- According to the chosen N, I build the set of all possible N-bit binary numbers, which is a collection of 2^N objects (0 through 2^N-1).
- After this I need to compare each binary number with each block from the source sequence and count how many times each binary number matched, for example:
S : 000000001111111100000000111111110000000011111111... (0000000011111111 is repeated 6 times, 16bit x 6 = 96bits overall)
N : 8
blocks : {00000000, 11111111, 00000000, 11111111, ...}
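(For context, the splitting step itself is simple; here is a minimal C# sketch, with the helper name being mine. The offset is accumulated by addition only, so no multiplication is needed:)

```csharp
using System;
using System.Linq;

static class BlockSplitter
{
    // Split the source bit string into s.Length / n blocks of n characters each.
    public static string[] SplitIntoBlocks(string s, int n)
    {
        var blocks = new string[s.Length / n];
        var pos = 0; // running offset, advanced by addition only
        for (var i = 0; i < blocks.Length; i++, pos += n)
            blocks[i] = s.Substring(pos, n);
        return blocks;
    }

    static void Main()
    {
        var s = string.Concat(Enumerable.Repeat("0000000011111111", 6)); // S = 96 bits
        var blocks = SplitIntoBlocks(s, 8);                              // N = 8 -> 12 blocks
        Console.WriteLine(blocks.Length);                                // 12
        Console.WriteLine(blocks[0] + " " + blocks[1]);                  // 00000000 11111111
    }
}
```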
calculations:
// _n = S/N;
// _N2 = Math.Pow(2, N) - 1
// S = 96, N = 8, n = 12, 2^N - 1 = 255 for this specific case
// sourceEpsilons = array of blocks from the input, _n entries
var X = new int[_n]; // result array of frequencies
for (var i = 0; i < X.Length; i++) X[i] = 0; // setting up
for (ulong l = 0; l <= _N2; l++) // loop from 0 to max N-bit binary number
{
    var currentl = l.ToBinaryNumberString(_N / 8); // converting the counter to a string: "current binary number as string"
    var sum = 0; // quantity of currentl numbers in the blocks array
    for (long i = 0; i < sourceEpsilons.LongLength; i++)
    {
        if (currentl == sourceEpsilons[i]) sum++; // comparing strings; comparing numbers (longs) takes the same time
    }
    // sum is different each time, != blocks quantity
    for (var j = 0; j < X.Length; j++)
        if (sum - 1 == j) X[j]++; // further processing
}
// result : 00000000 was matched 6 times, 11111111 6 times, so X[5] = 2. Don't ask me why I need this >_<
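As an aside, the same X can be computed without the 2^N loop at all: a value that never occurs among the blocks ends with sum == 0 and never touches X, so only the distinct blocks matter. A one-pass sketch with a dictionary (the method name is mine, not from the original code):

```csharp
using System;
using System.Collections.Generic;

static class FrequencyCount
{
    // X[j] = how many distinct block values occurred exactly j + 1 times.
    // One pass over the blocks: O(S/N) dictionary lookups instead of O(2^N * S/N) comparisons.
    public static int[] Frequencies(string[] blocks)
    {
        var counts = new Dictionary<string, int>();
        foreach (var block in blocks)
            counts[block] = counts.TryGetValue(block, out var c) ? c + 1 : 1;

        var x = new int[blocks.Length];
        foreach (var c in counts.Values)
            x[c - 1]++; // values absent from the input would have sum == 0 and are safely skipped
        return x;
    }

    static void Main()
    {
        // The 12 blocks from the example above: 00000000 and 11111111, six times each.
        var blocks = new string[12];
        for (var i = 0; i < 12; i++) blocks[i] = i % 2 == 0 ? "00000000" : "11111111";
        Console.WriteLine(Frequencies(blocks)[5]); // both values occur 6 times -> prints 2
    }
}
```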
Even with small S I seem to have (2^N)(S/N) iterations, and with N = 64 the count grows to 2^64 (one past the max value of ulong), so that ain't pretty. I'm sure the loops need optimizing, and maybe the whole approach has to change (the C# implementation for N = 32 takes 2h on a dual-core PC with Parallel.For). Any ideas how to make the above scheme less time- and resource-consuming? It seems like I'd have to precompute the binary numbers and get rid of the first loop by reading them from a file and evaluating them against the blocks on the fly, but the file size would be (2^N)*N bytes (((2^N-1)+1)*N), which is somehow unacceptable too.
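If the blocks are available as '0'/'1' strings, the string comparisons can also go away: pack each block into a ulong key using only shifts and ORs (no multiplication or division, works for N up to 64) and count the keys in a dictionary, so the precomputed file is not needed either. A sketch under those assumptions, not your exact pipeline:

```csharp
using System;
using System.Collections.Generic;

static class PackedCount
{
    // Pack an N-bit block (N <= 64) into a ulong using only shifts and ORs.
    public static ulong Pack(string s, int start, int n)
    {
        ulong key = 0;
        for (var i = 0; i < n; i++)
            key = (key << 1) | (ulong)(s[start + i] - '0');
        return key;
    }

    // Count how many times each distinct N-bit value appears among the blocks.
    public static Dictionary<ulong, int> CountBlocks(string s, int n)
    {
        var counts = new Dictionary<ulong, int>();
        for (var start = 0; start + n <= s.Length; start += n)
        {
            var key = Pack(s, start, n);
            counts[key] = counts.TryGetValue(key, out var c) ? c + 1 : 1;
        }
        return counts;
    }

    static void Main()
    {
        var s = string.Concat(System.Linq.Enumerable.Repeat("0000000011111111", 6)); // S = 96
        var counts = CountBlocks(s, 8); // N = 8
        Console.WriteLine(counts[0]);   // 00000000 -> 6
        Console.WriteLine(counts[255]); // 11111111 -> 6
    }
}
```

This touches each block exactly once, so the work is O(S/N) regardless of N, and only the values that actually occur are ever stored.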