views: 240
answers: 4

First of all, there are no multiplication or division operations here for which I could substitute shifting/adding, overflow multiplication, precalculation, etc. I'm just comparing one N-bit binary number to another, but according to the algorithm the quantity of such comparisons seems to be huge. Here it is:

  1. There is a given sequence of 0's and 1's that is divided into blocks. Let the length of the sequence be S; the length of a block is N, which is some power of two (4, 8, 16, 32, etc.). The quantity of blocks is n = S/N, no rocket science here.
  2. According to the chosen N, I build the set of all possible N-bit binary numbers, which is a collection of 2^N objects (0 through 2^N-1).
  3. After this I need to compare each binary number with each block from the source sequence and count how many times there was a match for each binary number, for example:
    S : 000000001111111100000000111111110000000011111111... (0000000011111111 is repeated 6 times, 16 bits x 6 = 96 bits overall)
    N : 8
    blocks : {00000000, 11111111, 00000000, 11111111, ...}
    calculations:


// _n = S/N;
// _N2 = Math.Pow(2, N) - 1
// S=96, N=8, n=12, 2^N-1=255 for this specific case
// sourceEpsilons = blocks from input, string[_n]
var X = new int[_n]; // result array of frequencies
for (var i = 0; i < X.Length; i++) X[i] = 0; // setting up
for (ulong l = 0; l <= _N2; l++) // loop from 0 to max N-bit binary number
{
    var currentl = l.ToBinaryNumberString(_N / 8); // converting the counter to a string: "current binary number as string"
    var sum = 0; // quantity of currentl numbers in the blocks array
    for (long i = 0; i < sourceEpsilons.LongLength; i++)
    {
        if (currentl == sourceEpsilons[i]) sum++; // string comparison; comparing numbers (longs) takes the same time
    }
    // sum is different each time, != blocks quantity
    for (var j = 0; j < X.Length; j++)
        if (sum - 1 == j) X[j]++; // further processing, i.e. X[sum - 1]++
}
// result : 00000000 was matched 6 times, 11111111 6 times, so X[5]=2. Don't ask me why I need this >_<

Even with a small S I end up with (2^N)(S/N) iterations, and with N = 64 the first factor alone is 2^64, which doesn't even fit in a ulong, so that ain't pretty. I'm sure the loops need optimizing, and maybe the whole approach needs changing radically (the C# implementation for N = 32 takes 2 hours on a dual-core PC with Parallel.For). Any ideas how to make the above scheme less time- and resource-consuming? It seems like I'd have to precompute the binary numbers and get rid of the first loop by reading each candidate from a file and evaluating it against the blocks on the fly, but the file size would be (2^N)*N bytes (((2^N-1)+1)*N), which is unacceptable too.

A: 

I'm just comparing one n-bit binary number to another

Isn't that what memcmp is for?

You're looping through every possible integer value, and it's taking 2 hours, and you're surprised at this? There's not much you can do to streamline things if you need to iterate that much.

Billy ONeal
+5  A: 

It seems like what you want is a count of how many times each specific block occurred in your sequence; if that's the case, comparing every block to all possible blocks and then tallying is a horrible way to go about it. You're much better off making a dictionary that maps blocks to counts; something like this:

var dict = new Dictionary<int, int>();
for (int j=0; j<blocks_count; j++)
{
    int count;
    if (dict.TryGetValue(block[j], out count)) // block seen before, so increment
    {
        dict[block[j]] = count + 1;
    }
    else // first time seeing this block, so set count to 1
    {
        dict[block[j]] = 1; 
    }
}

After this, the count q for any particular block will be in dict[the_block], and if that key doesn't exist, then the count is 0.

tzaman
It seems this approach is acceptable. I was also counting blocks absent from the source, but you are right: it's much easier to enumerate the available blocks and then subtract their quantity from 2^N, which gives the same result. Thanks.
Alcz
A: 

Are you trying to get the number of unique messages in S? For instance, in your given example, for N = 2 you get 2 messages (00 and 11), for N = 4 you get 2 messages (0000 and 1111), and for N = 8 you get 1 message (00001111). If that's the case, then the dictionary approach suggested by tzaman is one way to go. Another would be to sort the list first, then run through it and look for each message. A third, naive, approach would be to use a sentinel message, all 0's for instance, and run through looking for messages that are not the sentinel. When you find one, destroy all its copies by setting them to the sentinel. For instance:

#include <stdlib.h>
#include <string.h>

int CountMessages(char *S, int SLen, int N) {
    int rslt = 0;
    int i, j;
    char *sentinel;

    sentinel = calloc(N + 1, sizeof(char));

    for (i = 0; i < N; i++)
        sentinel[i] = '0';

    //first, is there a sentinel message?
    for (i = 0; ((i < SLen) && (rslt == 0)); i += N) {
        if (strncmp(S + i, sentinel, N) == 0)
            rslt++;
    }

    //now destroy the list and count only the unique messages
    for (i = 0; i < SLen; i += N) {
        if (strncmp(S + i, sentinel, N) != 0) { //first instance of a given message
            rslt++;
            for (j = i + N; j < SLen; j += N) { //look for all remaining instances of this message and destroy them
                if (strncmp(S + i, S + j, N) == 0)
                    strncpy(S + j, sentinel, N); //destroy message
            }
        }
    }

    free(sentinel);
    return rslt;
}

The first means using either a pre-written dictionary or writing your own. The second and third destroy the list, meaning you have to use a copy for each 'N' you want to test, but are pretty easy. As for parallelization, the dictionary is the easiest, since you can break the string into as many sections as you have threads, do a dictionary for each, then combine the dictionaries themselves to get the final counts. For the second, I imagine the sort itself can be made parallel easily, then there's a final pass to get the count. The third would require you to do the sentinel-ization on each substring, then redo it on the final recombined string.

Note the big idea here though: rather than looping through all the possible answers, you only loop over all the data!

mtrw
Well, sounds reasonable. The stuff above is a direct interpretation of a stochastic-process formula involving Kronecker's delta, which can easily be rewritten from f(i,j) to f(j,i). The only question is whether that's possible in this specific case... I don't want to break other parts of the system. Thanks anyway.
Alcz
A: 

Instead of a dictionary, you can also use a flat file of 2^N entries, each the size of, say, an integer.

This is your counting pad. Instead of looping through all possible numbers and comparing each to the currently viewed block, you iterate through S forward-only, like so:

procedure INITIALIZEFLATFILE is
    allocate 2^N * sizeof(integer) bytes to FLATFILE
end procedure

procedure COUNT is
    while STREAM is not at END
        from FLATFILE at address STREAM.CURRENTVALUE read integer into COUNT
        with FLATFILE at address STREAM.CURRENTVALUE write integer COUNT+1
        increment STREAM
    end while
end procedure

A dictionary is conservative on space in the beginning but requires a lookup to the proper index later on. If you expect to see all possible values eventually, you can keep a fixed-size "scorecard" from the get-go.

maxwellb
If N is relatively small, for example 8 as in your example, a scorecard like this using integers as counters takes 2^8 * 4 = 1024 bytes of memory per scorecard. This grows quickly as you track larger bit widths.
maxwellb