I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).

Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. Because dimers cannot overlap, depositing a dimer also blocks the site to its left as a starting position for later dimers. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.

I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.

This is basically what the code looks like in 1D, simplified:

int main()
{
    /* Define lattice */
    array = (char*)malloc(N * sizeof(char));

    total_c = 0;

    /* Carry out RSA multiple times */
    for (i = 0; i < 1000; i++)
        rand_seq_ads();

    /* Calculate average coverage efficiency at jamming */
    printf("coverage efficiency = %lf", total_c/1000);

    return 0;
}

void rand_seq_ads()
{
    /* Initialise array, initial conditions */
    memset(array, 0, N * sizeof(char));
    available_sites = N;
    count = 0;

    /* While the lattice still has enough room... */
    while(available_sites != 0)
    {
        /* Generate random site location (boundary handling omitted for brevity) */
        x = rand() % N;

        /* Deposit dimer (if site is available) */
        if(array[x] == 0)
        {
            array[x] = 1;
            array[x+1] = 1;
            count += 1;
            available_sites += -2;
        }

        /* Mark site left of dimer as unavailable (if its empty) */
        if(array[x-1] == 0)
        {
            array[x-1] = 1;
            available_sites += -1;
        }
    }

    /* Calculate coverage %, and add to total */
    c = (double)count / N;
    total_c += c;
}

For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).

I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time. Do I need to do some complicated indexing, or is there a simpler way to do it?

Thanks for your answers

+2  A: 

You can use | (bitwise OR) to set bits, & (bitwise AND) to test or clear them, and << (left shift) to build the mask.

For example, (1 << 3) results in "00001000" in binary. So your code could look like:

char eightBits = 0;

//Set the 5th and 6th bits from the right to 1
eightBits |= (1 << 4);
eightBits |= (1 << 5);
//eightBits now looks like "00110000". 

Then just scale it up with an array of chars and figure out the appropriate byte to modify first.
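As a minimal sketch of that scaling step, assuming 8-bit bytes and an `unsigned char` array: bit `i` lives in byte `i / CHAR_BIT` at position `i % CHAR_BIT`. The helper names `bytes_for_bits`, `set_bit` and `test_bit` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical helpers for a bit array stored in unsigned chars. */

/* Number of bytes needed to hold nbits bits, rounded up. */
static size_t bytes_for_bits(size_t nbits)
{
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Set bit i: i / CHAR_BIT picks the byte, i % CHAR_BIT picks the bit. */
static void set_bit(unsigned char *a, size_t i)
{
    assert(a != NULL);
    a[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Return 1 if bit i is set, 0 otherwise. */
static int test_bit(const unsigned char *a, size_t i)
{
    assert(a != NULL);
    return (int)((a[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u);
}
```

For a lattice of N sites you would allocate `bytes_for_bits(N)` bytes, zero them with `memset`, and call `set_bit` wherever the original code wrote `array[x] = 1`.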

You could also define the single-bit masks in advance and put them in an array:

#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80

unsigned char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};

Then you can index your bits by position instead of shifting (a shift is cheap, so measure before assuming the table is faster), turning the previous code into:

eightBits |= (bits[2] | bits[3]); // BIT3 | BIT4 == 0x30

Alternatively, if you can use C++, you could just use an std::vector<bool>, a specialization that is typically stored as packed bits, complete with direct indexing.

David
Using `std::vector<bool>` won't get him optimal performance, since he'll end up having two lookups to get one pair of bits. Whether this penalty is sufficient to justify creating his own variation of `std::vector<bool>` is dependent upon whether the lookups (and assignments) themselves are a bottleneck.
Brian
Assuming C++ were an option (the OP only mentioned C) I'd not hesitate to start off with an `std::vector<bool>`, simply for conciseness and readability. If I then needed better performance, I'd profile to find out where the bottleneck was. (It could very well be in rand() and not the vector lookup).
David
Instead of `char bits[8] = { ... };` you could do `#define bits(x) BIT##x`.
Chris Lutz
I need to create a very large array, with more than 'max_size of int' boolean values/bits. Is this possible with vector<bool> or bitset?
Eddy
A: 

It's a trade-off:

(1) use 1 byte for each 2 bit value - simple, fast, but uses 4x memory

(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory

If you have enough memory available then go for (1), otherwise consider (2).
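To put rough numbers on the trade-off, here is a small sketch that computes the storage each option needs, assuming 8-bit bytes. The function names `bytes_unpacked` and `bytes_packed` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* Option (1): one whole byte per stored value. */
static size_t bytes_unpacked(size_t nvalues)
{
    return nvalues;
}

/* Option (2): bits_per_value bits per value, packed and rounded up to bytes. */
static size_t bytes_packed(size_t nvalues, size_t bits_per_value)
{
    assert(bits_per_value >= 1 && bits_per_value <= 8);
    return (nvalues * bits_per_value + 7) / 8;
}
```

For one million 2-bit values this gives 1,000,000 bytes unpacked against 250,000 bytes packed, which is the 4x factor mentioned above. For 1-bit values the factor is 8x.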

Paul R
@Paul: No, it uses 4x as much memory, since he would be storing 2bit numbers in 1 byte. However, I think from the OP's question that he has already made a decision to go with (2).
Brian
@Brian: Thanks - I missed that part - I'll update my answer accordingly.
Paul R
A: 

Useful links to have around when dealing with bits:

Bit Twiddling hacks
Using unions with bitfields

Martin Beckett
+1  A: 
typedef unsigned long bfield_t[ size_needed/sizeof(long) ];
// long because that's probably what your cpu is best at
// size_needed is in bytes and should be evenly divisible by sizeof(long), or
// you could use (sizeof(long)-1+size_needed)/sizeof(long) to force it to round up

Now, each long in a bfield_t can hold sizeof(long)*8 bits.

You can calculate the index of the long that holds a needed bit by:

bindex = index / (8 * sizeof(long) );

and your bit number by

b = index % (8 * sizeof(long) );

You can then look up the long you need and then mask out the bit you need from it.

result = my_field[bindex] & (1UL<<b);

or

result = 1 & (my_field[bindex]>>b); // if you prefer them to be in bit0

The first one may be faster on some CPUs, or may save you shifting back up if you need to perform operations between the same bit in multiple bit arrays. It also mirrors the setting and clearing of a bit in the field more closely than the second implementation.

set:

my_field[bindex] |= 1UL<<b;

clear:

my_field[bindex] &= ~(1UL<<b);
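Pulling the index, test, set and clear steps together, a minimal sketch could look like the following. The names `WORD_BITS`, `bf_set`, `bf_clear` and `bf_test` are hypothetical. The sketch uses `CHAR_BIT` rather than a literal 8, and `1UL` so that the shift is done in `unsigned long` even when `b` is 32 or more. Clearing ANDs with the complement of the mask.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Bits held by one unsigned long word. */
#define WORD_BITS (CHAR_BIT * sizeof(unsigned long))

/* Set the bit at position index. */
static void bf_set(unsigned long *f, size_t index)
{
    assert(f != NULL);
    f[index / WORD_BITS] |= 1UL << (index % WORD_BITS);
}

/* Clear the bit at position index: AND with the inverted mask. */
static void bf_clear(unsigned long *f, size_t index)
{
    assert(f != NULL);
    f[index / WORD_BITS] &= ~(1UL << (index % WORD_BITS));
}

/* Return 1 if the bit at position index is set, 0 otherwise. */
static int bf_test(const unsigned long *f, size_t index)
{
    assert(f != NULL);
    return (int)((f[index / WORD_BITS] >> (index % WORD_BITS)) & 1UL);
}
```

Because the index is a `size_t`, the same helpers work for arrays holding more bits than fit in an `int`.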

You should remember that you can use bitwise operations on the longs that hold the fields and that's the same as the operations on the individual bits.
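As an illustration of working on whole words, the sketch below ORs one bit array into another and counts the set bits, handling every bit of a word in one operation. The names `bf_or` and `bf_count` are hypothetical. The counting loop uses the well-known `w &= w - 1` trick, which clears the lowest set bit on each pass.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* dst |= src, one word at a time: all bits of a word are OR-ed at once. */
static void bf_or(unsigned long *dst, const unsigned long *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] |= src[i];
}

/* Count the set bits in the whole array. */
static size_t bf_count(const unsigned long *f, size_t nwords)
{
    size_t n = 0;
    assert(f != NULL);
    for (size_t i = 0; i < nwords; i++)
        for (unsigned long w = f[i]; w != 0; w &= w - 1)
            n++;    /* each pass removes exactly one set bit from w */
    return n;
}
```

A count like this could serve as a cross-check on a running `available_sites` counter, though whether that is worthwhile depends on the simulation.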

You'll probably also want to look into the ffs, fls, ffc, and flc functions if available. ffs is specified by POSIX and declared in <strings.h>, so it should be available on most Unix-like systems. It's there just for this purpose: treating an integer as a string of bits. Anyway, it is "find first set" and essentially:

int ffs(int x) {
    int c = 1;        // the real ffs numbers bits starting from 1
    if (x == 0)
        return 0;     // no bit set at all
    while (!(x&1) ) {
        c++;
        x>>=1;
    }
    return c;
}

Many processors have an instruction for this operation (x86 does, for example), and your compiler will probably generate that instruction rather than calling a function like the one I wrote. Oh, and ffsl and ffsll, where available, are the same function except that they take long and long long, respectively.
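One way ffs could be put to use here, sketched under the assumption of a POSIX system and a bit array stored in `unsigned int` words: inverting a word turns its empty sites into 1 bits, so `ffs` on the inverted word finds the first empty site. The function name `first_free_site` is hypothetical.

```c
#include <assert.h>
#include <limits.h>    /* CHAR_BIT */
#include <stddef.h>    /* size_t */
#include <strings.h>   /* ffs (POSIX, not ISO C) */

/* Return the index of the first 0 bit in the array, or -1 if every bit is set. */
static long first_free_site(const unsigned int *f, size_t nwords)
{
    const size_t word_bits = CHAR_BIT * sizeof(unsigned int);
    assert(f != NULL);
    for (size_t i = 0; i < nwords; i++) {
        unsigned int free_mask = ~f[i];   /* 1 bits now mark empty sites */
        if (free_mask != 0)               /* ffs is 1-based, hence the -1 */
            return (long)(i * word_bits) + ffs((int)free_mask) - 1;
    }
    return -1;
}
```

Full words are skipped with a single comparison, so the scan touches individual bits only in the first word that still has room.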

nategoose