ansaurus

Question

Efficient way to handle adding and removing items by bitwise And

Answer 1

A:

Provided the length of your bitfields is limited, the following may work:

First, represent the bitfields that are in the set as an array of booleans indexed by bitfield value, so in your case (4-bit bitfields), new bool[16];
Then transform this array of booleans into a bitfield itself, a 16-bit bitfield in this case, where each bit records whether the bitfield whose value equals that bit's index is in the set.

Then operations become:

Remove(0, 0) = and with bitmask 1010101010101010
Remove(1, 0) = and with bitmask 0101010101010101
Remove(0, 2) = and with bitmask 1111000011110000
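A minimal Python sketch of the encoding and the three Remove operations above. It assumes Remove(bit, pos) means "drop every stored bitfield whose bit at position pos equals bit", that bit 0 is the least significant bit, and that masks are written most significant bit first, as in the list above. The names remove_mask, MASKS and FourBitSet are hypothetical.

```python
K = 4          # width of each stored bitfield (4 bits, as in the question)
N = 1 << K     # 16 possible bitfields -> one 16-bit membership mask


def remove_mask(bit: int, pos: int) -> int:
    """Keep-mask for Remove(bit, pos): bit v survives iff bit `pos` of v != `bit`."""
    mask = 0
    for value in range(N):
        if ((value >> pos) & 1) != bit:
            mask |= 1 << value
    return mask


# Precompute every mask once so that each Remove is a single AND.
MASKS = {(b, p): remove_mask(b, p) for b in (0, 1) for p in range(K)}


class FourBitSet:
    """Set of 4-bit values stored as a 16-bit int: bit v is 1 iff value v is present."""

    def __init__(self) -> None:
        self.bits = 0

    def add(self, value: int) -> None:
        self.bits |= 1 << value

    def contains(self, value: int) -> bool:
        return ((self.bits >> value) & 1) == 1

    def remove(self, bit: int, pos: int) -> None:
        self.bits &= MASKS[(bit, pos)]


print(format(MASKS[(0, 0)], "016b"))  # 1010101010101010
print(format(MASKS[(1, 0)], "016b"))  # 0101010101010101
print(format(MASKS[(0, 2)], "016b"))  # 1111000011110000
```

Precomputing MASKS keeps each Remove down to one AND; with 4-bit bitfields there are only 8 distinct masks.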

Note that more complicated 'add/remove' operations could then also be added as O(1) bit-logic.

The only downside is that extra work is needed to decode the resulting 16-bit bitfield back into a set of values, but with lookup tables that may not turn out too bad either.
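One way the lookup-table decoding might look, as a Python sketch. A hypothetical 256-entry table, BYTE_BITS, lists the set bit positions of every byte, so decoding the 16-bit set costs two table lookups instead of sixteen individual bit tests. The function name decode is also an assumption.

```python
# Lookup table: for each byte value, the positions of its set bits.
BYTE_BITS = [tuple(i for i in range(8) if (b >> i) & 1) for b in range(256)]


def decode(bits: int, width: int = 16) -> list:
    """Expand a membership mask into the sorted list of values it contains."""
    values = []
    for byte_index in range(width // 8):
        byte = (bits >> (8 * byte_index)) & 0xFF
        base = 8 * byte_index
        values.extend(base + i for i in BYTE_BITS[byte])  # one table lookup per byte
    return values


print(decode(0b1111000011110000))  # [4, 5, 6, 7, 12, 13, 14, 15]
```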

Addendum:
Additional downsides:

Once the size of an integer is exceeded, every bit added to the original bitfields doubles the storage space. However, this is not much worse than a typical scenario using another collection where you have to store, on average, half of the possible bitmask values (provided that scenario doesn't store far fewer remaining values).
Once the size of an integer is exceeded, every added bit also doubles the number of 'and' operations needed to implement the logic.

So basically, I'd say that if your original bitfields are not much larger than a byte, you are likely better off with this encoding; beyond that, you're probably better off with the original strategy.
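A sketch of what going past a single integer could look like, assuming 64-bit words and k-bit bitfields with k >= 6 (the class name WideBitfieldSet and helper in_word_mask are hypothetical). Value v lives at bit v % 64 of word v // 64, so a Remove on one of the low six positions is one AND per word, while a Remove on a higher position just zeroes whole words. Both the storage and the per-Remove work double with every extra bit of k, as described above.

```python
from functools import lru_cache

WORD = 64  # bits per word (assumed 64-bit words)


@lru_cache(maxsize=None)
def in_word_mask(bit: int, pos: int) -> int:
    """Keep-mask inside one word for Remove(bit, pos) with pos < 6."""
    return sum(1 << j for j in range(WORD) if ((j >> pos) & 1) != bit)


class WideBitfieldSet:
    """Membership set for k-bit values (k >= 6), stored as 2**(k-6) 64-bit words.

    Value v lives in word v // 64 at bit v % 64.
    """

    def __init__(self, k: int) -> None:
        if k < 6:
            raise ValueError("for k < 6 a single word is enough")
        self.words = [0] * (1 << (k - 6))  # doubles with every extra bit of k

    def add(self, value: int) -> None:
        self.words[value // WORD] |= 1 << (value % WORD)

    def contains(self, value: int) -> bool:
        return ((self.words[value // WORD] >> (value % WORD)) & 1) == 1

    def remove(self, bit: int, pos: int) -> None:
        if pos < 6:
            # Low positions repeat the same pattern in every word: one AND per word.
            mask = in_word_mask(bit, pos)
            for w in range(len(self.words)):
                self.words[w] &= mask
        else:
            # Higher positions are bits of the word index: clear whole words.
            for w in range(len(self.words)):
                if ((w >> (pos - 6)) & 1) == bit:
                    self.words[w] = 0
```

For example, WideBitfieldSet(8) uses 4 words; after add(3), add(200) and remove(1, 7), only 3 remains, because bit 7 of 200 is set.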

Further addendum:
If you only ever execute Remove operations, which over time thin out the set's state space further and further, you may be able to stretch this approach a bit further (no pun intended) with a cleverer abstraction that only keeps track of the ints that are still non-zero. Detecting zero values may not be as expensive as it sounds, either, if the JIT knows what it's doing, because a CPU 'and' instruction typically sets the zero flag when the result is zero.
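A sketch of the "only track the non-zero ints" variant, assuming a multi-word layout with 64-bit words where value v sits at bit v % 64 of word v // 64; SparseBitfieldSet and in_word_mask are hypothetical names. Python cannot see the CPU's zero flag, so the sketch simply tests each AND result and drops words that become zero, which means later Removes only touch the words that are still alive.

```python
from functools import lru_cache

WORD = 64  # bits per word (assumed)


@lru_cache(maxsize=None)
def in_word_mask(bit: int, pos: int) -> int:
    """Keep-mask inside one word for Remove(bit, pos) with pos < 6."""
    return sum(1 << j for j in range(WORD) if ((j >> pos) & 1) != bit)


class SparseBitfieldSet:
    """Tracks only the non-zero 64-bit words; value v is bit v % 64 of word v // 64."""

    def __init__(self) -> None:
        self.words = {}  # word index -> non-zero word

    def add(self, value: int) -> None:
        w, j = divmod(value, WORD)
        self.words[w] = self.words.get(w, 0) | (1 << j)

    def contains(self, value: int) -> bool:
        w, j = divmod(value, WORD)
        return ((self.words.get(w, 0) >> j) & 1) == 1

    def remove(self, bit: int, pos: int) -> None:
        if pos < 6:
            mask = in_word_mask(bit, pos)
            for w in list(self.words):
                word = self.words[w] & mask
                if word:
                    self.words[w] = word
                else:
                    del self.words[w]  # AND gave zero: stop tracking this word
        else:
            # A high position selects whole words by their index.
            doomed = [w for w in self.words if ((w >> (pos - 6)) & 1) == bit]
            for w in doomed:
                del self.words[w]
```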

As with all performance optimizations, this one would need some measurement to determine whether it is worthwhile.

jerryjvl 2009-06-02 03:14:29

Also note that you can extend this mechanism by using multiple ints if you need more than 32/64 combinations. There may be a nice class abstraction to be built on top of this.

jerryjvl 2009-06-02 03:26:30

But the length of your bitmask is of the same order as the number of items in the collection. So the 'and' operation is still O(N).

Igor Krivokon 2009-06-02 03:32:33

No, it's not... the CPU can perform an integer AND in a single clock cycle on most modern processors, and in a constant number of cycles on older ones.

jerryjvl 2009-06-02 03:36:06

Hmmm. This is an interesting solution, and I can imagine it being much faster than using a linked-list type solution as I suggest in my question. It's definitely cheating, but there's nothing wrong with that :)

Brian 2009-06-02 03:49:36

@jerryjvl/Igor: A CPU can perform an integer AND in a single clock on a specific number of bits (that number varies by chip, but I imagine a 32-bit chip can do it with 32 bits). Hence my remark that it is cheating.

Brian 2009-06-02 03:50:53

@Brian: The nature of your question and its conditions really only leaves 'cheating' as an option, since your problem as defined is essentially linear in either the number of stored values or the number of possible values, in which case what you already have is as good as it gets.

jerryjvl 2009-06-02 03:56:41

@jerryjvl: But in the common case the size of the collection is greater than 32 (or 64). If N is that small, the question about complexity is misleading.

Igor Krivokon 2009-06-02 04:00:41

One possible improvement to this is to just maintain a list of removed elements (using a pair of bitfields). When iterating over the list, one can just skip items based on bitwise operations. This works basically the same as yours, but makes removal O(1) in exchange for slowing down lookup and iteration by a constant factor compared to your algorithm. This method has the advantage of needing less space, assuming your set is not filled.

Brian 2009-06-02 04:02:28
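A minimal Python sketch of the pair-of-bitfields idea from the comment above: one mask records the positions where Remove(0, p) was called, the other the positions where Remove(1, p) was called, and iteration skips any item that conflicts with either. The class name LazyRemovalCollection is hypothetical, and the sketch deliberately has no add method, because an item inserted after a Remove would wrongly be filtered by the older rules (a limitation raised later in this thread).

```python
class LazyRemovalCollection:
    """Stored bitfields plus two masks describing every Remove issued so far.

    removed_zero: bit p set if Remove(0, p) was called
    removed_one:  bit p set if Remove(1, p) was called
    """

    def __init__(self, items) -> None:
        self.items = list(items)
        self.removed_zero = 0
        self.removed_one = 0

    def remove(self, bit: int, pos: int) -> None:
        # O(1): just record the rule.
        if bit:
            self.removed_one |= 1 << pos
        else:
            self.removed_zero |= 1 << pos

    def is_live(self, item: int) -> bool:
        # Dead if it has a 0 where zeros were removed, or a 1 where ones were removed.
        return (~item & self.removed_zero) == 0 and (item & self.removed_one) == 0

    def __iter__(self):
        return (item for item in self.items if self.is_live(item))
```

For example, with items 0b0100, 0b0101, 0b1110 and 0b0011, calling remove(0, 0) and then remove(1, 3) leaves 0b0101 and 0b0011 when iterating.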

Ah, I just made a further addition that'd probably improve my solution as well... I think the first step regardless is to put all this behind a class abstraction so that it is easy to measure the trade-offs between different implementations.

jerryjvl 2009-06-02 04:04:59

In response to the comments about cheating, I analyzed the tree method I proposed further. It's actually O(n / log n), which is sublinear in the average case. Yes, it's still not that much better, and it adds enough overhead that the performance benefits are overshadowed until n is large enough that there are bigger problems.

Brian 2009-06-02 04:10:51

Answer 2

A:

If each decision (bit value, position) is represented as an object, {bit value, k-th position}, you end up with an array of length 2*k. If each item is represented as a linked list of length k whose nodes point to these array entries, you can "invalidate" a whole group of items simply by deleting one {bit, position} object. The catch is that, when searching the list of items, you then have to look for "complete" items (which probably makes search REALLY slow).

So something like: [{0,0}, {1,0}, {0,1}, {1,1}, {0,2}, {1,2}, {0,3}, {1,3}]

and the item "0100" would be represented as links to indices {0->3->4->6} of that array.

You wouldn't know which items were invalid until you tried to find them (so it doesn't really limit your search space, which is what you're after).
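A rough Python sketch of the {bit, position} array described above, assuming position 0 is the leftmost character of the item (which is what makes "0100" map to {0->3->4->6}). Deleting an object is modelled as clearing a flag, and the helper names rule_index, links and is_complete are hypothetical. It also shows the drawback: an item's validity is only discovered when you inspect it.

```python
K = 4  # bits per item, as in the question


def rule_index(bit: int, pos: int) -> int:
    """Position of the {bit, position} object in the 2*K array."""
    return 2 * pos + bit


def links(item: str) -> list:
    """Array indices an item points at; position 0 is the leftmost character."""
    return [rule_index(int(ch), pos) for pos, ch in enumerate(item)]


def is_complete(item_links: list, alive: list) -> bool:
    """An item is valid only if none of the objects it points at were deleted."""
    return all(alive[i] for i in item_links)


alive = [True] * (2 * K)  # "deleting" an object is modelled by clearing its flag
items = {s: links(s) for s in ("0100", "1100", "0111")}
print(items["0100"])  # [0, 3, 4, 6]

alive[rule_index(0, 0)] = False  # delete {0,0}: every item starting with 0 is now invalid
print([s for s, lk in items.items() if is_complete(lk, alive)])  # ['1100']
```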

Oh well, I tried.

Jeff Meatball Yang 2009-06-02 04:14:28

I'm not sure what you're trying to suggest. Just keeping a pair of bitfields representing which items were removed will accomplish fast removal, with no knowledge of what has been removed until you try to look it up.

Brian 2009-06-02 04:18:21

Answer 3

A:

Sure, it is possible (even if this is "cheating"). Just keep a stack of Remove objects:

struct Remove {
    public bool set;   // bit value whose items are removed
    public int index;  // bit position that is tested
}

The remove function just pushes an object onto the stack. Voilà, O(1).

If you wanted to get fancy, note that the stack can't hold more than (number of bits) entries without containing duplicate or impossible rules.

The rest of the collection has to apply the logic whenever things are withdrawn or iterated over.

There are two ways to handle inserts into the collection:

Apply the Remove rules on insert to clear out the stack, making insert O(n). Gotta pay somewhere.
Have each bitfield store its index in the remove stack, so you know which rules apply to it. Then the stack size limit above wouldn't matter.
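A Python sketch of the second insert strategy, mirroring the Remove struct's set and index fields with a NamedTuple; RemoveStackCollection is a hypothetical name. Each item stores the stack height at the moment it was inserted, so only Removes pushed later apply to it. Insert and Remove are O(1); iteration checks each item against the rules pushed after it, so it stays O(n) only if the number of rules is bounded.

```python
from typing import NamedTuple


class Remove(NamedTuple):
    set: bool   # bit value to remove (True = 1, False = 0)
    index: int  # bit position to test


class RemoveStackCollection:
    """O(1) insert and Remove; iteration applies the relevant Removes lazily."""

    def __init__(self) -> None:
        self.stack = []  # every Remove ever issued, oldest first
        self.items = []  # (bitfield, stack height when it was inserted)

    def insert(self, bitfield: int) -> None:
        # Removes already on the stack predate this item, so they never apply to it.
        self.items.append((bitfield, len(self.stack)))

    def remove(self, bit: bool, index: int) -> None:
        self.stack.append(Remove(bit, index))  # just push the rule

    def __iter__(self):
        for bitfield, start in self.items:
            rules = self.stack[start:]
            if not any(((bitfield >> r.index) & 1) == r.set for r in rules):
                yield bitfield
```

For example, inserting 0b0100 and 0b0101, calling remove(False, 0), and then inserting 0b0110 yields 0b0101 and 0b0110: the Remove drops the even item that was already present but does not touch the one inserted afterwards.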

Todd Gardner 2009-06-02 04:20:03

I've already suggested this, though in my version you just keep two bitfields and test against objects during iteration. That has the advantage of making it cost less when you're forced to actually pay the piper, unless the bitfield is long and there aren't many removals. In truth, I suspect a real implementation probably would use that kind of solution.

Brian 2009-06-02 04:29:11

That would lock you into a linear insert, while the above could do an O(1) insert, O(1) remove, and O(n) iteration (assuming constant bitfield size).

Todd Gardner 2009-06-02 04:54:20

@Todd: After responding with why it didn't lock you into linear insert, I realized that I was relying on the rather odd assumption that no inserts will ever occur after a remove operation...which is kind of a stupid assumption, though in a practical application it may be true.

Brian 2009-06-02 05:44:12

Answer 4

A:

If you use an array to store your binary tree, you can quickly index any element (the children of the node at index n are at indices (n+1)*2 - 1 and (n+1)*2, i.e. 2n+1 and 2n+2). All the nodes at a given level are stored sequentially: the first node at level x is at index 2^x - 1, and there are 2^x nodes at that level (counting the root as level 0).

Unfortunately, I don't think this really gets you much of anywhere from a complexity standpoint. Removing all the left nodes at a level is O(n/2) worst case, which is of course O(n). The actual work depends on which bit you are checking, so the average may be somewhat better. This also requires O(2^n) memory for n-bit items, which is much worse than the linked list and not practical at all.
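A Python sketch of the array layout above, assuming the node at tree level x + 1 encodes bit x of the item (position 0 being the leftmost character) and that a left child means 0; ArrayTree and its helpers are hypothetical names. remove scans the 2^(pos+1) nodes on one level and marks half of them, and the pruned array alone has 2^(k+1) - 1 entries, which illustrates both costs mentioned above.

```python
def children(n: int) -> tuple:
    """Children of node n: (n+1)*2 - 1 (left, bit 0) and (n+1)*2 (right, bit 1)."""
    return (n + 1) * 2 - 1, (n + 1) * 2


def level_start(x: int) -> int:
    """Index of the first node on level x; the level holds 2**x nodes."""
    return 2 ** x - 1


class ArrayTree:
    """Complete binary tree over k-bit items stored in one flat array."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.pruned = [False] * (2 ** (k + 1) - 1)  # O(2**k) memory

    def remove(self, bit: int, pos: int) -> None:
        # Scan level pos + 1 and prune every node that encodes `bit` at `pos`.
        level = pos + 1
        for n in range(level_start(level), level_start(level + 1)):
            is_right = n % 2 == 0  # right children have even indices
            if int(is_right) == bit:
                self.pruned[n] = True

    def contains(self, item: str) -> bool:
        # Walk from the root; any pruned node on the path removes the item.
        n = 0
        for ch in item:
            n = children(n)[int(ch)]
            if self.pruned[n]:
                return False
        return True
```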

I think what this problem is really asking is for a way to efficiently partition a set of sets into two sets. Using a bitset to describe the set gives you a fast check for membership, but doesn't seem to lend itself to making the problem any easier.

Dolphin 2009-06-02 19:42:24
