views:

156

answers:

5

I know the question sounds silly, but consider this: I have an array of ints (1..N) and a labelling algorithm. at any point the item the int represents is in one of three states. The current version holds these states in a byte array, where 0, 1 and 2 represent the three states. alternatively, I could have three arrays of boolean - one for each state. which is better (consumes less memory) depends on how jvm (sun's version) stores the arrays - is a boolean represented by 1 bit? is there any other magic happening behind the scenes? (p.s. don't start with all that "this is not the way OO/Java works" - I know, but here performance comes in front. plus the algorithm is simple and perfectly readable even in such form).

Thanks a lot

+2  A: 

Instead of two booleans or 1 int, just use a BitSet - http://java.sun.com/j2se/1.4.2/docs/api/java/util/BitSet.html

You can then have two bits per label/state. And BitSet being a standard java class, you are likely to get good performance.

sri
He needs 3 states.
Roman
2 bits can store four states.
sri
Indeed: 00, 01, 10, 11
Martijn Courteaux
I've never heard of bitsets. given that boolean array is likely to be byte array behind the scenes, bitset is probably the next best choice memory-wise. although the added object wrapper will probably negatively influence performance more than the saved space is worth. I think I'll just stick with bytes...
jd
@joe_shmoe: that last comment is the kind of premature optimization thinking that Vlad mentioned in the comment to original question.
polygenelubricants
+1  A: 

Theoretically, with 3 boolean arrays you'll need to do:

firstState[n] = false;
secondState[n] = true;
thirdState[n] = false;

every time when you want to change n-th element state. Here you can see 3 taking element by index operations and 3 assignment operations.

With 1 byte array you'll need:

elements[n] = 1;

It's more readable and 3 times faster. And one more advantage of this solution it that you can easily add as many new states as you want (when with boolean arrays you'll need to introduce new arrays).

But I don't think you'll ever see the performance difference.

UPD: actually I'd make it more java way (not looking that you don't find easy ways) and use array of enums. This will make it much more clear and will give you some flexibility (maybe in future you'll decide that oop is not so bad thing):

enum ElementState {
   FIRST, SECOND, THIRD;
}

ElementState[] elementStates = new ElementState[N];
...
elementStates[i] = ElementState.FIRST;
Roman
+1  A: 

The JVM second edition spec (http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html) specifies that boolean arrays are encoded as (0,1), but doesn't specify the type used. So the particular JVM may or may not use bit - it could use int.

However, if performance is paramount, using a single byte would in any case seem to be your best option anyway.

EDIT: I incorrectly said that boolean arrays are stored as bit arrays - this is possible but implementation specific.

Steven Mackenzie
I doubt both your your claims. Can you cite sources?
Joachim Sauer
You were right to doubt one claim ;) As for the other regarding performance - I only said it would 'seem' to be the best option, nothing to cite. I'd be keen to see any evidence to the contrary.
Steven Mackenzie
Ok, fair enough ;-) I don't have claim to the contrary of the performance claim at hand.
Joachim Sauer
Bit arrays are likely to achieve better performance due to reduced cache demands when run on platforms with data cache. Also, the relative cost of getting a bit out of a byte or word is not usually that high when you take into account that many processors will require an operation to test a byte for 0. On a JVM that does JIT and is run on a processor with data cache and has either 1<<x or set_bit(x) operations then the bit array should be much much faster if this data is used often.
nategoose
Interesting! Any reading you can recommend on the subject?
Steven Mackenzie
A: 

If you want a guaranteed minimum you could use three java.util.BitSets. These will only use one bit per flag (though you will have the extra object overhead, that may outweigh the benefits if the number of flags is small.) I would say if you have a large number of objects BitSet may be a better alternative, otherwise an array of byte constants or enums will lead to more readable code (and the extra storage shouldn't be a real concern.)

M. Jessup
Why 3? Two is sufficient. 3 makes it easier to code, but 2 is sufficient.
sri
A: 

The array of bytes is much better!

  1. A boolean uses in every programming language 1 byte! So you will use for every state 3 bytes and you can do this with only 1 byte (in theory you can reduce it to only 1 bit (see other posts).
  2. with a byte array, you can simply change it to the byte you want. With three arrays you have to change the value at every array!
  3. When you are your application developing, it is possible you need an extra state. So, this means you have to create again an array. Plus you have to change 4 values (second point)

So, I hope we persuaded you!

Martijn Courteaux