A: 

Just wondering why can't you use java.util.BitSet;

Basically what you can do, is to read the whole data as byte[], convert it to binary in string format and use string utilities like .substring() to do the work. This will also work bit sequences > 16.

Lets say you have 3 bytes: 1, 2, 3 and you want to extract bit sequence from 5th to 16th bit.

Number Binary

1      00000001
2      00000010
3      00000011

Code example:

public static String getRealBinary(byte[] input){
    StringBuilder sb = new StringBuilder();

    for (byte c : input) {
        for (int n =  128; n > 0; n >>= 1){
            if ((c & n) == 0)
                sb.append('0');
            else sb.append('1');
        }
    }

    return sb.toString();
}
public static void main(String[] args) {
    byte bytes[] = new byte[]{1,2,3};
    String sbytes = getRealBinary(bytes);
    System.out.println(sbytes);
    System.out.println(sbytes.substring(5,16));
}

Output:

000000010000001000000011
00100000010

Speed:

I did a testrun for 1m times and on my computer it took 0.995s, so its reasonably very fast:

Code to repeat the test yourself:

public static void main(String[] args) {
    Random r = new Random();
    byte bytes[] = new byte[4];
    long start, time, total=0;

    for (int i = 0; i < 1000000; i++) {
        r.nextBytes(bytes);
        start = System.currentTimeMillis();
        getRealBinary(bytes).substring(5,16);
        time = System.currentTimeMillis() - start;
        total+=time;
    }
    System.out.println("It took " +total + "ms");
}
Margus
BitSet doesn't offer a method to get consecutive bits as int, I would still have to implement a getBits() of my own, still managing my own bitGet index and calling BitSet.get(index) for every single bit - that contradicts my definition of efficient. I'm intrested in speed, not flexibility here.
Durandal
Updated my example.
Margus
Sorry if my question was not clear enough, I'm looking to extract into integers, no other fancy transformations.
Durandal
+1  A: 

If you just want the unsigned bit sequence as an int.

static final int[] lookup = {0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF };

/*
 * bytes: byte array, with the bits indexed from 0 (MSB) to (bytes.length * 8 - 1) (LSB)
 * offset: index of the MSB of the bit sequence.
 * len: length of bit sequence, must from range [0,16].
 * Not checked for overflow
 */
static int getBitSeqAsInt(byte[] bytes, int offset, int len){

    int byteIndex = offset / 8;
    int bitIndex = offset % 8;
    int val;

    if ((bitIndex + len) > 16) {
        val = ((bytes[byteIndex] << 16 | bytes[byteIndex + 1] << 8 | bytes[byteIndex + 2]) >> (24 - bitIndex - len)) & lookup[len];
    } else if ((offset + len) > 8) {
        val = ((bytes[byteIndex] << 8 | bytes[byteIndex + 1]) >> (16 - bitIndex - len)) & lookup[len];
    } else {
        val = (bytes[byteIndex] >> (8 - offset - len)) & lookup[len];
    }

    return val;
}

If you want it as a String (modification of Margus' answer).

static String getBitSequence(byte[] bytes, int offset, int len){

    int byteIndex = offset / 8;
    int bitIndex = offset % 8;
    int count = 0;
    StringBuilder result = new StringBuilder();        

    outer:
    for(int i = byteIndex; i < bytes.length; ++i) {
        for(int j = (1 << (7 - bitIndex)); j > 0; j >>= 1) {
            if(count == len) {
                break outer;
            }                
            if((bytes[byteIndex] & j) == 0) {
                result.append('0');
            } else {
                result.append('1');
            }
            ++count;
        }
        bitIndex = 0;
    }
    return  result.toString();
}   
blizpasta
Your solution looks very similar to mine, only you get the mask for the part allocated in the int from a lookup table while I build it on the fly "(1 << count) - 1". As far as I can tell the optimization to look if the sequence covers 1, 2 or 3 bytes is actually slowing the code - removing that its practically equal to my code.
Durandal
Included your version into my microbenchmark (there was some byte masking missing which I took the libery of adding). The performance differences are... astounding.
Durandal
A: 

Well, depending on how far you want to go down the time vs. memory see-saw, you can allocate a side table of every 32-bits at every 16-bit offset and then do a mask and shift based on the 16-bit offset:

byte[] bytes = new byte[2048];   
int bitGet;   
unsigned int dwords[] = new unsigned int[2046];

public BitArray() {   
    for (int i=0; i<bytes.length; ++i) {   
        bytes[i] = (byte) i;   
    }   

    for (int i= 0; i<dwords.length; ++i) {
        dwords[i]= 
            (bytes[i    ] << 24) | 
            (bytes[i + 1] << 16) | 
            (bytes[i + 2] <<  8) | 
            (bytes[i + 3]);
    }
}   

int getBits(int len)
{
    int offset= bitGet;
    int offset_index= offset>>4;
    int offset_offset= offset & 15;

    return (dwords[offset_index] >> offset_offset) & ((1 << len) - 1);
}

You avoid the branching (at the cost of quadrupling your memory footprint). And is looking up the mask really that much faster than (1 << len) - 1?

MSN
Durandal
Looking up the mask from a table seems to be a little *slower* than (1 << len) - 1. Replacing the lookup in blitzpastas solution speeds it up by ~3%. I also played with your code a little more. It turns out that you can get away with half as much extra buffer (since getBits is constrainted to fetch 16 bits max). For every odd dword in the buffer one can substitute the following even dword and simply shift 8 bits less. Only requires a slight modification to the buffer preparation and different shift/mask values in getBits, so it runs at almost exactly the same speed.
Durandal
A: 

You want at most 16 bits, taken from an array of bytes. 16 bits can span at most 3 bytes. Here's a possible solution:

    int GetBits(int bit_index, int bit_length) {
          int byte_offset = bit_index >> 3;
          return ((((((byte_array[byte_offset]<<8)
                    +byte_array[byte_offset+1])<<8)
                    +byte_array[byte_offset+2]))
                   >>(24-(bit_index&7)+bit_length))))
                  &((1<<bit_length)-1);
         }

[Untested]

If you call this a lot you should precompute the 24-bit values for the 3 concatenated bytes, and store those into an int array.

I'll observe that if you are coding this in C on an x86, you don't even need to precompute the 24 bit array; simply access the by te array at the desire offset as a 32 bit value. The x86 will do unaligned fetches just fine. [commenter noted that endianess mucks this up, so it isn't an answer, OK, do the 24 bit version.]

Ira Baxter
You seem to have missed the Java tag :) If I'm not mistaken, fetching simply a dword on x86 wouldn't be all that simple, since its a little endian architecture. There are two typos in the code example (first shift should be 16 and the third array access is missing a +2 to the index). Adding that its just a condensation of whats in the question under Version1. Oh and in Java bytes are signed and silently promoted to int, so the 2nd and 3rd byte need to be masked with 0xFF.
Durandal
First shift is 8 bits, then 2nd byte added, and that pair is shifted left 8. If your machine has a barrel shifter, this won't matter, but if it shifts short distances faster than long it matters. Yes, dropped the +2, edited to correct. If bytes are promoted as signed (stupid Java) you can alway just copy them to "ints" before you start to avoid the sign masking junk, and that should make it competitive. Now that I've scrolled version 1 right a few hundred characters (wasnt easy where the scroll bar was) I see you did the same sort of thing. The 7 line answer got lost in your huge example :-{
Ira Baxter
+1 for reminding me that shift distance may make a difference on some machines. The example was once small, but the way things developed ... well it ended up this way. Unfortunately Java can not perform *any* unsigned casts to a wider type, that leaves us Java guys with no choice other than masking :(
Durandal
Java might insist on signed casts, but if you have an 8 bit value stored in an int (maybe you got that masking it!) you don't need to mask the value fetched from the int. However, I think the 24-bits-precomputed is by far your best solution.
Ira Baxter