I have heard mixed opinions about the amount of memory that a byte takes up in a Java program.

I am aware that you can store no more than +127 in a Java byte, and the documentation says that a byte is only 8 bits, but here I am told that it actually takes up the same amount of memory as an int, and is therefore just a type that helps code comprehension, not efficiency.

Can anyone clear this up? And would this be an implementation-specific issue?

+5  A: 

Java is never implementation- or platform-specific, at least as far as the value ranges of the primitive types are concerned. The primitive types are always guaranteed to hold the same range of values no matter what platform you're on. This differs from (and was considered an improvement on) C and C++, where the sizes of some primitive types were platform-specific.

Since it's faster for the underlying hardware to address four (or eight, on a 64-bit system) bytes at a time, the JVM may allocate more bytes to store a primitive byte, but you can still only store values from -128 to 127 in it.
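The guaranteed part can be checked from code. In this small sketch (the class name `ByteRange` is just for illustration), the wrapper classes expose the fixed bit widths and ranges, and narrowing an int to a byte keeps only the low 8 bits. None of this reveals how many bytes the JVM actually uses to store a variable.

```java
public class ByteRange {
    // Narrowing an int to a byte keeps only the low 8 bits (two's complement).
    static byte narrow(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        // The bit widths are fixed by the language on every platform.
        System.out.println(Byte.SIZE + " " + Short.SIZE + " " + Integer.SIZE + " " + Long.SIZE); // 8 16 32 64
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE); // -128 .. 127
        System.out.println(narrow(127)); // 127
        System.out.println(narrow(128)); // -128: the value wraps around
    }
}
```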

Bill the Lizard
Even if it uses 4 bytes to store a byte, an array of bytes would probably be packed. I would be surprised if a byte[4] used 16 bytes instead of 4 bytes.
Kip
Probably. That *would* be implementation-specific. I honestly don't know which approach would be faster.
Bill the Lizard
The article is correct, but the comment is wrong. A single byte variable consumes 1 byte plus alignment. For example, 8 byte variables on a Sun JVM cost 8 bytes.
kohlerm
+1  A: 

What you've been told is exactly right. The Java byte code specification only has 4-byte types and 8-byte types.

byte, char, int, short, boolean, float are all stored in 4 bytes each.

double and long are stored in 8 bytes.

However byte code is only half the story. There's also the JVM, which is implementation-specific. There's enough info in Java byte code to determine that a variable was declared as a byte. A JVM implementor may decide to use only a byte, although I think that is highly unlikely.
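For fields at least, the declared type does survive compilation: the class file records a field descriptor (`B` for byte, `I` for int), and reflection reports it. A minimal sketch, where `Holder` and `fieldType` are hypothetical names chosen for illustration. It shows only that the type information is available, not how any JVM lays those fields out in memory.

```java
import java.lang.reflect.Field;

public class FieldTypeDemo {
    // Hypothetical holder class used only for this demonstration.
    static class Holder {
        byte small;
        int large;
    }

    // Returns the declared type of the named field of Holder, as recorded in the class file.
    static Class<?> fieldType(String name) {
        try {
            Field f = Holder.class.getDeclaredField(name);
            return f.getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("small: " + fieldType("small")); // small: byte
        System.out.println("large: " + fieldType("large")); // large: int
    }
}
```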

Steve McLeod
Hmm... that seems to go against http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#31446 : "The values of the integral types of the Java virtual machine are the same as those for the integral types of the Java programming language (§2.4.1)" (Looking for bytecode stuff now...)
Jon Skeet
Got something: http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#7565 - bipush, baload and bastore appear to work on the byte type... *arithmetic* is only done on ints/longs, but that's a different matter.
Jon Skeet
Jon, I described the byte code spec. That's different to the JVM spec.
Steve McLeod
Does the byte code spec not include bastore/baload/bipush then?
Jon Skeet
Actually, it also has arrays, and byte arrays really are byte arrays: there, every byte really is a byte.
Mecki
Yes it does. But the Java stack is defined as a series of 4-byte slots. Pushing onto the stack always uses one (for 4-byte types) or two (for 8-byte types) elements. bipush will use one slot.
Steve McLeod
And the JVM certainly knows when a field is a byte field rather than an int field, doesn't it? It may choose not to pack them tightly, but surely that's an implementation decision.
Jon Skeet
And the JVM is a stack-based VM. But I'm guessing you knew that!
Steve McLeod
I learnt this stuff in detail when I wrote a Java decompiler. It's illuminating as to what really goes on in Java byte code.
Steve McLeod
Even if the Java *stack* is int-based, that doesn't mean its object layout has to be. I'm working up a benchmark...
Jon Skeet
+2  A: 

It depends on how the JVM applies padding etc. An array of bytes will (in any sane system) be packed into 1-byte-per-element, but a class with four byte fields could either be tightly packed or padded onto word boundaries - it's implementation dependent.

Jon Skeet
Does this mean that using a single byte will not save memory, but if I were to use more than one byte variable (or an array of bytes) I could save significant memory? (I.e. a byte[10][10] *could/should* take less memory than an int[10][10].)
Ben Page
Potentially :) (Certainly I'd expect a byte array to take up less space than an int array - but four byte variables vs four int variables? Don't know.)
Jon Skeet
(See my other answer for evidence that at least some JVMs do packing.)
Jon Skeet
+1  A: 

You could always use longs and pack the data in yourself to increase efficiency. Then you can always guarantee you'll be using all 4 bytes.
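A minimal sketch of the packing idea, assuming eight signed byte values per `long` (a long is 8 bytes wide). `PackedBytes`, `get` and `set` are hypothetical names, not a library API, and there is no bounds checking: callers must keep the index in 0..7.

```java
public class PackedBytes {
    // Reads the byte stored at index (0..7) of a packed long.
    static byte get(long packed, int index) {
        // Shift the wanted byte into the lowest 8 bits; the cast drops the rest.
        return (byte) (packed >>> (index * 8));
    }

    // Returns a copy of packed with the byte at index (0..7) replaced by value.
    static long set(long packed, int index, byte value) {
        int shift = index * 8;
        long mask = 0xFFL << shift;
        // value & 0xFFL prevents sign extension from overwriting the other slots.
        return (packed & ~mask) | ((value & 0xFFL) << shift);
    }

    public static void main(String[] args) {
        long packed = 0L;
        for (int i = 0; i < 8; i++) {
            packed = set(packed, i, (byte) (i - 4)); // stores -4 .. 3
        }
        for (int i = 0; i < 8; i++) {
            System.out.print(get(packed, i) + " ");
        }
        System.out.println();
    }
}
```

The trade-off is readability: every access becomes a shift and a mask instead of a plain field read.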

widgisoft
or even all 8 bytes, in a long :)
JeeBee
If you're actually considering this type of memory management, I think you should probably be using C++ or some other language that lets you do the memory management yourself. You'll lose far more to the overhead of the JVM than you'll save through tricks like this in Java.
rmeador
Ah. In C/C++ on 32-bit systems, int and long are both 32 bits (4 bytes); I forget that long is actually a long on other systems. It always made me laugh when they added "long long" to indicate an 8-byte long... ah well.
widgisoft
You can gain performance because with ints you can handle 4 bytes at once, not because you save memory (at least, usually). You don't need to pack byte[]s. You need to avoid single byte fields in objects, because alignment will increase the memory overhead.
kohlerm
+2  A: 

A revealing exercise is to run javap on some code that does simple things with bytes and ints. You'll see bytecodes that expect int parameters operating on bytes, and bytecodes being inserted to coerce from one to the other.
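For example, a class like this one (a hypothetical `ByteOps`, written just for the exercise) can be compiled with `javac ByteOps.java` and disassembled with `javap -c ByteOps`. In `addBytes` you should see both bytes loaded with `iload`, added with `iadd`, and narrowed back with `i2b`; `widen` needs no conversion instruction at all.

```java
public class ByteOps {
    // a + b is an int addition (iadd); the cast back to byte
    // makes javac emit an explicit narrowing instruction (i2b).
    static byte addBytes(byte a, byte b) {
        return (byte) (a + b);
    }

    // Widening needs no instruction: the byte is already held as an int in its local slot.
    static int widen(byte b) {
        return b;
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) 100, (byte) 100)); // -56: 200 does not fit in a byte
        System.out.println(widen((byte) -1));                  // -1
    }
}
```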

Note, though, that arrays of bytes are not stored as arrays of 4-byte values, so a 1024-length byte array will use 1 KB of memory (ignoring any overhead).
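A rough way to check this on a given JVM, using the same `Runtime`-based measurement as the benchmark posted in this thread. `ArrayFootprint` and its helpers are hypothetical names, and `System.gc()` is only a request, so the numbers are approximate. If byte arrays are packed, the first average should come out near 1,024 plus a small fixed overhead and the second near 4,096 plus a similar overhead.

```java
public class ArrayFootprint {
    // Arbitrary sizes: 10,000 arrays of 1,024 elements each.
    private static final int COUNT = 10_000;
    private static final int LENGTH = 1024;

    // Heap currently in use, as reported by the runtime (approximate).
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static byte[][] allocateBytes(int count, int length) {
        byte[][] arrays = new byte[count][];
        for (int i = 0; i < count; i++) {
            arrays[i] = new byte[length];
        }
        return arrays;
    }

    static int[][] allocateInts(int count, int length) {
        int[][] arrays = new int[count][];
        for (int i = 0; i < count; i++) {
            arrays[i] = new int[length];
        }
        return arrays;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint, so results are approximate
        long start = usedMemory();

        byte[][] bytes = allocateBytes(COUNT, LENGTH);
        System.gc();
        long afterBytes = usedMemory();

        int[][] ints = allocateInts(COUNT, LENGTH);
        System.gc();
        long afterInts = usedMemory();

        // Averages include each array's header and a share of the outer array of references.
        System.out.println("byte[" + LENGTH + "]: ~" + (afterBytes - start) / (double) COUNT + " bytes");
        System.out.println("int[" + LENGTH + "]:  ~" + (afterInts - afterBytes) / (double) COUNT + " bytes");

        // Use both outer arrays so neither can be collected before the measurements.
        System.out.println(bytes.length + ints.length);
    }
}
```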

izb
+2  A: 

Yes, a byte variable is in fact 4 bytes in memory. However, this doesn't hold true for arrays. A byte array of 20 bytes is in fact only 20 bytes in memory. That is because the Java bytecode language only knows ints and longs as integer types (so it must handle every integer as one of the two, 4 bytes or 8 bytes), but it knows arrays with every possible number size (so short arrays really use two bytes per entry and byte arrays one byte per entry).

Mecki
oh yeah, I forgot that not-so-little detail!
Steve McLeod
Don't forget that a byte array also has the normal overheads of being an object, and the length. Oh, and your variable is then a reference (4 or 8 bytes). So to actually have 20 bytes available and useful will require 36 bytes, assuming no aliasing. I'd stick to 20 byte fields :)
Jon Skeet
@Jon @Mecki Can you give a more or less exact formula to compute the size of an `int[]` array? Will it be `4[=length] + 4[=int_size]*length(array) + 8_byte_align`?
dma_k
@dma_k: There is no formula, because it solely depends on the virtual machine. An array is more or less an object in Java. An object might have 20 internal variables, necessary only for VM management, or it might have none. There is more than just Sun's VM on this planet (a lot more). An int[] array will certainly take at least "4 * length(array)" bytes and has some static overhead. That overhead can be anything from 4 bytes to xxx bytes; it does not depend on array size (int[1] has the same static overhead as int[10000000]), so it is insignificant for big arrays.
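Since the overhead is VM-specific, any formula has to take it as an input. A minimal sketch under that assumption: `estimate` (a hypothetical helper) adds a caller-supplied header size to `length * elementSize` and rounds up to a caller-supplied alignment. The example values, a 12-byte header (8 bytes of object overhead plus a 4-byte length, as in Jon Skeet's estimate above) and 8-byte alignment, are assumptions, not facts about any particular JVM. The reference pointing at the array is not counted.

```java
public class ArraySizeEstimate {
    // Rough footprint of one array object. headerBytes and alignment are
    // caller-supplied assumptions; real values depend on the JVM, version and settings.
    static long estimate(long length, int elementSize, int headerBytes, int alignment) {
        long raw = headerBytes + length * elementSize;
        // Round up to the next multiple of alignment.
        return (raw + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        // Hypothetical: 12-byte header, 8-byte alignment.
        System.out.println(estimate(20, 1, 12, 8));   // byte[20]: 32
        System.out.println(estimate(1024, 4, 12, 8)); // int[1024]: 4112
    }
}
```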
Mecki
@Mecki I found this link in yet another thread; it satisfied my curiosity: http://kohlerm.blogspot.com/2008/12/how-much-memory-is-used-by-my-java.html
dma_k
@dma_k: Please note: those values are only valid for Sun's JVM, as the blog entry also states (and Sun is now Oracle, BTW; the Sun JVM exists only for Windows, Linux and Solaris). They are not necessarily valid for any other JVM that exists. Further, Sun may feel free to change them with every new release (e.g. Java 1.7/1.8 might have totally different values). If you want to know for sure, test it yourself (create lots of arrays and measure the JVM's memory consumption); if the JVM's source is available, look at the source and you'll have absolutely correct values.
Mecki
+11  A: 

Okay, there's been a lot of discussion and not a lot of code :)

Here's a quick benchmark. It's got the normal caveats when it comes to this kind of thing - testing memory has oddities due to JITting etc, but with suitably large numbers it's useful anyway. It has two types, each with 80 members - LotsOfBytes has 80 bytes, LotsOfInts has 80 ints. We build lots of them, make sure they're not GC'd, and check memory usage:

class LotsOfBytes
{
    byte a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, aa, ab, ac, ad, ae, af;
    byte b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf;
    byte c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, ca, cb, cc, cd, ce, cf;
    byte d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, da, db, dc, dd, de, df;
    byte e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, ea, eb, ec, ed, ee, ef;
}

class LotsOfInts
{
    int a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, aa, ab, ac, ad, ae, af;
    int b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf;
    int c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, ca, cb, cc, cd, ce, cf;
    int d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, da, db, dc, dd, de, df;
    int e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, ea, eb, ec, ed, ee, ef;
}


public class Test
{
    private static final int SIZE = 1000000;

    public static void main(String[] args) throws Exception
    {        
        LotsOfBytes[] first = new LotsOfBytes[SIZE];
        LotsOfInts[] second = new LotsOfInts[SIZE];

        System.gc();
        long startMem = getMemory();

        for (int i=0; i < SIZE; i++)
        {
            first[i] = new LotsOfBytes();
        }

        System.gc();
        long endMem = getMemory();

        System.out.println ("Size for LotsOfBytes: " + (endMem-startMem));
        System.out.println ("Average size: " + ((endMem-startMem) / ((double)SIZE)));

        System.gc();
        startMem = getMemory();
        for (int i=0; i < SIZE; i++)
        {
            second[i] = new LotsOfInts();
        }
        System.gc();
        endMem = getMemory();

        System.out.println ("Size for LotsOfInts: " + (endMem-startMem));
        System.out.println ("Average size: " + ((endMem-startMem) / ((double)SIZE)));

        // Make sure nothing gets collected
        long total = 0;
        for (int i=0; i < SIZE; i++)
        {
            total += first[i].a0 + second[i].a0;
        }
        System.out.println(total);
    }

    private static long getMemory()
    {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }
}

Output on my box:

Size for LotsOfBytes: 88811688
Average size: 88.811688
Size for LotsOfInts: 327076360
Average size: 327.07636
0

So obviously there's some overhead - 8 bytes by the looks of it, although somehow only 7 for LotsOfInts (? like I said, there are oddities here) - but the point is that the byte fields appear to be packed in for LotsOfBytes such that it takes (after overhead removal) only a quarter as much memory as LotsOfInts.

Jon Skeet
It depends on the JVM. Sun's JVM aligns to 8-byte boundaries.
kohlerm
@kohlerm: That was with a Sun JVM.
Jon Skeet
A: 

A byte is 8 bits, i.e. one byte, as defined by the Java spec.

How much memory a byte array needs is not defined by the spec, nor is how much a complex object needs.

For the Sun JVM, I documented the rules here.

Regards,

Markus

kohlerm
A: 

See my MonitoringTools on my site (www.csd.uoc.gr/~andreou):

class X {
   byte b1, b2, b3; // ...and as many more byte fields as you like
}

long memoryUsed = MemoryMeasurer.measure(new X());

(It can be used for more complex objects/object graphs too)

In Sun's 1.6 JDK, it seems that a byte field indeed takes a single byte (in older versions, a byte took as much memory as an int). But note that even in older versions, byte[] arrays were packed to one byte per entry.

Anyway, the point is that there is no need for complex tests like Jon Skeet's above, which only give estimates. We can directly measure the size of an object!

A: 

Reading through the above comments, it seems that my conclusion will come as a surprise to many (it is also a surprise to me), so it is worth repeating:

  • The old size(int) == size(byte) for variables no longer holds, at least in Sun's Java 6.

Instead, size(byte) == 1 byte (!!)