For example, let's say I have an application that can read in a CSV file with piles of data rows. I give the user a summary of the number of rows based on types of data, but I want to make sure that I don't read in too many rows of data and cause an OutOfMemoryError. Each row translates into an object. Is there an easy way to find out the size of that object programmatically? Is there a reference that defines how large primitive types and object references are for a VM?

Right now, I have code that says read up to 32,000 rows, but I'd also like to have code that says read as many rows as possible until I've used 32MB of memory. Maybe that is a different question, but I'd still like to know.

A: 

I doubt you want to do it programmatically unless you just want to do it once and store it for future use. It's a costly thing to do. There's no sizeof() operator in Java, and even if there were, it would only count the cost of the references to other objects and the size of the primitives.

One way you could do it is to serialize the thing to a File and look at the size of the file, like this:

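Serializable myObject; // the object to measure
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("obj.ser"));
oos.writeObject(myObject); // writeObject (not write) serializes the whole object
oos.close();
// new File("obj.ser").length() then gives the serialized size in bytes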

Of course, this assumes that each object is distinct and doesn't contain non-transient references to anything else.

Another strategy would be to take each object and examine its members by reflection and add up the sizes (boolean & byte = 1 byte, short & char = 2 bytes, etc.), working your way down the membership hierarchy. But that's tedious and expensive and ends up doing the same thing the serialization strategy would do.

jodonnell
I'd serialize it to a byte[] using a ByteArrayOutputStream. It would be a lot faster than writing it out to a file.
ScArcher2
+1  A: 

There isn't a method call, if that's what you're asking for. With a little research, I suppose you could write your own. A particular instance has a fixed size derived from the number of references and primitive values plus instance bookkeeping data. You would simply walk the object graph. The less varied the row types, the easier.

If that's too slow or just more trouble than it's worth, there are always good old-fashioned row-counting rules of thumb.

sblundy
+3  A: 

If you would just like to know how much memory is being used in your JVM, and how much is free, you could try something like this:

// Get current size of heap in bytes
long heapSize = Runtime.getRuntime().totalMemory();

// Get maximum size of heap in bytes. The heap cannot grow beyond this size.
// Any attempt will result in an OutOfMemoryError.
long heapMaxSize = Runtime.getRuntime().maxMemory();

// Get amount of free memory within the heap in bytes. This size will increase
// after garbage collection and decrease as new objects are created.
long heapFreeSize = Runtime.getRuntime().freeMemory();

I found this here.

edit: I thought this might be helpful as the question author also stated he would like to have logic that handles "read as many rows as possible until I've used 32MB of memory."
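
A minimal sketch of that "read until roughly 32MB is used" idea, built on the same Runtime calls. The class and method names (MemoryBudgetReader, readWithinBudget) and the checkEvery parameter are made up for illustration, and the rows are assumed to come from an Iterator. Since garbage collection can run at any time, the measured growth is only a rough signal, so a hard row cap is kept as a backstop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class MemoryBudgetReader {
    private static final Runtime RUNTIME = Runtime.getRuntime();

    /** Heap currently in use: allocated heap minus the free part of it. */
    static long usedMemory() {
        return RUNTIME.totalMemory() - RUNTIME.freeMemory();
    }

    /**
     * Pulls rows from the source until it is exhausted, maxRows is reached,
     * or used heap has grown by roughly budgetBytes since the call started.
     * The heap is only sampled every checkEvery rows (must be positive).
     */
    static <T> List<T> readWithinBudget(Iterator<T> source, long budgetBytes,
                                        int maxRows, int checkEvery) {
        List<T> rows = new ArrayList<>();
        long baseline = usedMemory();
        while (source.hasNext() && rows.size() < maxRows) {
            rows.add(source.next());
            boolean timeToCheck = rows.size() % checkEvery == 0;
            if (timeToCheck && usedMemory() - baseline >= budgetBytes) {
                break; // approximate budget reached
            }
        }
        return rows;
    }
}
```

Called as readWithinBudget(rowIterator, 32L * 1024 * 1024, 32000, 1000), this stops at whichever limit is hit first. Other threads allocating at the same time will also count against the budget.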

matt b
This is not a good solution, as you never know when a garbage collection will happen, or how much extra memory will be allocated to the heap at once.
Nick Fortescue
That is true, and I wouldn't intend this to address the main question of this post, but it might help him to know programmatically when he is getting somewhat close to hitting the max heap size.
matt b
I had thought about this. Thanks for the suggestion.
Jay R.
+5  A: 

You have to walk the objects using reflection. Be careful as you do:

  • Just allocating an object has some overhead in the JVM. The amount varies by JVM so you might make this value a parameter. At least make it a constant (8 bytes?) and apply to anything allocated.
  • Just because byte is theoretically 1 byte doesn't mean it takes just one in memory.
  • There will be loops in object references, so you'll need to keep an IdentityHashMap or somesuch, comparing by reference identity rather than equals(), to eliminate infinite loops.
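
The points above could be sketched roughly like this. It is an estimate under stated assumptions, not a measurement: OBJECT_OVERHEAD and REFERENCE_SIZE are guessed constants you would tune per JVM, alignment padding is ignored, and the names (GraphSizeEstimator, Node) are made up for the example. Fields that reflection cannot open, such as JDK internals on newer Java versions, are counted as a bare reference and not followed.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

final class GraphSizeEstimator {
    // Assumed values, not measured: tune them for your JVM.
    static final long OBJECT_OVERHEAD = 8;
    static final long REFERENCE_SIZE = 4;

    /** Hypothetical sample type, used only to demonstrate a cyclic graph. */
    static final class Node {
        int value;
        Node next;
    }

    static long primitiveSize(Class<?> type) {
        if (type == boolean.class || type == byte.class) return 1;
        if (type == char.class || type == short.class) return 2;
        if (type == int.class || type == float.class) return 4;
        return 8; // long and double
    }

    /** Walks everything reachable from root, counting each object once. */
    static long estimate(Object root) {
        // Identity-based set, so objects that are merely equal() still count separately.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Deque<Object> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        long total = 0;
        while (!pending.isEmpty()) {
            Object current = pending.pop();
            if (!seen.add(current)) continue; // already counted: breaks reference loops
            total += OBJECT_OVERHEAD;
            Class<?> type = current.getClass();
            if (type.isArray()) {
                int length = Array.getLength(current);
                Class<?> component = type.getComponentType();
                if (component.isPrimitive()) {
                    total += length * primitiveSize(component);
                } else {
                    total += length * REFERENCE_SIZE;
                    for (int i = 0; i < length; i++) {
                        Object element = Array.get(current, i);
                        if (element != null) pending.push(element);
                    }
                }
                continue;
            }
            // Instance fields of the class and all of its superclasses.
            for (Class<?> c = type; c != null; c = c.getSuperclass()) {
                for (Field field : c.getDeclaredFields()) {
                    if (Modifier.isStatic(field.getModifiers())) continue;
                    if (field.getType().isPrimitive()) {
                        total += primitiveSize(field.getType());
                        continue;
                    }
                    total += REFERENCE_SIZE;
                    try {
                        field.setAccessible(true);
                        Object value = field.get(current);
                        if (value != null) pending.push(value);
                    } catch (IllegalAccessException | RuntimeException e) {
                        // Not reachable via reflection: count the reference only.
                    }
                }
            }
        }
        return total;
    }
}
```

With these assumed constants, two Node objects pointing at each other come out at 2 × (8 + 4 + 4) = 32 bytes, and the walk terminates despite the cycle.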

@jodonnell: I like the simplicity of your solution, but many objects aren't Serializable (so this would throw an exception), fields can be transient, and objects can override the standard methods.

Jason Cohen
Aren't the sizes of various primitives defined in the Java Specification? (§2.4.1)
erickson
Not in the sense of "how much memory does it occupy," which is the question. Only in the sense of how they operate. For example, bytes, chars, and shorts take up an entire word on the Java stack, even though they operate with rounding etc.
Jason Cohen
This sounds similar to measuring the size, as shown by Heinz in his Newsletter #78: http://www.javaspecialists.eu/archive/Issue078.html. I used it. His approach works.
Peter Kofler
+1  A: 

You have to measure it with a tool, or estimate it by hand, and it depends on the JVM you are using.

There is some fixed overhead per object. It's JVM-specific, but I usually estimate 40 bytes. Then you have to look at the members of the class. Object references are 4 (8) bytes in a 32-bit (64-bit) JVM. Primitive types are:

  • boolean and byte: 1 byte
  • char and short: 2 bytes
  • int and float: 4 bytes
  • long and double: 8 bytes

Arrays follow the same rules; that is, it's an object reference so that takes 4 (or 8) bytes in your object, and then its length multiplied by the size of its element.
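
As a worked example of that arithmetic, here is a small sketch that turns the estimate into a row limit for a 32MB budget. The row layout (long id, int count, double amount, one reference) is hypothetical, the 40-byte overhead is just the rough figure mentioned above, and the names RowBudget, shallowSize and rowsThatFit are made up. It counts only the row object itself, not whatever the reference points to.

```java
final class RowBudget {
    /** Rough shallow size: fixed per-object overhead plus the summed field sizes. */
    static long shallowSize(long overhead, long referenceSize, int references, long primitiveBytes) {
        return overhead + references * referenceSize + primitiveBytes;
    }

    /** Number of rows of the given estimated size that fit in the budget. */
    static long rowsThatFit(long budgetBytes, long bytesPerRow) {
        return budgetBytes / bytesPerRow;
    }

    public static void main(String[] args) {
        // Hypothetical row: long id (8) + int count (4) + double amount (8), plus one reference.
        long primitives = 8 + 4 + 8;
        long perRow = shallowSize(40, 4, 1, primitives); // assumed 40-byte overhead, 4-byte references
        long budget = 32L * 1024 * 1024;                 // 32MB
        // prints "64 bytes per row, about 524288 rows"
        System.out.println(perRow + " bytes per row, about " + rowsThatFit(budget, perRow) + " rows");
    }
}
```

If the reference is a String, its own object and backing array would have to be estimated and added per row in the same way.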

Trying to do it programmatically with calls to Runtime.freeMemory() just doesn't give you much accuracy, because of asynchronous calls to the garbage collector, etc. Profiling the heap with -Xrunhprof or other tools will give you the most accurate results.

erickson
@erickson I wouldn't be sure about sizeof(boolean)==1 looking at this thread (http://stackoverflow.com/questions/1907318/java-boolean-primitive-type-size). Can you please comment on this?
dma_k
+1  A: 

Firstly, "the size of an object" isn't a well-defined concept in Java. You could mean the object itself, with just its members, or the object and all the objects it refers to (the reference graph). You could mean the size in memory or the size on disk. And the JVM is allowed to optimise things like Strings.

So the only correct way is to ask the JVM, with a good profiler (I use YourKit), which probably isn't what you want.

However, from the description above it sounds like each row will be self-contained, and not have a big dependency tree, so the serialization method will probably be a good approximation on most JVMs. The easiest way to do this is as follows:

 Serializable ser;
 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 ObjectOutputStream oos = new ObjectOutputStream(baos);
 oos.writeObject(ser);
 oos.close();
 return baos.size();

Remember that if you have objects with common references this will not give the correct result, and size of serialization will not always match size in memory, but it is a good approximation. The code will be a bit more efficient if you initialise the ByteArrayOutputStream size to a sensible value.
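
To make the common-references caveat concrete, here is a small sketch (the class name SerializedSize and method name of are made up) that wraps the snippet above in a method and serializes two arrays: one holding two distinct 1000-byte buffers, one holding the same buffer twice. Serialization writes a repeated object as a short back-reference, so the second result is much smaller even though both arrays have two slots.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializedSize {
    /** Serializes the object into memory and returns the number of bytes written. */
    static int of(Serializable object) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(512); // pre-sized buffer
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.size();
    }

    public static void main(String[] args) {
        byte[] a = new byte[1000];
        byte[] b = new byte[1000];
        // Two distinct buffers: both are written out in full.
        System.out.println("distinct: " + of(new Object[] {a, b}));
        // Same buffer twice: the second occurrence is only a back-reference.
        System.out.println("shared:   " + of(new Object[] {a, a}));
    }
}
```

So summing per-row serialized sizes will overcount whenever rows share objects, and serializing all rows in one stream will hide that sharing.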

Nick Fortescue
I like this approach. How far off has it been for you, in terms of object size?
Berlin Brown
+6  A: 

Some years back JavaWorld had an article on determining the size of composite and potentially nested Java objects; it basically walks through creating a sizeof() implementation in Java. The approach builds on other work where people experimentally identified the size of primitives and typical Java objects, and then applies that knowledge to a method that recursively walks an object graph to tally the total size.

It is always going to be somewhat less accurate than a native C implementation simply because of the things going on behind the scenes of a class but it should be a good indicator.

Alternatively, there is a SourceForge project appropriately called sizeof that offers a Java 5 library with a sizeof() implementation.

P.S. Do not use the serialization approach: there is no correlation between the size of a serialized object and the amount of memory it consumes when live.

Boris Terzic
The sizeof utility is probably the fastest way. It's basically what Stefan said, but already packed in a jar ready to use.
Alexandre L Telles
+23  A: 

You can use the java.lang.instrument package:
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/instrument/Instrumentation.html

Compile and put this class in a JAR:

import java.lang.instrument.Instrumentation;

public class ObjectSizeFetcher {
    private static Instrumentation instrumentation;

    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    public static long getObjectSize(Object o) {
        return instrumentation.getObjectSize(o);
    }
}

Add the following to your MANIFEST.MF:

Premain-Class: ObjectSizeFetcher

Use getObjectSize:

public class C {
    private int x;
    private int y;

    public static void main(String [] args) {
        System.out.println(ObjectSizeFetcher.getObjectSize(new C()));
    }
}

Invoke with:

java -javaagent:ObjectSizeFetcherAgent.jar C
Stefan Karlsson
@Stefan Nice hint! Can you please tell, what will be the size of `byte[0]`, `byte[1]`, `byte[5]`, `int[0]`, `int[1]`, `int[2]` using the approach you described? It would be nice, if results include overhead for length of array and memory alignment.
dma_k
A: 

A good generic solution is to use the heap size delta. This involves minimal effort and is reusable for any type of object or object graph. By instantiating and destroying your objects many times, garbage collecting in between, and then taking the average, you avoid compiler and JVM optimizations that alter results and get a fairly accurate result. If you need an EXACT answer down to the byte then this may not be the solution for you, but for all practical applications that I know of (profiling, memory requirement calculations) it works extremely well. The code below will do just that.

public class Sizeof {
    public static void main(String[] args) throws Exception {
        // "warm up" all classes/methods that we are going to use:
        runGC();
        usedMemory();

        // array to keep strong references to allocated objects:
        final int count = 10000; // 10000 or so is enough for small objects
        Object[] objects = new Object[count];

        long heap1 = 0;

        // allocate count+1 objects, discard the first one:
        for (int i = -1; i < count; ++i) {
            Object object;

            //// INSTANTIATE YOUR DATA HERE AND ASSIGN IT TO 'object':
            object = new Object(); // placeholder: replace with your own object
            //// end your code here

            if (i >= 0) {
                objects[i] = object;
            } else {
                object = null; // discard the "warmup" object
                runGC();
                heap1 = usedMemory(); // take a "before" heap snapshot
            }
        }

        runGC();
        long heap2 = usedMemory(); // take an "after" heap snapshot

        final int size = Math.round(((float) (heap2 - heap1)) / count);
        System.out.println("'before' heap: " + heap1 +
                ", 'after' heap: " + heap2);
        System.out.println("heap delta: " + (heap2 - heap1) +
                ", {" + objects[0].getClass() + "} size = " + size + " bytes");
    }

    // a helper method for creating Strings of desired length
    // and avoiding getting tricked by String interning:
    public static String createString(final int length) {
        final char[] result = new char[length];
        for (int i = 0; i < length; ++i) {
            result[i] = (char) i;
        }

        return new String(result);
    }

    // this is our way of requesting garbage collection to be run:
    // [how aggressive it is depends on the JVM to a large degree, but
    // it is almost always better than a single Runtime.gc() call]
    private static void runGC() throws Exception {
        // for whatever reason it helps to call Runtime.gc()
        // using several method calls:
        for (int r = 0; r < 4; ++r) {
            _runGC();
        }
    }

    private static void _runGC() throws Exception {
        long usedMem1 = usedMemory(), usedMem2 = Long.MAX_VALUE;

        // keep collecting until used memory stops shrinking (or 1000 attempts):
        for (int i = 0; (usedMem1 < usedMem2) && (i < 1000); ++i) {
            s_runtime.runFinalization();
            s_runtime.gc();
            Thread.yield(); // static method, so call it on the class

            usedMem2 = usedMem1;
            usedMem1 = usedMemory();
        }
    }

    private static long usedMemory() {
        return s_runtime.totalMemory() - s_runtime.freeMemory();
    }

    private static final Runtime s_runtime = Runtime.getRuntime();

} // end of class
Pete
This is in no way deterministic. -1
Yuval A