How do you optimize the heap size usage of an application that has a lot (millions) of long-lived objects? (big cache, loading lots of records from a db)

  • Use the right data type
    • Avoid java.lang.String to represent other data types
  • Avoid duplicated objects
    • Use enums if the values are known in advance
    • Use object pools
    • String.intern() (good idea?)
  • Load/keep only the objects you need
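
A minimal sketch of the "right data type" and "enums" bullets above, assuming a hypothetical record with a status, a quantity, and a date that arrive as text (for example from a CSV file). The names `OrderStatus` and `CompactOrder` are invented for illustration, and `java.time` assumes Java 8 or later. The idea is to parse the text once at load time and keep only compact fields.

```java
import java.time.LocalDate;

// Hypothetical value set; use an enum only when the values are known in advance.
enum OrderStatus { NEW, SHIPPED, CANCELLED }

// Hypothetical row type: three small fields instead of three Strings.
final class CompactOrder {
    // One shared enum constant per value instead of a separate String per row.
    private final OrderStatus status;
    // A primitive int instead of a String such as "42".
    private final int quantity;
    // Days since 1970-01-01 in an int instead of a String such as "2009-03-14".
    private final int epochDay;

    CompactOrder(OrderStatus status, int quantity, int epochDay) {
        this.status = status;
        this.quantity = quantity;
        this.epochDay = epochDay;
    }

    // Converts the textual form once, when the row is loaded.
    static CompactOrder parse(String status, String quantity, String isoDate) {
        return new CompactOrder(
                OrderStatus.valueOf(status),
                Integer.parseInt(quantity),
                (int) LocalDate.parse(isoDate).toEpochDay());
    }

    OrderStatus status() { return status; }

    int quantity() { return quantity; }

    // Rebuilds the richer date object only when a caller asks for it.
    LocalDate date() { return LocalDate.ofEpochDay(epochDay); }
}
```

Whether this actually helps depends on how the data is used, so it is worth measuring before and after.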

I am looking for general programming or Java specific answers. No funky compiler switch.

Edit:

Optimize the memory representation of a POJO that can appear millions of times in the heap.

Use cases

  • Load a huge csv file in memory (converted into POJOs)
  • Use Hibernate to retrieve millions of records from a database

Summary of answers:

  • Use flyweight pattern
  • Copy on write
  • Instead of loading 10M objects with 3 properties, is it more efficient to have 3 arrays (or other data structure) of size 10M? (Could be a pain to manipulate data but if you are really short on memory...)
+10  A: 

You don't say what sort of objects you're looking to store, so it's a little difficult to offer detailed advice. However some (not exclusive) approaches, in no particular order, are:

  • Use a flyweight pattern wherever possible.
  • Caching to disc. There are numerous cache solutions for Java.
  • There is some debate as to whether String.intern is a good idea. See here for a question re. String.intern(), and the amount of debate around its suitability.
  • Make use of soft or weak references to store data that you can recreate/reload on demand. See here for how to use soft references with caching techniques.
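
To make the soft-reference idea concrete, here is a rough sketch of a cache whose values the garbage collector may reclaim under memory pressure. `SoftCache` and its `loader` function are hypothetical names, not part of any library, and the sketch assumes the loader never returns null and that a value can always be reloaded from its key.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache: values are held only through SoftReferences.
final class SoftCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumed to recreate a value from its key

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never loaded or cleared by the garbage collector: reload it.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

This sketch never removes map entries whose references have been cleared, and two threads may load the same key at once; a production cache would need to handle both.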

Knowing more about the internals and lifetime of the objects you're storing would result in a more detailed answer.

Brian Agnew
+15  A: 

I suggest you use a memory profiler, see where the memory is being consumed, and optimise that. Without quantitative information you could end up changing things that either have no effect or actually make things worse.

You could look at changing the representation of your data, especially if your objects are small. For example, you could represent a table of data as a series of columns with one array per column, rather than one object per row. This can save a significant amount of per-object overhead if you don't need to represent an individual row. For example, a table with 12 columns and 10,000,000 rows could use 12 objects (one per column) rather than 10 million (one per row).
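
A minimal sketch of that column-oriented layout, assuming a hypothetical table of points with an `id`, an `x`, and a `y` column (the class and field names are made up). The heap holds three arrays however many rows there are, so the per-row object header and reference (typically on the order of 12 to 16 bytes per object on a 64-bit HotSpot JVM) are not paid.

```java
// Hypothetical column-oriented table: one primitive array per column.
final class PointTable {
    private final int[] id;
    private final double[] x;
    private final double[] y;

    PointTable(int rows) {
        id = new int[rows];
        x = new double[rows];
        y = new double[rows];
    }

    // Writes all three columns of one logical row.
    void set(int row, int idValue, double xValue, double yValue) {
        id[row] = idValue;
        x[row] = xValue;
        y[row] = yValue;
    }

    // A "row" is just an index; no row object is ever allocated.
    int id(int row) { return id[row]; }

    double x(int row) { return x[row]; }

    double y(int row) { return y[row]; }

    int size() { return id.length; }
}
```

The trade-off is the one noted in the question: code that wants to pass a whole row around has to pass an index (plus the table) instead.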

Peter Lawrey
Good trick for minimizing the number of objects.
Boune
I agree that a memory profiler is a good starting point for someone who does not know which class instances are taking all the memory. The question is more: if I know in advance that I will have 10M instances of pojo#1 in memory, how do I minimize the consumption of each instance?
Boune
+9  A: 

Ensure good normalization of your object model, don't duplicate values.

Ahem, and, if it's only millions of objects I think I'd just go for a decent 64-bit VM and lots of RAM ;)

krosenvold
Which is quite possibly the most cost-effective solution :-)
Brian Agnew
+1 - That's cutting to the heart of the issue.
duffymo
Great answer. Using caches of data and reducing duplicate records and fields is a major saver.
Fortyrunner
How do you minimize the number of duplicated values? The original question mentions Enum, String.intern, and object pools. How would you ensure that values are not duplicated?
Boune
@Boune There may be combinations (subsets) of values that are duplicate.
krosenvold
+1  A: 

I want to add something to the point Peter already made (I can't comment on his answer :(). It's always better to use a memory profiler (check out the Java memory profilers) than to go by intuition. 80% of the time, it's a routine we ignore that has a problem in it. Also, collection classes are more prone to memory leaks.

prateek urmaliya
+4  A: 

Normal "profilers" won't help you much, because you need an overview of all your "live" objects. You need a heap dump analyzer. I recommend the Eclipse Memory Analyzer.

Check for duplicated objects, starting with Strings. Check whether you can apply patterns like flyweight, copy-on-write, or lazy initialization (Google will be your friend).
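
One way to remove duplicates once they have been found is a canonicalizing pool, which is a simple form of the flyweight pattern. The sketch below is illustrative only: `Canonicalizer` is a made-up name, and it assumes the pooled values are immutable and implement `equals` and `hashCode` correctly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pool that hands back one shared instance per distinct value.
// Not thread-safe; the pool holds strong references for as long as it lives.
final class Canonicalizer<T> {
    private final Map<T, T> pool = new HashMap<>();

    // Returns the first instance seen that equals the argument.
    T canonical(T value) {
        T existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int distinctCount() {
        return pool.size();
    }
}
```

A loader would call `canonical(...)` on each field value before storing it, so equal values end up sharing a single object and the duplicates become garbage.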

kohlerm
+1  A: 

You could just store fewer objects in memory. :) Use a cache that spills to disk or use Terracotta to cluster your heap (which is virtual) allowing unused parts to be flushed out of memory and transparently faulted back in.

Alex Miller
A: 

A fancy one: keep most data compressed in ram. Only expand the current working set. If your data has good locality that can work nicely.
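
As a rough sketch of keeping data compressed and expanding it on demand, the hypothetical `CompressedText` class below stores a block of text as gzip-compressed bytes using `java.util.zip`. It assumes the data is text that compresses well and that paying CPU time on each access is acceptable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical holder: keeps only the compressed bytes on the heap.
final class CompressedText {
    private final byte[] gzipped;

    CompressedText(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer into the buffer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.gzipped = buffer.toByteArray();
    }

    // Expands the data for the current working set; the result is not cached.
    String expand() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    int compressedSize() {
        return gzipped.length;
    }
}
```

Compressing larger blocks (many records at once) usually gives better ratios than compressing each record separately, at the cost of expanding more than is needed per access.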

Use better data structures. The standard collections in java are rather memory intensive.

[what is a better data structure]

  • If you take a look at the source for the collections, you'll see that if you restrict yourself in how you access the collection, you can save space per element.
  • The way the collections handle growth is no good for large collections: too much copying. For large collections, you need some block-based structure, like a B-tree.
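
To illustrate both points, here is a sketch of an append-only list of `int` values stored in fixed-size blocks. It is a simpler block-based structure than a B-tree, and `ChunkedIntList` and `BLOCK_SIZE` are names invented for this example. Restricting access to append and indexed read lets it store primitives directly, and growing never copies existing elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only int list backed by fixed-size blocks.
final class ChunkedIntList {
    private static final int BLOCK_SIZE = 1 << 16; // 65,536 ints per block

    private final List<int[]> blocks = new ArrayList<>();
    private long size;

    void add(int value) {
        int offset = (int) (size % BLOCK_SIZE);
        if (offset == 0) {
            // The last block is full (or there is none yet): allocate a new
            // block instead of copying the existing elements to a larger array.
            blocks.add(new int[BLOCK_SIZE]);
        }
        blocks.get(blocks.size() - 1)[offset] = value;
        size++;
    }

    int get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return blocks.get((int) (index / BLOCK_SIZE))[(int) (index % BLOCK_SIZE)];
    }

    long size() {
        return size;
    }
}
```

The list of blocks itself still grows by copying, but it holds only one reference per 65,536 elements, so that cost is small.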
Stephan Eggermont
How would you define better data structures? How would you implement that?
Boune
A: 

Spend some time getting acquainted with and tuning the VM command line options, especially those concerning garbage collection. While this won't change the memory used by your objects, it can have a big impact on performance with memory-intensive apps on machines with a lot of RAM.

Michael Borgwardt
+1  A: 

If you have millions of Integers and Floats etc. then see if your algorithms allow for representing the data in arrays of primitives. That means fewer references and lower CPU cost of each garbage collection.

David Plumpton
A: 
  1. Assign null to all variables that are no longer used, which makes the objects they referenced available for garbage collection.
  2. De-reference collections once you are done with them; otherwise the GC won't collect them.
pramodc84
I disagree with item 1. I would just let the GC do what it is supposed to do. There are only a few cases (arrays, collections) where this could be useful, not all variables. http://stackoverflow.com/questions/449409/does-assigning-objects-to-null-in-java-impact-garbage-collection
Boune