views:

187

answers:

4

Dear StackOverflow,

I'm new to java world from C++ background. I'd like to port some C++ code to Java. The code uses Sparse vectors:

struct Feature{
int index;
double value;
};

typedef std::vector<Feature> featvec_t;

As I understood, if one makes an object, there will be some overhead on memory usage. So naive implementation of Feature will overhead signifiantly when there will be 10-100 millions of Features in a set of featvec_t.

How to represent this structure memory efficiently in Java?

A: 

An object in Java (I guess) has:

  • sizeof(index)
  • sizeof(value)
  • sizeof(Class*) <-- pointer to concrete class

So the diference is four bytes from the pointer. If your struct is 4+8=12 bytes it's a 33% overhead... but I can't think other (better) way to do it.

helios
you could avoid creating diferent objects for each point by creating two vectors: one for integer indexes and one for double values, but if it's a List<Integer> it will autobox int values into Integer which will cost time and space. So... no good solution.
helios
The size of a pointer can vary with the JVM. It's not always 4 bytes.
DR
Oh, please revise how is your vector growing... if you know the size before hand you could create an int[] and a double[] for the points, and a containing class having logic for accesing the items. If you don't know the size but you DO initialize it the first time you could create two LinkedList's (or whatever is practical for adding a lot of items) and convert then to a more static form.
helios
@DR: yep. I'm assuming a 32bit JVM, I know. And by the way, I'm outlining the size occupied in the heap by the Feature object, not the size of the reference you have to have somewhere to hold it.
helios
+5  A: 

To avoid memory overhead for each entry, you could write a java.util.List<Feature> implementation that wraps arrays of int and double, and builds Feature objects on demand.

To have it resize automatically, you could use TIntArrayList and TDoubleArrayList from GNU trove.

Michael Borgwardt
The array is still not sparse; In order to get that; I think that the simplest solution is to store the values in a TreeMap.
KarlP
Yes, the array is sparse; if you look at the example code, the index is stored explicitly. Access would be via binary search, I suppose. A TreeMap would add even more overhead.
Michael Borgwardt
Wouldn't you want a HashMap for this, instead of a TreeMap?
MSalters
If memory overhead is a big concern and your algorithm does not require random access to specific elements (which the C++ code does not allow either) then you do not want a map.
Michael Borgwardt
+1  A: 

Is the question about space for the struct itself or the sparse vector? Since others have answered the former, I'll shoot for the latter...

There aren't any sparse lists/matrices in the standard Java collections to my knowledge.

You could build an equivalent using a TreeMap keyed on the index.

Paolo
+4  A: 

If memory is really your bottleneck, try storing your data in two separate arrays: int[] index and double[] value.

But in most cases with such big structures performance (time) will be the main issue. Depending on operations mostly performed on your data (insert, delete, get, etc.) you need to choose appropriate data structure to store objects of class Feature. Start your explorations with java.util.Collection interface, its subinterfaces (List, Set, etc) and their implementations provided in java.util package.

witzar
+1 - don't try to solve a memory usage problem unless you are sure it is a real problem.
Stephen C