views:

265

answers:

9

Hi!

I have encountered a problem to which I cannot find a solution. I am using a HashSet to store values of a custom type Cycle, where I have overridden hashCode and equals as follows, to make sure the slow performance is not caused by the hashCode or equals methods. I have also set the initial capacity of the HashSet to 10.000.000.

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (int) (cycleId ^ (cycleId >>> 32));
    return result;
}

@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Cycle other = (Cycle) obj;
    if (cycleId != other.cycleId)
        return false;
    return true;
}

After the first 1.500.000 values, when I try to add a new value (with the add method of the HashSet class) the program becomes very slow. Eventually I get a Java out-of-memory exception (Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space) before the stored values reach 1.600.000.

The IDE I use is Eclipse, so the next step was to increase the JVM heap size from the default value to 1 GB (using the commands Xmx1000M and Xms1000M). Now Eclipse starts with 10 times more memory available (I can see that at the bottom right, where the total heap size and used memory are shown), but again I have the same slow performance and the same out-of-memory error AT THE SAME VALUES as before (after 1.500.000 and before 1.600.000), which is very odd.

Does anyone have an idea what the problem might be?

Thank you in advance

A: 

Maybe your computer doesn't have enough memory, hence it has to swap to disk.

BobTurbo
+2  A: 

The memory available to an application you start from Eclipse should be configured from the Run menu. Try:

Run -> Run Configurations -> Arguments -> VM Arguments -> -Xmx1000M

The reason your program is slow is the garbage collector: it kicks in each time memory is about to hit the limit.
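If you want to check from inside the program which heap limit actually took effect, a quick throwaway check (my own sketch, not from the question) is:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Reports the -Xmx limit the JVM was actually started with
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}
```

Run this from the same launch configuration as your program; if it reports far less than 1000 MB, the -Xmx setting never reached your application.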

Vitalii Fedorenko
I have set the VM to 1G and it uses less than 100M. The program behaves the same with a VM of 50M or a VM of 1G. Strangely enough, I get a memory-full error when the set has between 1.500.000 and 1.600.000 elements, independently of how much memory I have set. The heap size is shown at the bottom right corner of Eclipse, so I have double-checked that the Xmx and Xms commands work correctly.
Pitelk
The heap size shown at the bottom right corner is memory used by Eclipse rather than by your program. What heap size does jconsole show when you connect to your application?
Vitalii Fedorenko
Thank you Vitalii. I checked with jconsole and it turned out that you are 100% right. The program uses all the memory while Eclipse shows only 50MB out of the 1000MB. Also, Andreas is right: I have to declare explicitly in the launch configuration that I want the program to use all the heap space, independently of the fact that I started Eclipse with a 1000M heap. Thanks for your help.
Pitelk
+1  A: 

If you want to increase the memory your program can use, it won't help to increase Eclipse's heap size. You must put the parameter into the VM arguments of your program's launch configuration.

Andreas
Thank you Andreas! You are right about that: I have to declare explicitly in the launch configuration that I want the program to use all the heap space, independently of the fact that I started Eclipse with a 1000M heap. Thanks for your help.
Pitelk
+2  A: 

Have you tested your hashCode method implementation? It always returns 31, for any value of cycleId. No wonder your HashMap works slowly: it has linear performance.

Roman
Nope, you probably misread the implementation, it's fine.
Dimitris Andreou
@Dimitris Andreou: I read it correctly. It **always** returns 31 because `(X ^ (X >>> 32))` always returns `0` for ints.
Roman
It's obviously a long. (Note the explicit cast to int).
Dimitris Andreou
@Dimitris Andreou: if we suppose that cycleId is a long (which is not established yet), then I still agree with lasseespeholt's comment about the hashCode implementation: it can produce 2 ^ 32 unique values, which is enough to map 16 GB of integers onto unique keys.
Roman
But if it's a long, and my answer is as unhelpful as yours, then I don't know how to help.
Roman
So your answer was addressing the astronomical improbability that someone would be shifting an int 32 positions, *and* then would type **four** parentheses to accommodate a cast that wasn't needed in the first place. Alright! (By the way, you meant 4GB integers).
Dimitris Andreou
Using all bits is also a recommendation of Effective Java, and also note that this was generated by eclipse. Perhaps you and lasseespeholt should be pointing this out to eclipse developers :)
Dimitris Andreou
@Dimitris: it's not related, but I'm just curious: why 4GB? I calculated that 2 ^ 32 possible values = 2 ^ 32 mappings. Each int is 4 bytes, so the total weight is 2 ^ 32 * 4, which is 16 GB.
Roman
Oh, right, I thought you were counting the integers, not measuring their memory (but then, that's what the 'B' of 'GB' means :)).
Dimitris Andreou
+1  A: 

The JVM throws 'out of memory' NOT only based on available memory. It is also thrown when too much time is being spent on garbage collection. Check this. Exact implementation details vary with the JVM and the garbage collector implementation.

Increasing memory would not help in this case. You may have to choose another approach.

Gopi
Maybe you are right. But the thing is that the heap free space and used space are shown at the bottom right of Eclipse, so I can see when the garbage collector is running, as a lot of memory is freed when it does. Also, the memory used when the error pops up is 150M out of the 1000M. As far as I know, the garbage collector doesn't run until more memory is needed, which is not the case here. Thanks for your answer.
Pitelk
It turns out that Eclipse's virtual memory is not the same as the virtual memory my program uses. I have to set Eclipse's memory to 1G, but then I have to do the same in the run configuration. Also, the displayed heap size refers to Eclipse's heap usage, not my application's. So in the aftermath, my program does indeed use all the memory.
Pitelk
+3  A: 

Not enough heap memory (increase it via -Xmx, e.g. -Xmx512m). When free memory goes very low, then much, much time is spent by the garbage collector which furiously scans the heap for unreachable objects.

Your hashCode() is fine, extra points for using all bits of the cycleId long.
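A quick sanity check (throwaway values of mine, not from the question) that the expression really folds in the high bits and matches Long's own hashCode:

```java
public class HashSanity {
    public static void main(String[] args) {
        long a = 1L;
        long b = 1L | (1L << 32);  // same low 32 bits as a, different high bits
        int ha = (int) (a ^ (a >>> 32));
        int hb = (int) (b ^ (b >>> 32));
        System.out.println(ha != hb);  // prints true: the high bits do affect the hash
        System.out.println(ha == Long.valueOf(a).hashCode());  // prints true: same formula as Long.hashCode()
    }
}
```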

Edit. Now I see you did increase the memory, and it didn't help. First of all, are you sure you actually managed to increase it? You could check this with jconsole: connect to your app and see its heap size.

As an alternative explanation to be verified: is there any particular pattern in your cycleId values that could make this hashCode() implementation bad? For example, ids whose 32 high-order bits are mostly similar to the 32 low-order bits. (Yeah, right.)

But no. Even if that were the case, you would be seeing a gradual degradation of performance, not a sharp drop at a specific point (and you do get an OutOfMemoryError and frantic gc activity). So my best guess is still a memory issue. Either you didn't increase the heap size as you thought, or there is some other code grabbing memory at some point. (You could use a tool like VisualVM to profile this, get a heap dump upon OOME, and see what objects it contains.)
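For completeness, the degenerate pattern hypothesized above would look like this (contrived value, purely illustrative):

```java
public class DegenerateId {
    public static void main(String[] args) {
        // An id whose high 32 bits mirror its low 32 bits always hashes to 0
        long id = (42L << 32) | 42L;
        System.out.println((int) (id ^ (id >>> 32)));  // prints 0
        // If all ids followed this pattern, every entry would land in one bucket
    }
}
```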

Edit2 I made bold the correct part of the above.

Dimitris Andreou
well.. the hashCode method is automatically generated by Eclipse, so extra points to Eclipse :) As for Xmx, I have increased it, but strangely I have the same problem. The odd thing is that even though I have increased the heap memory, the program starts to get slow at the same amount of data as before....
Pitelk
See my updated answer, we were typing at the same time.
Dimitris Andreou
(Also see the rest of the answers that suggest you didn't increase the heap of your app, but of eclipse itself.)
Dimitris Andreou
Why extra points to eclipse for generating the hashCode() method when it could have used Long.hashCode() ?
lasseespeholt
Well, the problem was two-sided. First, I increased (as I should have) Eclipse's heap size. Then I had to go to the run configuration and do the same for my program. Second, the heap size usage shown at the bottom right is the heap Eclipse uses, not my application's! Now I have made all the necessary corrections and indeed I can see that my application is consuming all the memory (2GB of heap). Which makes me suicidal, because now I have to use another approach: how can I store this information on disk instead of in memory, but still access specific entries in constant time? Thanks!
Pitelk
@lasseespeholt, two points. Long.hashCode() is exactly the same as the one Eclipse generated. Also, to use Long.hashCode(), you have to instantiate a Long object, which is not needed if you already have the long primitive.
Dimitris Andreou
@Pitelk, glad you resolved your (first) problem (not an eclipse user here). The second question is definitely non-trivial, you should make another post for that (that is, if you can't find an easy way to reduce the memory footprint/cut some low hanging fruit or what not).
Dimitris Andreou
@Pitelk, you could also use a database, for a trivial solution. Of course random accesses on disk would be *much, much* slower (not even the cache would offer any benefit with these).
Dimitris Andreou
A: 

How are you initializing your HashSet? You need to be aware of its growth pattern. With every add operation, it checks whether it is getting close to capacity. If it reaches a certain point (determined by its load factor), it performs a resizing operation, which can be expensive. From the JavaDoc of HashMap (the collection that backs HashSet):

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
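Applying that rule to the numbers in the question, a sketch (my own, using the HashMap default load factor) of sizing the set so that no rehash ever occurs:

```java
import java.util.HashSet;
import java.util.Set;

public class PresizedSet {
    public static void main(String[] args) {
        int expectedEntries = 10000000;  // from the question
        float loadFactor = 0.75f;        // HashMap/HashSet default
        // Capacity greater than expectedEntries / loadFactor means no rehash ever occurs
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
        Set<Long> cycleIds = new HashSet<Long>(initialCapacity, loadFactor);
        cycleIds.add(42L);
        System.out.println(cycleIds.size());  // prints 1
    }
}
```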

akf
The initial capacity is set to 10.000.000 and the program starts to be slow at 1.500.000, so this probably is not the case. Thanks for your answer.
Pitelk
+5  A: 

You don't want to increase the JVM heap for Eclipse, you want to set it for your program.

Go to Run > Run Configurations (or Debug Configurations) and set the VM Options there.

Devon_C_Miller
A: 

I don't know how effective your hashCode() implementation is. Maybe this could help you:

Effective Java hashcode

Emil