views:

128

answers:

6

Hi all.

I'm writing an application that uses Dijkstra's algorithm to find minimal paths in a graph. The weights of the nodes and edges are floats, so the algorithm does a lot of arithmetic on floats. Could I improve the running time by converting all the weights to ints? Are int arithmetic operations faster in Java than float ones?

I tried to write a simple benchmark to check this, but I'm not satisfied with the results I got. Possibly the compiler optimized away some parts of the program, so the results don't look right to me.
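Roughly, the kind of benchmark I mean looks like the following (a simplified sketch, not my exact code; the sums are printed at the end so the JIT can't eliminate the loops as dead code, though there is still no warm-up phase, so take the numbers with a grain of salt):

// Sketch of the micro-benchmark: sum N values once as int, once as float.
public class IntVsFloatBenchmark {
    public static void main(String[] args) {
        final int N = 100000000;

        long start = System.nanoTime();
        int intSum = 0;
        for (int i = 0; i < N; i++) {
            intSum += i;                 // pure int addition (overflow wraps, which is fine here)
        }
        long intNanos = System.nanoTime() - start;

        start = System.nanoTime();
        float floatSum = 0f;
        for (int i = 0; i < N; i++) {
            floatSum += i;               // float addition (plus an int-to-float conversion)
        }
        long floatNanos = System.nanoTime() - start;

        // Printing the sums keeps the JIT from removing the loops entirely.
        System.out.println("int:   " + intNanos / 1000000 + " ms, sum = " + intSum);
        System.out.println("float: " + floatNanos / 1000000 + " ms, sum = " + floatSum);
    }
}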


EDIT:

The problem I'm trying to solve is in the Information Retrieval field. The application should show answers to a query posed as a set of keywords.

My data structure is a weighted directed graph. Given a set of leaf nodes, I have to find a smallest tree that connects them and show it to the user as the answer. The weights are assigned by a weighting function based partly on the tf/idf technique. The user doesn't know what weights I assign to the nodes and edges; he just wants to see answers relevant to the query he posed. So exact results are not required, just the ability to rank answers according to their weights. The natural use of the weighting function (as I mentioned, it is based on tf/idf) gives float weights, so I have used floats so far.

I hope this adds some background to the question.

+1  A: 

For simple operations int is faster; however, with int you may have to do more work to get the same result. E.g.

as float

float f = 15 * 0.987f;

as int

int i = 15 * 987 / 1000;

The extra division means the int operation can take longer.
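That said, if the conversion is done once, up front, the hot loop itself needs no division at all. A rough sketch of that idea (the scale factor 1000 is arbitrary; it has to preserve enough precision without overflowing int on your longest path):

// Convert float weights to fixed-point ints once; after this,
// summing and comparing path weights is pure int arithmetic.
static int[] toFixedPoint(float[] weights) {
    int[] scaled = new int[weights.length];
    for (int i = 0; i < weights.length; i++) {
        scaled[i] = Math.round(weights[i] * 1000f);
    }
    return scaled;
}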

Peter Lawrey
In Dijkstra's algorithm I just sum and compare the weights of the paths, so division is pretty rare for me. How do you know that ints are faster for simple operations? Is that common knowledge, or can you point me to some literature on the topic?
jutky
You need to look at the native code generated by the JVM and compare clock cycles. However, both operations are fairly fast compared with the cost of cache misses and system calls. It is highly likely that the choice of datatype won't make much difference.
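(With HotSpot you can do that with java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, though it needs the hsdis disassembler plugin installed.)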
Peter Lawrey
+2  A: 

As ever with this sort of thing, you should set yourself some performance goals and then profile the app to see whether it meets them.

Oftentimes you may find surprising results: that the time taken is hardly affected by the base numerical type at all, or that your algorithm is suboptimal.

And regarding compiler optimisations: they're a real and valid part of performance optimisation.

If using type A is theoretically faster than using type B, but your compiler can optimise type B to be quicker in a real scenario, then that's a valuable piece of evidence, not a source of disappointment.

Visage
I just wanted to know whether the performance improvement I could gain is worth the time it would take to change a pretty large part of the application. But it seems I can't know this for sure in advance, and the best way to check is to implement two versions of the algorithm and measure their running times.
jutky
A: 

I don't think so.

A float is 4 bytes, and an int in Java is also 4 bytes.

Why not use java.util.Date to measure the running time?

You can define a graph with 100,000 nodes and then run the calculation on it.
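Something like this, for example (just a sketch; buildGraph and calculate stand for your own code):

// Timing with java.util.Date, as suggested above.
Date before = new Date();
calculate(buildGraph(100000));   // buildGraph/calculate are placeholders
Date after = new Date();
System.out.println("took " + (after.getTime() - before.getTime()) + " ms");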

You might want to use the word “byte” instead of “bit”. A 4 *bit* integer can only hold sixteen distinct values…
Donal Fellows
I'm sorry, my English is poor, so I used the wrong word (my mother tongue is not English). Actually, I think that if int is faster than float, it's because of the hardware: physically, int operations may be easier to implement than float ones.
A: 

If you just want to compare weights, you should prefer int to float.

Truong Ha
The same comment as for the other answer: is that common knowledge, or can you point me to some literature on the topic?
jutky
A: 

Generally you should not worry about a choice between int and float for performance reasons.

Here's an excerpt from the Appendix of Java Puzzlers:

Floating-point arithmetic is inexact. Don't use floating-point where exact results are required; instead, use an integral type or BigDecimal. Prefer double to float.

Unless you have a really good reason, you should generally prefer double to float if you must use floating-point operations. If exact results are desired, then go ahead and use BigDecimal; it'll be slower since it's not a primitive, but unless profiling shows it's unacceptable, this is often the best option.
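For example, the classic case where double drifts but BigDecimal (constructed from strings) stays exact:

import java.math.BigDecimal;

public class Exactness {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);        // prints 0.30000000000000004
        BigDecimal a = new BigDecimal("0.1");
        BigDecimal b = new BigDecimal("0.2");
        System.out.println(a.add(b));         // prints 0.3
    }
}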

If you must use floating-point operations, then trying to optimize by using int is ill-advised. It is likely to be a premature optimization and will only complicate the code unnecessarily. Write it in the most natural, most readable way. Do not complicate your code for the sake of slight performance gains.

If you don't actually need floating-point operations, then by all means use int or long instead.

polygenelubricants
See also http://stackoverflow.com/questions/2550281/floating-point-vs-integer-calculations-on-modern-hardware and http://stackoverflow.com/questions/2010252/float-versus-integer-arithmetic-performance-on-modern-chips
polygenelubricants
I added some background to the question; hope that clarifies some things. Thanks for the links.
jutky
A: 

I think the performance is very much dependent on the algorithm and the platform the software is running on.

If you're doing matrix/array calculations on an x86 platform, the runtime might optimize them to use SSE, an extended instruction set oriented toward float/double operations.

On other platforms the runtime might optimize to OpenCL (I don't believe anyone does that right now, but it might happen :). I have no clue what runs fastest on such a platform, and under what conditions; it may just be that OpenCL is better optimized for an integer workload.

Under these circumstances I would conclude that it is not useful to optimize for data type (float or int) at this point; just optimize the readability of the code.

If your code is highly performance critical, and you know exactly on which hardware the system will be running now and in the future, you could test typical workloads with various algorithms and select the one which best meets your needs.

But in general, just use an algorithm you can understand, keep the code readable, and thereby keep the bug count low. Fast code isn't worth that much if the results are not correct :)

extraneon