questions about floating-point | ansaurus

floating-point

How are floating point literals in C interpreted?

In a C program, when you write a floating point literal like 3.14159, is there a standard interpretation, or is it compiler- or architecture-dependent? Java is exceedingly clear about how floating point strings are interpreted, but when I read K&R or other C documentation the issue seems swept under the rug. ...

Good way to hash a float vector?

I am well aware of all the problems involved in comparing floats. This is exactly the reason for this question. I'm looking to create a fast hash table for values that are 3D vectors (3 floats: x, y, z). It can be assumed that the length of the vector is always 1.0 (sqrt(x*x+y*y+z*z) is 1.0). Essentially this means that I'm looking for ...
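One possible approach, sketched here in Python and assuming that vectors closer than a chosen tolerance should land in the same bucket: quantize each component to a grid and use the resulting integer tuple as the key. The names `quantized_key` and `cell` are hypothetical. Two nearly equal vectors that straddle a grid boundary still get different keys, so a lookup may also need to probe neighbouring cells.

```python
def quantized_key(v, cell=1e-3):
    """Map a 3D vector to a tuple of integer grid indices (hypothetical helper)."""
    return tuple(round(c / cell) for c in v)


table = {}
table[quantized_key((0.0, 0.0, 1.0))] = "north pole"

# A slightly perturbed vector falls into the same grid cell in this case.
print(table.get(quantized_key((0.00001, 0.0, 0.99999))))
```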

How do I suppress scientific notation in Python?

Here's my code: x = 1.0; y = 100000.0; print x/y. My quotient displays as 1.00000e-05. Is there any way to suppress scientific notation and make it display as 0.00001? Thanks in advance. This feels somewhat ridiculous to ask but I haven't figured out a way to do it yet. I'm going to use the result as a string. ...
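A minimal sketch of one way to get a fixed-point string, assuming five decimal places are enough for the values involved (the precision is an assumption, not something the question states):

```python
x = 1.0
y = 100000.0

# The "f" presentation type always uses fixed-point notation.
s = "{:.5f}".format(x / y)
print(s)  # 0.00001
```

The result is already a string, so it can be used directly wherever the text form is needed.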

How do I do floating point rounding with a bias (always round up or down)?

I want to round floats with a bias, either always down or always up. There is a specific point in the code where I need this, the rest of the program should round to the nearest value as usual. For example, I want to round to the nearest multiple of 1/10. The closest floating point number to 7/10 is approximately 0.69999998807, but th...
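One way to get directed rounding at a single point in a program, sketched in Python with the standard `decimal` module rather than with binary floats: convert the value to a decimal and quantize it with an explicit rounding mode. The helper names `round_down` and `round_up` are hypothetical, and going through `str()` is an assumption that the shortest decimal representation of the float is the intended value.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

TENTH = Decimal("0.1")


def round_down(value):
    """Round toward negative infinity to a multiple of 1/10."""
    return Decimal(str(value)).quantize(TENTH, rounding=ROUND_FLOOR)


def round_up(value):
    """Round toward positive infinity to a multiple of 1/10."""
    return Decimal(str(value)).quantize(TENTH, rounding=ROUND_CEILING)


print(round_down(0.75))  # 0.7
print(round_up(0.75))    # 0.8
```

The rest of the program is unaffected, because the rounding mode is passed per call instead of being set globally.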

Does Java's floating point implementation still have the problems identified by Kahan?

I've read a few papers from Kahan tonight, and his famous rant against Java. Before I dive into the JVM spec, did anything change since the initial rant on this front? For example: setting the rounding mode, accessing the flags, getting more precision for free ... ? Thanks, Nico. ...

Why does 1.0f + 0.0000000171785715f return 1f?

After one hour of trying to find a bug in my code I've finally found the reason. I was trying to add a very small float to 1f, but nothing was happening. While trying to figure out why I found that adding that small float to 0f worked perfectly. Why is this happening? Does this have to do with 'orders of magnitude'? Is there any workar...
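The effect can be reproduced in Python by rounding through single precision with the `struct` module. This is a sketch assuming IEEE 754 binary32; `to_f32` is a hypothetical helper. The spacing between adjacent single-precision values just above 1.0 is 2^-23 (about 1.19e-7), so an addend smaller than half of that is lost when the sum is rounded, while the same addend is perfectly representable near 0.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


small = to_f32(0.0000000171785715)

print(to_f32(1.0 + small) == 1.0)    # True: small is below half an ulp of 1.0f
print(to_f32(0.0 + small) == small)  # True: near zero the value is kept
```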

What languages expose IEEE 754 traps to the developer?

I'd like to play with those traps for educational purposes. A common problem with the default behavior in numerical calculus is that we "miss" the NaN (or +-inf) that appeared in a wrong operation. Default behavior is propagation through the computation, but some operations (like comparisons) break the chain and lose the NaN, and the res...

floating-point-exceptions

How to nicely format floating types to String?

A 64-bit double can represent integers up to +/- 2^53 exactly. Given this fact I choose to use a double type as a single type for all my types, since my largest integer is unsigned 32-bit. But now I have to print these pseudo integers, but the problem is they are also mixed in with actual doubles. So how do I print these doubles nicely in Jav...
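The idea can be sketched in Python (the question itself is about Java, so this only illustrates the logic): print a value as an integer when it is integer-valued and within the exactly representable range, and fall back to the normal floating-point form otherwise. `format_number` is a hypothetical helper.

```python
def format_number(x):
    """Format integer-valued doubles without a fractional part (hypothetical helper)."""
    if x.is_integer() and abs(x) <= 2**53:
        return str(int(x))
    return repr(x)


print(format_number(4294967295.0))  # 4294967295
print(format_number(0.1))           # 0.1

# Beyond 2**53 consecutive integers are no longer distinguishable.
print(2.0**53 + 1 == 2.0**53)       # True
```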

Floating point accuracy in F# (and .NET)

In "F# for Scientists" Jon Harrop says: Roughly speaking, values of type int approximate real numbers between min-int and max-int with a constant absolute error of +- 1/2 whereas values of the type float have an approximately-constant relative error that is a tiny fraction of a percent. Now, what does it mean? Int ...

Strange results with floating-point comparison

I have this simple test: double h; ... // code that assigns h its initial value, used below ... if ((h>0) && (h<1)){ //branch 1 -some computations } else{ //branch 2- no computations } I listed my values as I got some really strange results and for example if: h=1 then the first branch is reached and I do not understand why since if...

How to handle multiplication of numbers close to 1

I have a bunch of floating point numbers (Java doubles), most of which are very close to 1, and I need to multiply them together as part of a larger calculation. I need to do this a lot. The problem is that while Java doubles have no problem with a number like: 0.0000000000000000000000000000000001 (1.0E-34) they can't represent some...
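A sketch of one common workaround, assuming the values can be stored as their deviation from 1 rather than as the numbers themselves (the excerpt is truncated, so this reading of the problem is an assumption): accumulate `log1p` of each deviation and convert back with `expm1`. `product_minus_one` is a hypothetical helper, shown in Python although the question concerns Java doubles.

```python
import math


def product_minus_one(deviations):
    """Return prod(1 + d) - 1 for small d without rounding each factor to 1."""
    return math.expm1(sum(math.log1p(d) for d in deviations))


devs = [1e-20, 2e-20]

# Naive product: each factor 1 + d rounds to exactly 1.0, so the information is lost.
naive = (1.0 + devs[0]) * (1.0 + devs[1]) - 1.0
print(naive)  # 0.0

print(product_minus_one(devs))
```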

Who does non-decimal bignums with floating radix point?

Nice as the Tcl libraries math::bignum and math::bigfloat are, the middle ground between the two needs to be addressed. Namely, bignums which are in different radices and have a radix point. At present math::bignum only handles integers (afaict) and math::bigfloat won't let you specify different radices to math::bigfloat::fromstr (ditt...

Why is the result different for this problem?

I came across the following arithmetic problem, but the result differs from ordinary arithmetic. Why is that? double d1 = 1.000001; double d2 = 0.000001; Console.WriteLine((d1-d2)==1.0); ...

floating-accuracy

Matlab Fraction to Floating Point

After using the 'solve' function on an equation with one variable, it seems like Matlab doesn't like using floating point. So, my answer is ans = -2515439103678008769411809280/29019457930552314063110978530889-1/232155663444418512504887828247112*13479465975722384794797850090594238631144539220477565900842902305^(1/2) and I'm not sure wh...

Why is printf showing -1.#IND for FPTAN results?

I am working on a program which produces assembler code from expressions. One of the functions required is tan(x) which currently works using the following sequence of code (the addresses are filled in at run time): fld [0x00C01288]; fld st(0); fsin; fld st(1); fcos; fdivp; fst [0x0030FA8C]; However, I would like to use the FPTAN opco...

double.Epsilon vs. std::numeric_limits<double>::min()

Why double.Epsilon != std::numeric_limits<double>::min()? On my PC: double.Epsilon == 4.9406564584124654E-324 and is defined in .NET std::numeric_limits<double>::min() == 2.2250738585072014e-308 Is there a way to get 2.2250738585072014e-308 from .NET? ...
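The two constants name different quantities: 4.94e-324 is the smallest positive subnormal double, while 2.225e-308 is the smallest positive normal double. A short Python check, assuming IEEE 754 binary64, shows how they relate (the variable names are illustrative only):

```python
import sys

smallest_normal = sys.float_info.min             # 2**-1022
smallest_subnormal = float.fromhex("0x1p-1074")  # 2**-1074

print(smallest_normal == 2.2250738585072014e-308)     # True
print(smallest_subnormal == 4.9406564584124654e-324)  # True

# The two differ by a factor of 2**-52, the machine epsilon.
print(smallest_normal * sys.float_info.epsilon == smallest_subnormal)  # True
```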

Why is floating point arithmetic in C# imprecise?

Why does the following program print what it prints? class Program { static void Main(string[] args) { float f1 = 0.09f*100f; float f2 = 0.09f*99.999999f; Console.WriteLine(f1 > f2); } } ...

Parse four bytes to floating-point in C

How do I take four received data bytes and assemble them into a floating-point number? Right now I have the bytes stored in an array, which would be, received_data[1] ... received_data[4]. I would like to store these four bytes as a single 32-bit single precision float. -Thanks I'm actually receiving a packet with 19 bytes in it and...
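The byte-to-float step can be illustrated in Python with `struct`. This is a sketch assuming the four bytes hold an IEEE 754 single in big-endian order; the payload below is made up, and the byte order of the real packet is not stated in the question. In C the same reinterpretation is usually done by copying the bytes into a `float` with `memcpy` after putting them in the host's byte order.

```python
import struct

received = bytes([0x3F, 0x80, 0x00, 0x00])  # hypothetical payload: 0x3F800000

# ">f" means big-endian 32-bit float; use "<f" if the sender is little-endian.
value = struct.unpack(">f", received)[0]
print(value)  # 1.0
```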

Comparing a float to an integer in C

Can I compare a floating-point number to an integer? Will the float compare to integers in code? float f; // f has a saved predetermined floating-point value to it if (f >=100){__asm__reset...etc} Also, could I... float f; int x = 100; x+=f; Sorry, I don't have a lot of experience using floating point. But ultimately I ha...

Representing integers in doubles

Can a double (of a given number of bytes, with a reasonable mantissa/exponent balance) always fully precisely hold the range of an unsigned integer of half that number of bytes? E.g. can an eight byte double fully precisely hold the range of numbers of a four byte unsigned int? What this will boil down to is if a two byte float can hol...
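For the IEEE 754 formats specifically, the claim can be checked directly. The sketch below assumes binary64 (53-bit significand) and binary16 (11-bit significand); other formats with a different mantissa/exponent split would need their own check. `roundtrip_half` is a hypothetical helper.

```python
import struct

# binary64: every 32-bit unsigned value is exactly representable.
print(float(2**32 - 1) == 2**32 - 1)  # True


def roundtrip_half(i):
    """Store an integer in a two-byte IEEE 754 half float and read it back."""
    return struct.unpack("<e", struct.pack("<e", float(i)))[0]


# binary16: every one-byte unsigned value survives the round trip.
print(all(roundtrip_half(i) == i for i in range(256)))  # True
```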
