floating-point

Bitwise operation on floating point numbers (for graphics)?

Possible Duplicate: how to perform bitwise operation on floating point numbers Hello, everyone! Background: I know that it is possible to apply bitwise operations to graphics (for example XOR). I also know that in graphics programs, graphic data is often stored in floating-point data types (to be able, for example, to "multiply"...
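
A minimal C sketch of the usual trick, assuming you only need to operate on the raw bits: copy the float into a same-sized integer with memcpy, apply the bitwise operator there, and copy the result back.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* XOR the raw bits of two floats and reinterpret the result as a float.
       Caveat: XOR-ing arbitrary float patterns can produce a NaN or denormal,
       which is one reason raster ops are normally done on integer buffers. */
    static float xor_float_bits(float a, float b)
    {
        uint32_t ua, ub, ur;
        memcpy(&ua, &a, sizeof ua);   /* well-defined way to inspect the bits */
        memcpy(&ub, &b, sizeof ub);
        ur = ua ^ ub;
        float r;
        memcpy(&r, &ur, sizeof r);
        return r;
    }

    int main(void)
    {
        /* prints "nan" for these inputs: the XOR sets every exponent bit,
           illustrating the caveat above */
        printf("%g\n", xor_float_bits(1.5f, 2.5f));
        return 0;
    }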

Dealing with Floating Point exceptions

Hi, I am not sure how to deal with floating-point exceptions in either C or C++. From the wiki, there are the following types of floating-point exceptions: IEEE 754 specifies five arithmetic errors that are to be recorded in "sticky bits" (by default; note that trapping and other alternatives are optional and, if provided, non-default). * ...
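
For the non-trapping side of this, a minimal C99 sketch using <fenv.h>: raise a flag and then test it with fetestexcept. Turning the flags into traps (SIGFPE) is platform-specific (e.g. feenableexcept on glibc) and not shown here.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON   /* tell the compiler we inspect the FP flags */

    int main(void)
    {
        volatile double zero = 0.0;

        feclearexcept(FE_ALL_EXCEPT);
        volatile double r = 1.0 / zero;          /* raises "divide by zero" */
        (void)r;

        if (fetestexcept(FE_DIVBYZERO)) puts("FE_DIVBYZERO is set");
        if (fetestexcept(FE_INVALID))   puts("FE_INVALID is set");
        if (fetestexcept(FE_OVERFLOW))  puts("FE_OVERFLOW is set");
        return 0;
    }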

Controlling FPU behavior in an OpenMP program?

I have a large C++ program that modifies the FPU control word (using _controlfp()). It unmasks some FPU exceptions and installs a SEHTranslator to produce typed C++ exceptions. I am using VC++ 9.0. I would like to use OpenMP (v.2.0) to parallelize some of our computational loops. I've already successfully applied it to one, but the n...
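
One relevant detail: the FPU control word is per-thread, and OpenMP worker threads do not inherit the main thread's settings. A hedged sketch (the function name and loop body are made up for illustration) that re-applies _controlfp_s at the top of the parallel region so every thread unmasks the same exceptions; compile with OpenMP enabled:

    #include <float.h>

    /* Sketch: each OpenMP thread re-applies the desired control word,
       because the x87/SSE control state is per-thread. Error handling and
       the SEH-to-C++ translator installation are omitted. */
    void parallel_loop(double *data, int n)
    {
        #pragma omp parallel
        {
            unsigned int cw;
            /* clear the mask bits for invalid / zero-divide / overflow,
               i.e. unmask (enable) those exceptions on this thread */
            _controlfp_s(&cw, 0, _EM_INVALID | _EM_ZERODIVIDE | _EM_OVERFLOW);

            #pragma omp for
            for (int i = 0; i < n; ++i)
                data[i] = data[i] * 2.0 + 1.0;   /* placeholder computation */
        }
    }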

With IEEE-754, 0 < ABS(const) < 1, is (x / const) * const guaranteed to return distinct results for distinct values of X?

Assume I do this operation: (X / const) * const with double-precision arguments as defined by IEEE 754-2008, division first, then multiplication. const is in the range 0 < ABS(const) < 1. Assuming that the operation succeeds (no overflows occur), are distinct arguments of X to this operation guaranteed to return distinct results? I...
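
A small C sketch that explores the question empirically rather than answering it: step through adjacent doubles with nextafter and report the first pair of distinct inputs that map to the same (x / c) * c result. The constant 0.1 and the starting point are arbitrary choices for illustration.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double c = 0.1;            /* any constant with 0 < |c| < 1 */
        double x = 1.0;
        double prev = (x / c) * c;

        /* scan a few million adjacent doubles; the operation is monotone,
           so two equal consecutive results mean two distinct inputs collided */
        for (long i = 0; i < 10000000; ++i) {
            x = nextafter(x, 2.0);
            double y = (x / c) * c;
            if (y == prev) {
                printf("collision at x = %.17g\n", x);
                break;
            }
            prev = y;
        }
        return 0;
    }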

Why and when should one call _fpreset( )?

The only documentation I can find (on MSDN or otherwise) is that a call to _fpreset() "resets the floating-point package." What is the "floating-point package"? Does this also clear the FPU status word? I see documentation that says to call _fpreset() when recovering from a SIGFPE, but doesn't _clearfp() do this as well? Do I need to...
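
The commonly cited pattern (a sketch of the MSDN-style example, MSVC-specific) is to call _fpreset() inside the SIGFPE handler before longjmp-ing back to known-good code:

    #include <float.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static jmp_buf mark;

    /* SIGFPE handler: _clearfp() would only clear the status flags;
       _fpreset() reinitializes the whole floating-point state before
       we jump back to a safe point. */
    static void fphandler(int sig)
    {
        (void)sig;
        _fpreset();
        longjmp(mark, 1);
    }

    int main(void)
    {
        unsigned int cw;
        signal(SIGFPE, fphandler);
        _controlfp_s(&cw, 0, _EM_ZERODIVIDE);   /* unmask divide-by-zero */

        if (setjmp(mark) == 0) {
            volatile double zero = 0.0;
            printf("%f\n", 1.0 / zero);         /* traps, enters the handler */
        } else {
            puts("recovered from SIGFPE");
        }
        return 0;
    }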

Correct way to emulate single precision floating point in python?

What's the best way to emulate single-precision floating point in Python? (Or other floating-point formats, for that matter?) Just use ctypes? ...

Fastest way to convert 16.16 fixed point to 32 bit float in c/c++ on x86?

Most people seem to want to go the other way. I'm wondering if there is a fast way to convert fixed point to floating point, ideally using SSE2. Either straight C or C++ or even asm would be fine. ...
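
Assuming SSE2 is available, a sketch of the straightforward route: _mm_cvtepi32_ps (cvtdq2ps) converts four raw 32-bit integers to floats, and a single multiply by 1/65536 rescales 16.16 values. The scalar equivalent is just (float)x * (1.0f / 65536.0f).

    #include <emmintrin.h>   /* SSE2 */
    #include <stdint.h>
    #include <stdio.h>

    /* Convert four 16.16 fixed-point values to float at once. */
    static void fixed_to_float_sse2(const int32_t *in, float *out)
    {
        __m128i raw   = _mm_loadu_si128((const __m128i *)in);
        __m128  asflt = _mm_cvtepi32_ps(raw);
        __m128  scale = _mm_set1_ps(1.0f / 65536.0f);
        _mm_storeu_ps(out, _mm_mul_ps(asflt, scale));
    }

    int main(void)
    {
        int32_t in[4] = { 1 << 16, 3 << 16, -(1 << 15), 0x00018000 }; /* 1, 3, -0.5, 1.5 */
        float out[4];
        fixed_to_float_sse2(in, out);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }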

Emulate floating point string conversion behaviour of Linux on Windows

I've encountered an annoying problem in outputting a floating point number. When I format 11.545 with a precision of 2 decimal places on Windows it outputs "11.55", as I would expect. However, when I do the same on Linux the output is "11.54"! I originally encountered the problem in Python, but further investigation showed that the diff...
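
A two-line C check that usually explains the discrepancy: 11.545 has no exact binary representation, so what gets rounded to two places is the nearest double, not the decimal literal. The Linux output ("11.54") is consistent with that stored value lying just below the halfway point; the two C runtimes simply round the same stored value differently.

    #include <stdio.h>

    int main(void)
    {
        printf("%.2f\n",  11.545);   /* the disputed rounding */
        printf("%.20f\n", 11.545);   /* the value that is actually stored */
        return 0;
    }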

Do any real-world CPUs not use IEEE 754?

I'm optimizing a sorting function for a numerics/statistics library based on the assumption that, after filtering out any NaNs and doing a little bit twiddling, floats can be compared as 32-bit ints without changing the result and doubles can be compared as 64-bit ints. This seems to speed up sorting these arrays by somewhere on the ord...
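
For reference, a sketch of the usual bit-twiddling transform the question alludes to (not necessarily the asker's exact code): map each float to an unsigned key whose ordering matches the float ordering, assuming NaNs have already been filtered out.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Positives get the sign bit set; negatives get all bits flipped,
       so "more negative" sorts lower and every negative sorts below
       every non-negative value. */
    static uint32_t float_to_sortable_key(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
    }

    int main(void)
    {
        float v[] = { -2.0f, -1.5f, -0.0f, 0.0f, 1.5f };
        for (size_t i = 0; i < sizeof v / sizeof v[0]; ++i)
            printf("% .1f -> %08x\n", v[i], (unsigned)float_to_sortable_key(v[i]));
        return 0;
    }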

Compression algorithm for IEEE-754 data

Anyone have a recommendation on a good compression algorithm that works well with double precision floating point values? We have found that the binary representation of floating point values results in very poor compression rates with common compression programs (e.g. Zip, RAR, 7-Zip etc). The data we need to compress is a one dimensio...
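
One common preconditioning idea, sketched here generically rather than as any particular library's algorithm: XOR each double's 64-bit pattern with its predecessor, so that slowly varying series produce long runs of zero bits that general-purpose compressors handle far better. The transform is exactly invertible.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* In-place XOR-delta over the raw 64-bit patterns of an array of doubles.
       Smooth data -> consecutive patterns share high bits -> many zero bytes. */
    void xor_delta_encode(double *a, size_t n)
    {
        uint64_t prev = 0, cur;
        for (size_t i = 0; i < n; ++i) {
            memcpy(&cur, &a[i], 8);
            uint64_t out = cur ^ prev;
            prev = cur;
            memcpy(&a[i], &out, 8);
        }
    }

    /* Inverse transform: rebuild the original doubles. */
    void xor_delta_decode(double *a, size_t n)
    {
        uint64_t prev = 0, cur;
        for (size_t i = 0; i < n; ++i) {
            memcpy(&cur, &a[i], 8);
            cur ^= prev;
            prev = cur;
            memcpy(&a[i], &cur, 8);
        }
    }

    int main(void)
    {
        double v[4] = { 1.0, 1.0000001, 1.0000002, 1.0000003 };
        xor_delta_encode(v, 4);
        xor_delta_decode(v, 4);
        printf("%.7f\n", v[3]);   /* round-trips exactly */
        return 0;
    }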

How to assign a number to a floating point register using sparc assembly?

For example, I want to assign 0x5 to %f1. How to achieve this? ...

Usefulness of signaling NaN?

I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating point exception in the cases where I don't want to proceed with "missing values." Converse...
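
A hedged C sketch of the behaviour being relied on (IEEE-754/SSE semantics assumed): build a signaling NaN from its bit pattern, use it in arithmetic, and check that FE_INVALID was raised, whereas a quiet NaN would propagate silently. Turning that flag into an actual trap or exception is platform-specific and not shown.

    #include <fenv.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        /* single-precision sNaN: exponent all ones, quiet bit clear, payload != 0 */
        uint32_t bits = 0x7F800001u;
        float tmp;
        memcpy(&tmp, &bits, sizeof tmp);
        volatile float snan = tmp;       /* volatile keeps the operation at run time */

        feclearexcept(FE_ALL_EXCEPT);
        volatile float r = snan + 1.0f;  /* arithmetic on an sNaN signals "invalid" */
        (void)r;

        printf("FE_INVALID raised: %s\n",
               fetestexcept(FE_INVALID) ? "yes" : "no");
        return 0;
    }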

Dealing with floating point errors in .NET

I'm working on a scientific computation & visualization project in C#/.NET, and we use doubles to represent all the physical quantities. Since floating-point numbers always include a bit of rounding, we have simple methods to do equality comparisons, such as: static double EPSILON = 1e-6; bool ApproxEquals(double d1, double d2) { ...
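
The question's code is C#, but the comparison logic is language-neutral; here is a C sketch of the usual refinement, combining an absolute tolerance for values near zero with a relative one so the test scales with the operands' magnitude instead of using a single fixed EPSILON.

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Approximate equality with both an absolute and a relative tolerance.
       abs_eps handles comparisons near zero; rel_eps scales with magnitude,
       so large values are not all "equal" and tiny ones are not all "different". */
    static bool approx_equals(double a, double b, double abs_eps, double rel_eps)
    {
        double diff = fabs(a - b);
        if (diff <= abs_eps)
            return true;
        double largest = fmax(fabs(a), fabs(b));
        return diff <= largest * rel_eps;
    }

    int main(void)
    {
        printf("%d\n", approx_equals(1000000.0, 1000000.001, 1e-9, 1e-6));  /* 1 */
        printf("%d\n", approx_equals(1e-12, 2e-12, 1e-9, 1e-6));            /* 1 */
        return 0;
    }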

How to deal with strange rounding of floats in PHP

As we all know, floating point arithmetic is not always completely accurate, but how do you deal with its inconsistencies? As an example, in PHP 5.2.9: (this doesn't happen in 5.3) echo round(14.99225, 4); // 14.9923 echo round(15.99225, 4); // 15.9923 echo round(16.99225, 4); // 16.9922 ?? echo round(17.99225, 4); // 17.9922 ?? ...

Will this C++ convert PDP-11 floats to IEEE?

I am maintaining a program that takes data from a PDP-11 (emulated!) program and puts it into a modern Windows-based system. We are having problems with some of the data values being reported as "1.#QNAN" and also "1.#QNB". The customer has recently revealed that 'bad' values in the PDP-11 program are represented by 2 16-bit words with a...
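
A heavily hedged sketch, assuming the values are PDP-11/VAX F_floating stored with the sign/exponent word first: after reassembling the two 16-bit words, the layout matches an IEEE single except that the exponent is excess-128 over a 0.1f significand, so subtracting 2 from the exponent field (equivalently, dividing the reinterpreted value by 4) completes the conversion. A zero exponent field with the sign bit set is the "reserved operand" pattern, which an IEEE consumer would typically map to a NaN rather than a number.

    #include <stdint.h>
    #include <string.h>

    /* Sketch: convert one PDP-11/VAX F_floating value, given as the two
       16-bit words in the order they appear in the PDP-11 data stream,
       to an IEEE-754 single. Values that would be IEEE denormals are
       flushed to zero for simplicity. */
    float pdp11_f_to_ieee(uint16_t w0, uint16_t w1)
    {
        /* w0: sign, 8-bit exponent (excess 128), high 7 fraction bits;
           w1: low 16 fraction bits. */
        uint32_t bits = ((uint32_t)w0 << 16) | w1;
        uint32_t sign = bits & 0x80000000u;
        uint32_t exp  = (bits >> 23) & 0xFFu;
        uint32_t frac = bits & 0x007FFFFFu;

        if (exp == 0)
            return 0.0f;   /* true zero, or a "reserved operand" if the sign bit is set */
        if (exp <= 2)
            return 0.0f;   /* below the IEEE normal range; flush to zero */

        uint32_t out = sign | ((exp - 2) << 23) | frac;
        float f;
        memcpy(&f, &out, sizeof f);
        return f;
    }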

Round with floor problem in Objective-C

I am calculating g with e and s, which are all doubles. After that I want to cut off all digits after the second and save the result in x, for example: g = 2.123 => x = 2.12 g = 5.34995 => x = 5.34 and so on. I use... g = 0.5*e + 0.5*s; x = floor(g*100)/100; ...and it works fine most of the time. But sometimes I get strange results...

What claims, if any, can be made about the accuracy/precision of floating-point calculations?

I'm working on an application that does a lot of floating-point calculations. We use VC++ on Intel x86 with double precision floating-point values. We make claims that our calculations are accurate to n decimal digits (right now 7, but trying to claim 15). We go to a lot of effort to validate our results against other sources when o...

Cast the return value of a function that returns a floating point type

Hi, I have difficulty understanding the article "Cast the return value of a function that returns a floating point type". (1) In "Conversion as if by assignment to the type of the function is required if the return expression has a different type than the function, but not if the return expression has a wider value only because of wid...

Floats, doubles and half floats

I was wondering how the bits are organized in floats (4 bytes), doubles (8 bytes) and half floats (2 bytes, used in OpenGL implementations). Further, how could I convert from one to another? ...
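
A sketch that records the three layouts and converts half to float (the direction that needs no rounding); the sign/exponent/fraction widths are those of IEEE 754-2008 binary16/32/64.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Bit layouts (sign / exponent / fraction):
         half   (binary16): 1 / 5  / 10, exponent bias 15
         float  (binary32): 1 / 8  / 23, exponent bias 127
         double (binary64): 1 / 11 / 52, exponent bias 1023
       float -> half would additionally need rounding and overflow handling. */
    float half_to_float(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h >> 15) << 31;
        uint32_t exp  = (h >> 10) & 0x1F;
        uint32_t frac = h & 0x3FF;
        uint32_t bits;

        if (exp == 0) {
            if (frac == 0) {
                bits = sign;                               /* +/- 0 */
            } else {                                       /* subnormal half */
                /* normalize: shift the fraction up until the hidden bit appears */
                int e = -1;
                do { frac <<= 1; ++e; } while (!(frac & 0x400));
                bits = sign | (uint32_t)(127 - 15 - e) << 23
                            | (frac & 0x3FF) << 13;
            }
        } else if (exp == 0x1F) {
            bits = sign | 0x7F800000u | (frac << 13);      /* inf or NaN */
        } else {
            bits = sign | (exp - 15 + 127) << 23 | frac << 13;  /* normal */
        }

        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void)
    {
        printf("%g %g\n", half_to_float(0x3C00), half_to_float(0xC400)); /* 1 -4 */
        return 0;
    }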

C printf using %d and %f

I was working on this program and I noticed that using %d for a float and %f for a double gives me completely unexpected output. Does anybody know why this happens? int main () { float a = 1F; double b = 1; printf("float =%d\ndouble= %f", a, b); } This is the output float = -1610612736 double = 1903598371927661359216126713647498937...
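
What goes wrong, sketched in C: in a variadic call the float argument is promoted to double before printf ever sees it, and printf then walks the argument area according to the format string, so %d reinterprets part of a double (undefined behavior) and %f reads from the wrong place for the second argument. Matching the conversion specifiers to the promoted types fixes it.

    #include <stdio.h>

    int main(void)
    {
        float  a = 1.0f;
        double b = 1.0;

        /* printf("float = %d\ndouble = %f\n", a, b);   -- the broken version:
           %d pulls an int-sized chunk out of the promoted double (UB). */

        /* Both float (after promotion) and double match %f: */
        printf("float = %f\ndouble = %f\n", a, b);

        /* To print the integer value, convert explicitly: */
        printf("float as int = %d\n", (int)a);
        return 0;
    }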