Possible Duplicate:
how to perform bitwise operation on floating point numbers
Hello, everyone!
Background:
I know that it is possible to apply bitwise operations (for example, XOR) to graphics. I also know that in graphics programs, image data is often stored in floating-point data types (to be able, for example, to "multiply"...
Hi,
I am not sure how to deal with floating point exceptions in either C or C++. According to Wikipedia, there are the following types of floating point exceptions:
IEEE 754 specifies five arithmetic errors that are to be recorded in "sticky bits" (by default; note that trapping and other alternatives are optional and, if provided, non-default).
* ...
I have a large C++ program that modifies the FPU control word (using _controlfp()). It unmasks some FPU exceptions and installs a SEHTranslator to produce typed C++ exceptions. I am using VC++ 9.0.
I would like to use OpenMP (v.2.0) to parallelize some of our computational loops. I've already successfully applied it to one, but the n...
Assume I do this operation:
(X / const) * const
with double-precision arguments as defined by IEEE 754-2008, division first, then multiplication.
const is in the range 0 < ABS(const) < 1.
Assuming that the operation succeeds (no overflows occur), are distinct arguments of X to this operation guaranteed to return distinct results?
I...
The only documentation I can find (on MSDN or otherwise) is that a call to _fpreset() "resets the floating-point package." What is the "floating point package?" Does this also clear the FPU status word? I see documentation that says to call _fpreset() when recovering from a SIGFPE, but doesn't _clearfp() do this as well? Do I need to...
What's the best way to emulate single-precision floating point in Python? (Or other floating point formats for that matter?) Just use ctypes?
...
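One possible approach to the single-precision question, sketched with only the standard library: round-trip the value through the `struct` module's 4-byte `f` format, which rounds a Python float (a binary64 double) to the nearest binary32 value. The helper names `to_f32` and `add_f32` are hypothetical, and the sketch assumes every intermediate result is re-rounded explicitly.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    # struct.pack raises OverflowError if x is too large for binary32.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add_f32(a: float, b: float) -> float:
    """Emulated single-precision addition: round inputs, add, round the result."""
    return to_f32(to_f32(a) + to_f32(b))

print(to_f32(0.1))          # 0.10000000149011612
print(to_f32(0.1) == 0.1)   # False: binary32 holds fewer significant bits
```

The same round-trip works for other formats that `struct` supports, such as the 2-byte `e` (half precision) format.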
Most people seem to want to go the other way. I'm wondering if there is a fast way to convert fixed point to floating point, ideally using SSE2. Either straight C or C++ or even asm would be fine.
...
I've encountered an annoying problem in outputting a floating point number. When I format 11.545 with a precision of 2 decimal places on Windows it outputs "11.55", as I would expect. However, when I do the same on Linux the output is "11.54"!
I originally encountered the problem in Python, but further investigation showed that the diff...
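A quick way to look into this, sketched in Python since that is where the problem first showed up: 11.545 has no exact binary representation, so the stored double sits slightly off the decimal tie, and the printed result may depend on whether a formatter rounds the exact stored value or a shortened decimal version of it. The snippet only inspects the stored value and assumes nothing about either platform's C library.

```python
from decimal import Decimal

x = 11.545
exact = Decimal(x)   # exact decimal expansion of the stored binary64 value
print(exact)
print(exact < Decimal("11.545"))   # True if the stored value lies below the tie
print(format(x, ".2f"))            # CPython 3 rounds the exact stored value
```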
I'm optimizing a sorting function for a numerics/statistics library based on the assumption that, after filtering out any NaNs and doing a little bit twiddling, floats can be compared as 32-bit ints without changing the result and doubles can be compared as 64-bit ints. This seems to speed up sorting these arrays by somewhere on the ord...
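The bit twiddling mentioned here is commonly done as follows; this is a minimal sketch of that common trick for doubles, not necessarily what the asker's library does. It maps each non-NaN value to an unsigned 64-bit key (the excerpt speaks of ints; an unsigned key is an assumption made here for simplicity). The name `double_sort_key` is hypothetical. Note that the key orders -0.0 before 0.0, whereas a floating-point comparison treats them as equal.

```python
import struct

SIGN = 1 << 63
MASK = (1 << 64) - 1

def double_sort_key(x: float) -> int:
    """Map a non-NaN double to an unsigned int with the same ordering."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # Negative values: flip every bit so larger magnitudes sort lower.
    # Non-negative values: set the sign bit so they sort above all negatives.
    return bits ^ MASK if bits & SIGN else bits | SIGN

values = [3.5, -1.0, 0.0, -2.25, 1e10, -1e-10, 2.0, float("-inf"), float("inf")]
print(sorted(values, key=double_sort_key) == sorted(values))   # True
```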
Anyone have a recommendation on a good compression algorithm that works well with double precision floating point values? We have found that the binary representation of floating point values results in very poor compression rates with common compression programs (e.g. Zip, RAR, 7-Zip etc).
The data we need to compress is a one dimensio...
For example, I want to assign 0x5 to %f1. How to achieve this?
...
I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating point exception in the cases where I don't want to proceed with "missing values." Converse...
I'm working on a scientific computation & visualization project in C#/.NET, and we use doubles to represent all the physical quantities. Since floating-point numbers always include a bit of rounding, we have simple methods to do equality comparisons, such as:
static double EPSILON = 1e-6;
bool ApproxEquals(double d1, double d2) {
...
As we all know, floating point arithmetic is not always completely accurate, but how do you deal with its inconsistencies?
As an example, in PHP 5.2.9: (this doesn't happen in 5.3)
echo round(14.99225, 4); // 14.9923
echo round(15.99225, 4); // 15.9923
echo round(16.99225, 4); // 16.9922 ??
echo round(17.99225, 4); // 17.9922 ??
...
I am maintaining a program that takes data from a PDP-11 (emulated!) program and puts it into a modern Windows-based system. We are having problems with some of the data values being reported as "1.#QNAN" and also "1.#QNB". The customer has recently revealed that 'bad' values in the PDP-11 program are represented by 2 16-bit words with a...
I am calculating g with e and s, which are all doubles. After that I want to cut off all digits after the second and save the result in x, for example:
g = 2.123 => x = 2.12
g = 5.34995 => x = 5.34
and so on. I use...
g = 0.5*e + 0.5*s;
x = floor(g*100)/100;
...and it works fine most of the time. But sometimes I get strange results...
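A small Python sketch of the kind of surprise this formula can produce. The value 0.29 is an illustrative input, not one from the question: it is stored slightly below 0.29, so multiplying by 100 lands just under 29 and `floor` drops a whole hundredth. The helper `truncate_2` is hypothetical and assumes that truncating the shortest decimal representation of `g` is the behavior wanted.

```python
import math
from decimal import Decimal, ROUND_DOWN

def truncate_2(g: float) -> Decimal:
    """Truncate to two decimals using the shortest decimal repr of g."""
    return Decimal(repr(g)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)

g = 0.29
print(g * 100)                     # 28.999999999999996
print(math.floor(g * 100) / 100)   # 0.28
print(truncate_2(g))               # 0.29
print(truncate_2(5.34995))         # 5.34
```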
I'm working on an application that does a lot of floating-point calculations. We use VC++ on Intel x86 with double precision floating-point values. We make claims that our calculations are accurate to n decimal digits (right now 7, but trying to claim 15).
We go to a lot of effort of validating our results against other sources when o...
Hi,
I have difficulty understanding the article Cast the return value of a function that returns a floating point type
(1) In
Conversion as if by assignment to the type of the function is required if the return expression has a different type than the function, but not if the return expression has a wider value only because of wid...
I was wondering how bits are organized in floats (4 bytes), doubles (8 bytes) and half floats (2 bytes, used in OpenGL implementations).
Further, how could I convert from one to another?
...
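For reference, the three IEEE 754 binary formats share one layout: a sign bit, then a biased exponent, then a fraction, with 5/10, 8/23 and 11/52 exponent/fraction bits for half, float and double respectively. A minimal Python sketch using the standard `struct` module to pull the fields apart and to convert between formats; the names `FORMATS`, `fields` and `convert` are hypothetical.

```python
import struct

# name -> (struct float code, same-size unsigned int code, exponent bits, fraction bits)
FORMATS = {
    "half":   ("e", "H", 5, 10),
    "float":  ("f", "I", 8, 23),
    "double": ("d", "Q", 11, 52),
}

def fields(x, name):
    """Return (sign, biased exponent, fraction) of x in the named format."""
    fcode, icode, ebits, fbits = FORMATS[name]
    (bits,) = struct.unpack("<" + icode, struct.pack("<" + fcode, x))
    sign = bits >> (ebits + fbits)
    exponent = (bits >> fbits) & ((1 << ebits) - 1)
    fraction = bits & ((1 << fbits) - 1)
    return sign, exponent, fraction

def convert(x, name):
    """Round x to the named format and widen it back to a Python float."""
    fcode = FORMATS[name][0]
    return struct.unpack("<" + fcode, struct.pack("<" + fcode, x))[0]

print(fields(1.0, "half"))     # (0, 15, 0)
print(fields(-2.0, "float"))   # (1, 128, 0)
print(fields(1.0, "double"))   # (0, 1023, 0)
print(convert(0.1, "half"))    # 0.0999755859375
```

Widening (half to float, float to double) is exact; narrowing rounds to the nearest representable value, as the last line shows.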
I was working on this program and I noticed that using %f for a double and %d for a float gives me something completely different. Does anybody know why this happens?
#include <stdio.h>
int main ()
{
    float a = 1.0f;
    double b = 1;
    /* the mismatch in question: %d receives a float (promoted to double) */
    printf("float =%d\ndouble= %f", a, b);
}
This is the output
float = -1610612736
double = 1903598371927661359216126713647498937...