ieee-754

How to test if numeric conversion will change value?

I'm performing some data type conversions where I need to represent uint, long, ulong and decimal as IEEE 754 double floating point values. I want to be able to detect if the IEEE 754 data type cannot contain the value before I perform the conversion. A brute force solution would be to wrap a try-catch around a cast to double looking fo...
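
One way to check an integer before converting it is to count its significant bits. The sketch below is written in Python rather than the asker's language, and `fits_in_double` is a hypothetical helper name. It assumes the input is an integer whose magnitude is below 2**1024 (true for any 64-bit value) and does not cover the decimal type.

```python
def fits_in_double(n: int) -> bool:
    """Return True if integer n converts to an IEEE 754 double without rounding.

    Assumption: abs(n) < 2**1024, so the exponent range is never exceeded.
    """
    magnitude = abs(n)
    if magnitude == 0:
        return True
    # Trailing zero bits are absorbed by the exponent, not the significand.
    trailing_zeros = (magnitude & -magnitude).bit_length() - 1
    significant_bits = (magnitude >> trailing_zeros).bit_length()
    # A double carries 53 significand bits (52 stored plus the implicit leading 1).
    return significant_bits <= 53
```

With this check, `2**53` passes while `2**53 + 1` and the largest unsigned 64-bit value `2**64 - 1` do not.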

Double vs float on the iPhone

I have just heard that the iPhone cannot do double natively, thereby making them much slower than regular float. Is this true? Evidence? I am very interested in the issue because my program needs high precision calculations, and I will have to compromise on speed. ...

32-bit to 16-bit Floating Point Conversion

I need a cross-platform library/algorithm that will convert between 32-bit and 16-bit floating point numbers. I don't need to perform math with the 16-bit numbers; I just need to decrease the size of the 32-bit floats so they can be sent over the network. I am working in C++. I understand how much precision I would be losing, but that...
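
As an illustration of the size reduction only (in Python, not the C++ the question asks for), the standard `struct` module can pack a value as IEEE 754 binary16 using the `e` format code. The helper names below are hypothetical, and big-endian byte order is an assumption chosen to match typical network order.

```python
import struct

def float_to_half_bytes(x: float) -> bytes:
    # ">e" packs a big-endian IEEE 754 binary16 value (2 bytes).
    # struct raises OverflowError if x is too large for half precision.
    return struct.pack(">e", x)

def half_bytes_to_float(data: bytes) -> float:
    # Unpack the 2-byte half-precision value back into a Python float.
    return struct.unpack(">e", data)[0]
```

Values such as 1.5 survive the round trip exactly, while 0.1 comes back only approximately, which is the precision loss the question already accepts.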

Find min/max of a float/double that has the same internal representation

Refreshing on floating points (also PDF), IEEE-754 and taking part in this discussion on floating point rounding when converting to strings, brought me to tinker: how can I get the maximum and minimum value for a given floating point number whose binary representations are equal. Disclaimer: for this discussion, I like to stick to 32 bi...

Can java floats be sorted by their byte representations?

I'm working in Hadoop, and I need to provide a comparator to sort objects as raw network order byte arrays. This is easy for me to do with integers -- I just compare each byte in order. I also need to do this for floats. I think, but I can't find a reference, that the IEEE 754 format for floats used by Java can be sorted by just comparin...

How to output IEEE-754 format integer as a float

I have an unsigned long integer value which represents a float using IEEE-754 format. What is the quickest way of printing it out as a float in C++? I know one way, but am wondering if there is a convenient utility in C++ that would be better. Example of the way that I know is: union { unsigned long ul; float f; } u; u.ul = 10...
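
The same bit reinterpretation can be sketched in Python with the standard `struct` module, for comparison with the union approach. This assumes the integer holds a 32-bit single-precision pattern; `bits_to_float` is a hypothetical helper, and the sample values are illustrative rather than taken from the question.

```python
import struct

def bits_to_float(bits: int) -> float:
    # Pack the integer as 4 raw bytes, then reinterpret those bytes as a float32.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

one = bits_to_float(0x3F800000)        # bit pattern of 1.0
minus_two = bits_to_float(0xC0000000)  # bit pattern of -2.0
```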

Converting double to float without relying on the FPU rounding mode

Does anyone have handy the snippets of code to convert an IEEE 754 double to the immediately inferior (resp. superior) float, without changing or assuming anything about the FPU's current rounding mode? Note: this constraint probably implies not using the FPU at all. I expect the simplest way to do it in these conditions is to read the...

IEEE 754 - find signbit, exponent, frac, normalized, etc.

I am taking in an 8-digit hexadecimal number as an IEEE 754 single-precision floating point number, and I want to print information about that number (signbit, expbits, fractbits, normalized, denormalized, infinity, zero, NaN). I read up on bit shifting, and I think this is how I am supposed to do it. However, I am no...
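
A minimal sketch of the bit-shifting approach, written in Python and assuming the standard binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits). The function name `decode_single` and the returned tuple shape are hypothetical.

```python
def decode_single(hex_str: str):
    """Split an 8-digit hex string into binary32 fields and classify it."""
    bits = int(hex_str, 16) & 0xFFFFFFFF
    sign = bits >> 31               # top bit
    exponent = (bits >> 23) & 0xFF  # next 8 bits, still biased by 127
    fraction = bits & 0x7FFFFF      # low 23 bits
    if exponent == 0xFF:
        kind = "infinity" if fraction == 0 else "NaN"
    elif exponent == 0:
        kind = "zero" if fraction == 0 else "denormalized"
    else:
        kind = "normalized"
    return sign, exponent, fraction, kind
```

For example, `decode_single("3F800000")` yields sign 0, biased exponent 127, fraction 0, and the class "normalized".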

flush-to-zero behavior in floating-point arithmetic

While, as far as I remember, IEEE 754 says nothing about a flush-to-zero mode to handle denormalized numbers faster, some architectures offer this mode (e.g. http://docs.sun.com/source/806-3568/ncg_lib.html ). In the particular case of this technical documentation, standard handling of denormalized numbers is the default, and flush-to-z...

How do I read 64-bit IEEE Standard 754 double-precision numbers from binary data?

I have a stream of data which consists of 64-bit IEEE standard 754 floating point numbers. How would I read these as doubles in using C#? Is there a way to convert a long/ulong into a double? ...

Questions about two's complement and IEEE 754 representations

How would I go about finding the value of the two-byte two’s complement number 0xFF72? Would I start by converting 0xFF72 to binary, then reverse the bits, add 1 in binary notation (I'm lost here), and write the result in decimal? I just don't know. Also, what about an 8-byte double that has the value 0x7FF8000000000000? What is its value as a floating point?...
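
Both values can be worked through with a short Python sketch. The helper names are hypothetical. The two's complement step subtracts 2**16 when the sign bit is set, which gives the same result as inverting the bits, adding 1, and negating.

```python
import math
import struct

def twos_complement(value: int, bits: int) -> int:
    # If the sign bit is set, the value represents value - 2**bits.
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def bits_to_double(bits: int) -> float:
    # Reinterpret a 64-bit pattern as an IEEE 754 double.
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

signed_value = twos_complement(0xFF72, 16)  # 65394 - 65536 = -142
is_nan = math.isnan(bits_to_double(0x7FF8000000000000))  # all-ones exponent, nonzero fraction
```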

Convert float to bigint (aka portable way to get binary exponent & mantissa)

In C++, I have a bigint class that can hold an integer of arbitrary size. I'd like to convert large float or double numbers to bigint. I have a working method, but it's a bit of a hack. I used IEEE 754 number specification to get the binary sign, mantissa and exponent of the input number. Here is the code (Sign is ignored here, that's...

What are the other NaN values?

The documentation for java.lang.Double.NaN says that it is A constant holding a Not-a-Number (NaN) value of type double. It is equivalent to the value returned by Double.longBitsToDouble(0x7ff8000000000000L). This seems to imply there are others. If so, how do I get hold of them, and can this be done portably? To be clear, I woul...

Is there an open-source C/C++ implementation of IEEE-754 operations?

Hi, I am looking for a reference implementation of IEEE-754 operations. Is there such a thing? ...

CLR JIT optimizations violate causality?

I was writing an instructive example for a colleague to show him why testing floats for equality is often a bad idea. The example I went with was adding .1 ten times, and comparing against 1.0 (the one I was shown in my introductory numerical class). I was surprised to find that the two results were equal (code + output). float @float =...
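
For comparison, the same experiment can be sketched in Python, whose `float` is an IEEE 754 double rather than the single-precision type used in the question. In that setting the accumulated sum does differ from 1.0, which is the behaviour the example was meant to demonstrate.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 has no exact binary representation

exactly_equal = (total == 1.0)                         # exact comparison
close_enough = math.isclose(total, 1.0, rel_tol=1e-9)  # tolerance-based comparison
```

Comparing with a tolerance, as `math.isclose` does, is one common alternative to testing for exact equality.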

With IEEE-754, 0 < ABS(const) < 1, is (x / const) * const guaranteed to return distinct results for distinct values of X?

Assume I do this operation: (X / const) * const with double-precision arguments as defined by IEEE 754-2008, division first, then multiplication. const is in the range 0 < ABS(const) < 1. Assuming that the operation succeeds (no overflows occur), are distinct arguments of X to this operation guaranteed to return distinct results? I...

Do any real-world CPUs not use IEEE 754?

I'm optimizing a sorting function for a numerics/statistics library based on the assumption that, after filtering out any NaNs and doing a little bit twiddling, floats can be compared as 32-bit ints without changing the result and doubles can be compared as 64-bit ints. This seems to speed up sorting these arrays by somewhere on the ord...
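
The kind of bit twiddling described might look like the Python sketch below. It is one commonly used mapping, not necessarily the asker's: negative values have all bits inverted and non-negative values have the sign bit set, so that unsigned integer order matches numeric order. `float32_sort_key` is a hypothetical name, NaNs are assumed to be filtered out beforehand, and -0.0 sorts just below +0.0 under this key.

```python
import struct

def float32_sort_key(x: float) -> int:
    # Get the raw 32-bit pattern of x as a single-precision float.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        # Negative: invert every bit so larger magnitudes sort first.
        return bits ^ 0xFFFFFFFF
    # Non-negative: set the sign bit so these sort above all negatives.
    return bits | 0x80000000

values = [3.5, -2.5, 0.0, 1.0, -1024.0, 0.125, float("inf"), float("-inf")]
sorted_by_bits = sorted(values, key=float32_sort_key)
```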

Usefulness of signaling NaN?

I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating point exception in the cases where I don't want to proceed with "missing values." Converse...

Will this C++ convert PDP-11 floats to IEEE?

I am maintaining a program that takes data from a PDP-11 (emulated!) program and puts it into a modern Windows-based system. We are having problems with some of the data values being reported as "1.#QNAN" and also "1.#QNB". The customer has recently revealed that 'bad' values in the PDP-11 program are represented by 2 16-bit words with a...

Exponent in IEEE 754

Why is the exponent in a float displaced by 127? Well, the real question is: what is the advantage of such notation in comparison to 2's complement notation? ...