questions about ieee-754 | ansaurus

ieee-754

What claims, if any, can be made about the accuracy/precision of floating-point calculations?

I'm working on an application that does a lot of floating-point calculations. We use VC++ on Intel x86 with double precision floating-point values. We make claims that our calculations are accurate to n decimal digits (right now 7, but trying to claim 15). We go to a lot of effort of validating our results against other sources when o...

Hex Representation of Floats in Haskell

I want to convert a Haskell Float to a String that contains the 32-bit hexadecimal representation of the float in standard IEEE format. I can't seem to find a package that will do this for me. Does anybody know of one? I've noticed that GHC.Float offers a function to decompose a Float into its signed base and exponent (decodeFloat), but...

What is the status of tcl_precision?

I don't use Tcl in my daily work. However, I have a colleague who occasionally interacts with a customer who wishes our tool's extension language worked more like Tcl (!). One topic he brought up was how Tcl let him set how much precision was stored in a double, via a global variable, tcl_precision. I did some web searches, and the docu...

floating-accuracy

How do I convert from a decimal number to IEEE 754 single-precision floating-point format?

How would I go about manually changing a decimal (base 10) number into IEEE 754 single-precision floating-point format? I understand that there is three parts to it, a sign, a exponential, and a mantissa. I just don't completely understand what the last two parts actually represent. Thanks, Rob ...

How do you convert from IEEE 754 single-precision floating-point format to decimal?

I understand that the first bit is the sign and that the next 8 bits is the exponent. So in this example you would have 1.1001*2^-4 ? How do I then interpret this in decimal? 0 01111011 10010000000000000000000 ...

Exact textual representation of an IEEE "double"

I need to represent an IEEE 754-1985 double (64-bit) floating point number in a human-readable textual form, with the condition that the textual form can be parsed back into exactly the same (bit-wise) number. Is this possible/practical to do without just printing the raw bytes? If yes, code to do this would be much appreciated. ...

floating-accuracy

Fast sign in C++ float...are there any platform dependencies in this code?

Searching online, I have found the following routine for calculating the sign of a float in IEEE format. This could easily be extended to a double, too. // returns 1.0f for positive floats, -1.0f for negative floats, 0.0f for zero inline float fast_sign(float f) { if (((int&)f & 0x7FFFFFFF)==0) return 0.f; // test exponent & mantis...

Fixed Point to Floating Point and Backwards

Is converting Fixed Pt. (fixed n bit for fraction) to IEEE double safe ? ie: does IEEE double format can represent all numbers a fixed point can represent ? The test: a number goes to floating pt format then back to it's original fixed pt format. ...

floating-accuracy

Compilation platform taking FPU rounding mode into account in printing, conversions

EDIT: I had made a mistake during the debugging session that lead me to ask this question. The differences I was seeing were in fact in printing a double and in parsing a double (strtod). Stephen's answer still covers my question very well even after this rectification, so I think I will leave the question alone in case it is useful to s...

Convert pre-IEEE-754 C++ floating-point numbers to/from C#

Before .Net, before math coprocessors, before IEEE-574, Microsoft defined a bit pattern for floating-point numbers. Old versions of the C++ compiler happily used that definition. I am writing a C# app that needs to read/write such floating-point numbers in a file. How can I do the conversions between the 2 bit formats? I need conversion...

Is there a floating point value of x, for which x-x == 0 is false?

In most cases, I understand that a floating point comparison test should be implemented using over a range of values (abs(x-y) < epsilon), but does self subtraction imply that the result will be zero? // can the assertion be triggered? float x = //?; assert( x-x == 0 ) My guess is that nan/inf might be special cases, but I'm more int...

floating-accuracy

Recording/Reading C doubles in the IEEE 754 interchange format

So I'm serializing a C data structure for cross-platform use, and I want to make sure I'm recording my floating point numbers in a cross-platform manner. I had been planning on just doing char * pos; /*...*/ *((double*) pos) = dataStructureInstance->fieldWithOfTypeDouble; pos += sizeof(double); But I wasn't sure that the bytes wo...

how IEEE-754 floating point numbers work

Let's say I have this: float i = 1.5 in binary, this float is represented as: 0 01111111 10000000000000000000000 I broke up the binary to represent the 'signed', 'exponent' and 'fraction' chunks. What I don't understand is how this represents 1.5. The exponent is 0 once you subtract the bias (127 - 127), and the fraction part with...

Are there any modern platforms with non-IEEE C/C++ float formats?

Hi all, I am writing a video game, Humm and Strumm, which requires a network component in its game engine. I can deal with differences in endianness easily, but I have hit a wall in attempting to deal with possible float memory formats. I know that modern computers have all a standard integer format, but I have heard that they may not...

network-programming

How to turn Bytes into number (IEEE754 to number) Actionscript

How to write such C# code in Actionscript? Console.WriteLine(BitConverter.ToDouble(new byte[8] { 0x77, 0xBE, 0x9F, 0x1A, 0x2F, 0x0D, 0x4F, 0x40 }, 0)); ...

Convert ieee 754 float to hex with c - printf

Ideally the following code would take a float in IEEE 754 representation and convert it into hexadecimal void convert() //gets the float input from user and turns it into hexadecimal { float f; printf("Enter float: "); scanf("%f", &f); printf("hex is %x", f); } I'm not too sure what's going wrong. It's converting the...

is memset(ary,0,length) a portable way of inputting zero in double array

Possible Duplicate: What is faster/prefered memset or for loop to zero out an array of doubles The following code uses memset to set all the bits to zero int length = 5; double *array = (double *) malloc(sizeof(double)*length); memset(array,0,sizeof(double)*length); for(int i=0;i<length;i++) if(array[i]!=0.0) fprintf(st...

Reading a Windows 'binary' float into a ASP jscript variable

I need to read files produced by a legacy Windows app that stores real numbers (the 8-byte "double" type) in binary - i.e. as a packed array of 8 bytes. I can read the 8 byte group OK but how can I present it to my ASP JScript code such I can get the real number back again. Or to put it another way: Say a file was produced by a Window...

in k_rem_pio2 file function is there any number which not hold carry!=0 for ih==2 case?

I'm working on application using the math computation. For my code I need to know how to cover all paths of code of libraries I'm using. The one of not covered yet is in k_rem_pio2 file. The kernel function have branch for ih==2 numbers and then ther is an if(carry!=0) -z= scalbn(one,q0); my question is for what number the i...

Convert NSData to primitive variable with ieee-754 or twos-complement ?

Hi every one. I am new programmer in Obj-C and cocoa. Im a trying to write a framework which will be used to read a binary files (Flexible Image Transport System or FITS binary files, usually used by astronomers). The binary data, that I am interested to extract, can have various formats and I get its properties by reading the header of...

twos-complement

1
2
3
4