I am writing a program that creates ICC color formats. These formats specify a data type called s15Fixed16Number which has a sign bit, 15 integer bits and 16 fractional bits. IEEE 754 32-bit floats have a sign bit, 8 exponent bits and 23 fractional bits.

I need to get input from a text box and convert it into an s15Fixed16Number. Some searching turned up this on Google Books, but that is talking about converting a decimal number to an s15Fixed16Number. I suppose I could just use the method explained in the link, but I haven't done any testing yet to determine how accurate that would be. I guess I could also try to convert the character input from the text box directly, but I haven't thought about that much yet.

I'm using Cocoa but I don't think that matters; any C function should work. Here are some example values in s15Fixed16Number format:

              -32768.0 = 0x80000000
                     0 = 0x00000000
                   1.0 = 0x00010000
 32767 + (65535/65536) = 0x7FFFFFFF

I guess it's been a while since that numerical computation class!

+2  A: 

Don't get carried away about the internal representation of the float. Fixed-point values are just integers with a constant scale factor. Just remember that a float carries less precision than your target format (a 24-bit significand against 31 value bits), so for large values the result may be off in roughly the lower 7 bits.

// s15Fixed16Number is presumably typedef'ed to unsigned int
// abs() needs <stdlib.h>
float foo = 1.0f;
int fooFixedSigned = (int)(foo * 65536);
s15Fixed16Number fooFixed = (s15Fixed16Number)(abs(fooFixedSigned));
// set the top bit for negative input; 1u keeps the shift out of signed-overflow territory
if (foo < 0) fooFixed = fooFixed | (1u << 31);
// you'll also need to explicitly check for overflows and underflows and handle them however is appropriate to your situation

Edit: corrected & to |

Alan
As Alan has shown, fixed point values can be converted to and from floating point values by multiplying or dividing by the unit value. This format throws a small twist by specifying a sign bit.
DominicMcDonnell
You should be using `long` rather than `int` - the former has at least 32 bits, whereas the latter is only guaranteed to have 16.
caf
Sorry, nope. Close, but nope. The representation explicitly says it is 2's complement. You don't get that by taking the absolute value. Further, your attempt to use a bit-wise AND operator to set the sign bit will clear every bit *except* the sign bit.
RBerteig
The original question specified an explicit sign bit and not two's complement. But you're right about the incorrect bitwise operator, that was a mistake.
Alan
The question sort of does, but the specification it references doesn't, and the sample values clearly aren't signed magnitude. The question misunderstood the spec. Further, if you want to convert to a signed magnitude form, then you might also need to guarantee that overflow in the multiply doesn't set the sign bit for an out of range positive number.
RBerteig
Thanks for the answer. This definitely helped me understand what's going on here. @Alan, you're correct in suggesting that I shouldn't get too carried away with the representation of a float. Thanks for the clarification.
jonc
+1  A: 

Assuming your C environment does 2's complement integers, then this is much simpler than it seems.

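typedef long s1516;  // assumed to be a 32-bit 2's complement signed integer (long is wider on LP64 systems; int32_t is safer)
s1516 floattos1516(double f) {
    // scale by 2^16; adding 0.5 before truncating rounds correctly only for non-negative f
    return (s1516)(f * 65536. + 0.5);
}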

The representation is a fixed point value, with 16 bits of fraction. That is the same as a rational number whose denominator is always 65536 (or 2^16). To form such a rational from a floating point value, you just multiply by the denominator. Then it is just a matter of an appropriate rounding, and a truncation to the integral type.

The standard picked the form they did because this just works if your system uses 2's complement integer arithmetic. Although it is true that the leftmost bit does represent the sign, it is not a sign bit in the sense that is used in a floating point representation.
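To illustrate that reading, here is a minimal sketch of decoding the 32-bit pattern as a 2's complement integer and dividing by 65536. The function name `s15fixed16_to_double` is made up for this example; it is not from the ICC spec or any library.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a raw 32-bit s15Fixed16Number pattern as a
   2's complement integer and scale it by 1/65536. */
double s15fixed16_to_double(uint32_t raw)
{
    double value = (double)raw;
    /* Patterns with the top bit set stand for raw - 2^32 in 2's complement. */
    if (raw & 0x80000000u)
        value -= 4294967296.0;
    return value / 65536.0;
}
```

Under this reading 0xFFFF0000 decodes to -1.0, whereas 0x80010000 decodes to -32767.0 rather than the -1.0 a sign-and-magnitude reading would give.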

If your calculations are truly float rather than double, you will find that you don't have as much precision in your calculation as is available in the fixed point value for numbers near full scale. If you calculate in double, then you will always have more precision in your calculation than in the result.
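A rough sketch of that precision difference, assuming IEEE 754 single and double precision. Both function names are invented for illustration, and neither one rounds or range-checks.

```c
#include <stdint.h>

/* Hypothetical: convert after first squeezing the value through a float. */
int32_t fixed_via_float(double x)
{
    float f = (float)x;              /* 24-bit significand: low bits are lost here */
    return (int32_t)(f * 65536.0f);
}

/* Hypothetical: convert keeping the value in double throughout. */
int32_t fixed_via_double(double x)
{
    return (int32_t)(x * 65536.0);   /* 53-bit significand covers all 31 value bits */
}
```

For an input of 20000 + 1/65536, the double path gives 0x4E200001 while the float path gives 0x4E200000, because a float near 20000 can only resolve steps of 1/512.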

Edit:

The apparently latest spec is available from the ICC as Specification ICC.1:2004-10 (Profile version 4.2.0.0). Section 5.1.3:

5.1.3 s15Fixed16Number

A fixed signed 4-byte/32-bit quantity which has 16 fractional bits as shown in table 3.

Table 3 — s15Fixed16Number
  Number               Encoding
-32768,0               80000000h
     0                 00000000h
     1,0               00010000h
 32767 + (65535/65536) 7FFFFFFFh

Aside from localized preference for the representation of a decimal point, these values are completely consistent with my understanding that the representation is simply signed 2's complement integers that should be divided by 65536 to get their values.

The natural conversion to the representation is simply to multiply by 65536, and from it simply to divide. Picking a suitable rounding rule is a matter of preference.

The full scale range is from -32768.0 (0x80000000) to approximately 32767.9999847412 (0x7fffffff), inclusive.
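Putting the scaling, one possible rounding rule, and the full scale range together, a conversion might be sketched as below. This is only one way to do it: the name `double_to_s15fixed16`, the choice of rounding half away from zero, and the decision to reject out-of-range input rather than clamp it are all assumptions of this example, not something the spec prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode x as an s15Fixed16Number, rounding half away
   from zero. Returns false and leaves *out untouched if x is NaN, infinite,
   or rounds to a value outside the representable range. */
bool double_to_s15fixed16(double x, int32_t *out)
{
    double scaled = x * 65536.0;
    double rounded = scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5;

    /* The cast below truncates toward zero, so anything strictly between
       -2^31 - 1 and 2^31 lands inside the int32_t range. NaN fails the test. */
    if (!(rounded > -2147483649.0 && rounded < 2147483648.0))
        return false;

    *out = (int32_t)rounded;
    return true;
}
```

With this sketch, -1.0 encodes to 0xFFFF0000 and the two ends of the range encode to 0x80000000 and 0x7FFFFFFF, while 32768.0 is reported as out of range.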

I would agree that it would be clearer if the specification had happened to show the representation in hex of any negative values. I skimmed the entire document, and the only values I found represented in both decimal and hex were CIE XYZ chromaticity coordinates, which by definition range from 0 to 1, and hence don't help as exemplar negative values.

RBerteig
Your code lacks the error checking to spot range problems when plugged into my test framework (as an extra column in the output). Also, the result is badly wrong when the input is negative (giving 0xFFFF0001 for -1.0). The rounding with +0.5 causes slight deviations from my answers, but yours may be better than mine because of it.
Jonathan Leffler
-1.0 would be exactly 0xFFFF0000 assuming the encoding is what I understood it to be. Rounding up might not be the best answer, however. I should flag this particular fragment as untested, but I use fragments just like it to convert from floating point to fixed point regularly.
RBerteig
@Jonathan, I think you are overthinking the spec. As I read it, it can only have the simple and natural meaning, and the quoted sample values are consistent.
RBerteig
Thanks RBerteig. I selected this answer because I realized you are probably correct in your interpretation that the spec is talking about a fixed-point 2's complement number. I will probably use a double to make sure I get the precision I need, and maybe add some error checking.
jonc