
Hi folks!

What's the best way to convert a "uint8_t" to an "sint8_t" in portable C?

Here's the code I came up with:

#include <stdint.h>

sint8_t DESER_SINT8(uint8_t x)
{
  return
     (sint8_t)((x >= (1u << 8u))
               ? -(UINT8_MAX - x)
               : x);
}

Is there a better/simpler way to do it? Maybe a way without using a conditional?

Edit: Thanks, guys. To sum up what I've learned so far:

  • sint8_t is really called int8_t
  • 128 is expressed by 1 << 7 and not by 1 << 8
  • two's complement is "negating off by one"

:)

So here is an updated version of my original code:

#include <stdint.h>

int8_t DESER_INT8(uint8_t x)
{
  return ((x >= (1 << 7))
          ? -(UINT8_MAX - x + 1)
          : x);
}
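A quick way to gain confidence in the updated version is to run it over all 256 inputs. The sketch below assumes a hosted C99 implementation that provides int8_t; it repeats DESER_INT8 (with braces) so it compiles on its own, and check_deser_int8 is a hypothetical helper name. It relies on the fact that converting an int8_t back to uint8_t is well defined (the value is reduced modulo 256), so a correct conversion must round-trip and must be negative exactly for inputs of 128 and above.

```c
#include <assert.h>
#include <stdint.h>

/* The updated DESER_INT8, repeated here so the sketch compiles on its own.
   UINT8_MAX has type int, so the arithmetic is done in int and
   -(UINT8_MAX - x + 1) equals x - 256. */
int8_t DESER_INT8(uint8_t x)
{
    return (x >= (1 << 7))
        ? -(UINT8_MAX - x + 1)
        : x;
}

/* Hypothetical helper: checks every possible input. */
void check_deser_int8(void)
{
    for (int i = 0; i <= UINT8_MAX; ++i) {
        uint8_t x = (uint8_t)i;
        int8_t s = DESER_INT8(x);
        /* int8_t -> uint8_t is well defined (modulo 256): must restore x. */
        assert((uint8_t)s == x);
        /* The result is negative exactly for inputs 128..255. */
        assert((s < 0) == (x >= 128));
    }
}
```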
A: 

Assuming the types sint8_t and uint8_t are assignment compatible, this works:

sint8_t DESER_SINT8(uint8_t x) { return x; }
pmg
Take a look at his code, pmg: he's doing something different. He doesn't want a cast.
Santiago Lezica
This runs afoul of rule `[conv.int]` in the standard, which says "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined."
Ben Voigt
+11  A: 

1u << 8u is 0x100u, which is larger than every uint8_t value, so the conditional is never satisfied. Your "conversion" routine is actually just:

return x;

which actually makes some sense.

You need to define more clearly what you want for conversion. C99 defines conversion from unsigned to signed integer types as follows (§6.3.1.3 "Signed and unsigned integers"):

When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Thus, uint8_t values between 0 and 127 are preserved, while the result for values larger than 127 is not defined by the standard: it is implementation-defined, or an implementation-defined signal is raised. Many (but not all) implementations will simply interpret the unsigned value as a two's-complement representation of a signed integer. Perhaps what you're really asking is how to guarantee this behavior across platforms?

If so, you can use:

return x < 128 ? x : x - 256;

Because x is promoted to int before the subtraction, the value x - 256 is an int, guaranteed to have the value of x interpreted as a two's-complement 8-bit integer. The implicit conversion to int8_t then preserves this value.
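To illustrate the promotion step, here is a small sketch, assuming int8_t exists; to_int8 and check_promotion are illustrative names, not taken from the answer. The subtraction x - 256 is carried out in int, so for x = 200 it yields -56 rather than wrapping around inside uint8_t, and -56 then converts to int8_t unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* The conversion from the answer (function name is illustrative). */
int8_t to_int8(uint8_t x)
{
    return x < 128 ? x : x - 256;
}

/* Illustrative check of the promotion step. */
void check_promotion(void)
{
    uint8_t x = 200;
    /* x is promoted to int, so x - 256 is a negative int... */
    assert(x - 256 < 0);
    assert(x - 256 == -56);
    /* ...and -56 is representable in int8_t, so the return keeps it. */
    assert(to_int8(x) == -56);
}
```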

This all assumes that sint8_t is meant to be int8_t, as sint8_t isn't a standard type. If it isn't, then all bets are off, because the correctness of the conversion I suggested depends on the guarantee that int8_t have a twos-complement representation (§7.18.1.1 "Exact-width integer types").

If sint8_t is instead some wacky platform-specific type, it might use some other representation like ones-complement, which has a different set of representable values, thus rendering the conversion described above implementation-defined (hence non-portable) for certain inputs.


EDIT

Alf has argued that this is "silly", and that this will never be necessary on any production system. I disagree, but it is admittedly a corner case of a corner case. His argument is not entirely without merit.

His claim that this is "inefficient" and should therefore be avoided, however, is baseless. A reasonable optimizing compiler will optimize this away on platforms where it is unnecessary. Using GCC on x86_64 for example:

#include <stdint.h>

int8_t alf(uint8_t x) {
    return x;
}

int8_t steve(uint8_t x) {
    return x < 128 ? x : x - 256;
}

int8_t david(uint8_t x) {
    return (x ^ 0x80) - 0x80;
}

Compiling with -Os -fomit-frame-pointer yields the following:

_alf:
0000000000000000    movsbl  %dil,%eax
0000000000000004    ret
_steve:
0000000000000005    movsbl  %dil,%eax
0000000000000009    ret
_david:
000000000000000a    movsbl  %dil,%eax
000000000000000e    ret

Note that all three implementations are identical after optimization. Clang/LLVM gives exactly the same result. Similarly, if we build for ARM instead of x86:

_alf:
00000000        b240    sxtb    r0, r0
00000002        4770    bx  lr
_steve:
00000004        b240    sxtb    r0, r0
00000006        4770    bx  lr
_david:
00000008        b240    sxtb    r0, r0
0000000a        4770    bx  lr

Protecting your implementation against corner cases when it has no cost for the "usual" case is never "silly".

To the argument that this adds needless complexity, I say: which is harder -- writing a comment to explain the conversion and why it is there, or your successor's intern trying to debug the problem 10 years from now when a new compiler breaks the lucky happenstance that you've been silently depending on all this time? Is the following really so hard to maintain?

// The C99 standard does not guarantee the behavior of conversion
// from uint8_t to int8_t when the value to be converted is larger
// than 127.  This function implements a conversion that is
// guaranteed to wrap as though the unsigned value were simply
// reinterpreted as a twos-complement value.  With most compilers
// on most systems, it will be optimized away entirely.
int8_t safeConvert(uint8_t x) {
    return x < 128 ? x : x - 256;
}

When all is said and done, I agree that this is vaguely over the top, but I also think we should try to answer the question at face value. A better solution, of course, would be for the C standard to pin down the behavior of conversions from unsigned to signed when the signed type is a twos-complement integer without padding (thus specifying the behavior for all of the intN_t types).

Stephen Canon
@Stephen: no, it is not *undefined*; it is implementation-defined, or an implementation-defined signal is raised. Also, I think you inverted the two cases, didn't you?
Jens Gustedt
@Jens: I mean to get at the point that the behavior is not defined by the standard, but you're right that I was a little sloppy. Which two cases do you have in mind?
Stephen Canon
`127` is a valid value and `-129` is not. So `x < 128? x : x - 256`.
Potatoswatter
@Potatoswatter, @Jens: Ah, yes. It's not that they're inverted, just that I typo'd the condition. Fixed.
Stephen Canon
Yeah, sorry, I should have at least compile-checked my code. I meant to write `int8_t` instead of `sint8_t`. And, as you wrote, `1 << 8` is just plain wrong. I meant `1 << 7` here. So in a sense, what I meant to write in the first place was `return x >= 128 ? -(255 - x) : x` ... Thanks for clarifying this. My second question still stands, though.
heckenpenner_rot
@heckenpenner: -(255-x) = x-255, not x-256. And see my answer for the bit-twiddling, non-branching trick.
Potatoswatter
@Stephen: See Ben's comment and my response on my answer. `x - 256` is still of unsigned type; subtracting 256 from a uint8_t is a simple no-op, and to cast before subtracting just goes back to square one…
Potatoswatter
@Stephen: sorry, no it was me mixing the cases up.
Jens Gustedt
"Integer promotion" takes place! So everything works out ...
heckenpenner_rot
Hm, I answered this question earlier, but since this particular slightly ungood answer is voted up so much: the conditional is needlessly verbose and needlessly inefficient. Just do `int8_t(x)`. The `int8_t` type is by definition two's complement; none of the bit-fiddling is necessary.
Alf P. Steinbach
@Potatoswatter: no, the type of `x-256` is `int`.
Steve Jessop
@Alf P. Steinbach: That's incorrect. `int8_t` is necessarily twos-complement, but the conversion from `uint8_t` to `int8_t` is implementation-defined for values larger than `127` (see the sections of the standard that I quoted). A conforming implementation could have `(int8_t)x = x > 127 ? 127 : x`, for example.
Stephen Canon
@Potatoswatter: The literal `256` has type `int`, so subtracting it from a `uint8_t` gives an `int` result (§6.3.1.8 "Usual arithmetic conversions") and is *not* a no-op.
Stephen Canon
@Stephen: not sure what your "that", which supposedly is incorrect, refers to. The rest of your comment is formally correct, but quite irrelevant. No such implementation exists, or would be used if it was created. Coding for zero-probability hypotheticals wastes not just your own time but also the time of those who are to maintain the code. So, not only is none of the bit-fiddling necessary: it has negative payoff.
Alf P. Steinbach
@Alf: Hypothesizing that no such implementation exists is irrelevant. If a platform comes around that lacks native 8-bit arithmetic, and on which clamping happens to be faster than sign-extension (not at all preposterous, especially on some vector architectures), there easily could be an implementation with these properties (I wouldn't be surprised if one already exists). More to the point: you don't get to change the question when you answer it. The question, as asked, is how to do it in a way that is *guaranteed* by the standard.
Stephen Canon
@Stephen: no, and no, and no, sorry. First, it's your hypothetical implementation that is hypothetical, not my statement that no such implementation exists. Secondly, if the types are offered then they must have the properties required of them. And third, the question was "in portable C", not "guaranteed by the standard". Focusing on all that's not formally guaranteed, no C code is guaranteed to have any effect whatsoever. So it's just silly to focus on such things.
Alf P. Steinbach
@Alf: in my hypothetical scenario, the types **do** have the properties required of them. `uint8_t` is an unsigned 8 bit type. `int8_t` is a twos-complement 8 bit type. However, conversions from `uint8_t` to `int8_t` happen to saturate instead of wrap. No part of this violates the standard, and no part of this is so unlikely that you can dismiss it as "silly".
Stephen Canon
More to the point, not only does the solution I suggest guarantee that the conversion is done properly (even on "silly" implementations), but it gets optimized away by reasonable compilers on platforms where it is unnecessary. All three compilers that I have tested were successful in optimizing it away.
Stephen Canon
A: 

Uhm... I think you were trying to return x if x could be represented in sint8, or abs(SINT8_MAX - x) if not, right?

In that case, here's one that works (yours had a tiny error I think):

#define HIGHBIT(X) ((X) & (1 << (sizeof(X) * 8 - 1)))

char utos8(unsigned char ux)
{
    return HIGHBIT(ux) ? -ux : ux;
}

Note that with this approach you can convert from any unsigned to signed type, wrapping the HIGHBIT macro in a function.

Hope that helps.

Santiago Lezica
… assuming that `CHAR_BIT == 8` on your target platform.
Stephen Canon
n/m stupid comment :)
Torlack
No: for general integer types there is no such function, simply because the number of unsigned and signed values may not be the same. This only works here because you can assume that it is two's complement *and* that there is no trap representation.
Jens Gustedt
What @Jens said. It does work for all the C99 fixed-width integer types, however.
Stephen Canon
@Stephen: if standard `uint8_t` exists on the implementation, then `CHAR_BIT` is 8. This is because `uint8_t` is defined to have 8 bit width and no padding bits. `CHAR_BIT` therefore must divide 8, and it can't be less than 7, hence is exactly 8.
Steve Jessop
@Steve: Right; I was referring to the "any unsigned to signed type" comment. Sorry for being unclear.
Stephen Canon
@Steve: Actually C requires `CHAR_BIT>=8`, not just `>=7`. So divisibility is not necessary; it's just a squeeze `8<=CHAR_BIT<=8` implying `CHAR_BIT==8`.
R..
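The argument in these comments can be captured as a compile-time guard. This is a sketch assuming a C99 <stdint.h>; the #error message and the check_char_bit helper are illustrative. Because <stdint.h> defines UINT8_MAX exactly when it provides uint8_t, the #error branch can never trigger on a conforming implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* uint8_t has exactly 8 bits and no padding, and CHAR_BIT >= 8,
   so if uint8_t exists CHAR_BIT must be exactly 8. */
#if defined(UINT8_MAX) && CHAR_BIT != 8
#error "uint8_t is provided but CHAR_BIT != 8; not a conforming C99 implementation"
#endif

/* Runtime restatement of the same facts (helper name is illustrative). */
void check_char_bit(void)
{
    assert(CHAR_BIT == 8);
    assert(sizeof(uint8_t) == 1 && sizeof(int8_t) == 1);
}
```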
+4  A: 

Conversion of uint8_t to int8_t essentially reverses the order of the two half-ranges. "High" numbers become "low." This can be accomplished with XOR.

x ^ 0x80

However, all the numbers are still positive. That's no good. We need to introduce the proper sign and restore the proper magnitude.

return ( x ^ 0x80 ) - 0x80;

There you go!
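Tracing the expression on the boundary values shows how the two half-ranges swap and then shift down. A sketch, assuming int8_t exists; xor_to_int8 and check_xor_examples are illustrative names. As discussed in the comments, x is promoted to int before the XOR, so (x ^ 0x80) lies in 0..255 and subtracting 0x80 lands in -128..127, which int8_t can represent.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free conversion from the answer. x is promoted to int, so
   (x ^ 0x80) is in 0..255 and the subtraction gives -128..127. */
int8_t xor_to_int8(uint8_t x)
{
    return (x ^ 0x80) - 0x80;
}

/* Boundary values:
 *   x      x ^ 0x80   result
 *   0x00   0x80          0
 *   0x7F   0xFF        127
 *   0x80   0x00       -128
 *   0xFF   0x7F         -1
 */
void check_xor_examples(void)
{
    assert(xor_to_int8(0x00) == 0);
    assert(xor_to_int8(0x7F) == 127);
    assert(xor_to_int8(0x80) == -128);
    assert(xor_to_int8(0xFF) == -1);
}
```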

Potatoswatter
Except that the cast introduces implementation-defined behavior. This code is no more portable than a simple cast.
Ben Voigt
@Ben: Ah, fixed. Actually this problem exists more subtly in other answers. For example, Stephen's answer does `return x - 256;` where `x` is unsigned, and OP uses `-(UINT8_MAX - x)` which is still of unsigned type.
Potatoswatter
"Integer promotion" takes place in my answer!
heckenpenner_rot
You had it almost right initially: `return (x ^ 0x80) - 0x80;` is correct without any explicit casts. The "usual arithmetic conversions" and the implicit cast on the return do the right thing.
Stephen Canon
@Stephen, ah, right: `UINT8_MAX` is not of type `uint8_t`, and `x ^ 0x80` promotes `x` to `int`! Of course, this all assumes that `int` and `signed char` are different types.
Potatoswatter
@Potatoswatter: I'm not sure how `signed char` comes into it; `int` is guaranteed not to be the same as `int8_t`, fwiw.
Stephen Canon
@Stephen: yeah, I just have that in mind from a discussion a few days ago… dealing with noncompliant or borderline C99 implementations.
Potatoswatter
(Note that the comparative efficiency of my condition vs your bit-twiddling is basically a non-issue; both are optimized away entirely on most platforms)
Stephen Canon
@Stephen: Probably; you proved yours on one popular platform, anyway. I suspect your version is easier to optimize; the promotion by XOR makes this harder to analyze.
Potatoswatter
@Potatoswatter: Ah, my nasty question. ;-) Well `INT_MAX` must be at least 32767, so even without the additional requirements such as lack of padding bits in exact-size integer types, `int` must be able to hold all values of `uint8_t`.
R..
@R..: Yeah, it's really moot here… I was thinking more along the lines of `int` getting promoted to `unsigned int`.
Potatoswatter
A: 

Assuming that your sint8_t is really int8_t from <stdint.h>, it is guaranteed to be in two's complement form, and it is guaranteed to have no padding bits.

Assume further that you want the opposite (implicit) conversion to work and yield the original value.

Then, given a value v of type uint8_t, all you have to do is ...

    int8_t( v )

That's it.

AFAIK, the C standard does not guarantee this conversion, only the opposite one. However, there is no known system or compiler where it won't work (given that you have these types available).

Forget all the manual bit-fiddling. Or, to test whether you're doing it right, convert back by simply assigning the value to a uint8_t and check whether you get the original value in all cases. In particular, the formula you used yields -((2^n-1)-x) = 1+x-2^n, while the correct conversion for value preservation is x-2^n.
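The round-trip test described above can be written as a small harness. This is a sketch, assuming int8_t and uint8_t exist; round_trips, candidate, off_by_one and check_round_trips are hypothetical names. Assigning an int8_t to a uint8_t is always well defined (the value is reduced modulo 256), so it makes a reliable oracle. The off_by_one function mirrors the -((2^n-1)-x) formula and should fail the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A conversion under test: the conditional form from another answer. */
int8_t candidate(uint8_t x)
{
    return x < 128 ? x : x - 256;
}

/* Mirrors -((2^n - 1) - x) = 1 + x - 2^n; every result fits in int8_t,
   but inputs from 128 up come out one too high. */
int8_t off_by_one(uint8_t x)
{
    return x >= 128 ? -(UINT8_MAX - x) : x;
}

/* Returns true if converting back to uint8_t restores every input. */
bool round_trips(int8_t (*conv)(uint8_t))
{
    for (unsigned i = 0; i <= UINT8_MAX; ++i) {
        uint8_t x = (uint8_t)i;
        uint8_t back = conv(x); /* int8_t -> uint8_t: well defined, modulo 256 */
        if (back != x)
            return false;
    }
    return true;
}

void check_round_trips(void)
{
    assert(round_trips(candidate));
    assert(!round_trips(off_by_one));
}
```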

Cheers & hth.,

– Alf

Alf P. Steinbach
The question is how to do this "in portable C". I think that means, "in a way guaranteed by the standard to work on all conforming implementations". This includes hypothetical, not-yet-written, ultra-pedantic implementations which deliberately check the value of any unsigned value converted to signed type, and raise a signal (which, perhaps, aborts the program) if it's out of range.
Steve Jessop
@Steve: there is no such implementation and will never be any such, and it's about as relevant to worry about as a C++ implementation that has 1 GiB `bool`. We don't write workarounds for that. Keep in mind that `int8_t` is two's complement by C99 definition. And keep in mind that the reason for the requirement of two's complement form is precisely to support this conversion. ;-)
Alf P. Steinbach
@Alf: The fact that "all sane implementations" use the behavior is beside the point (even if it were true). It's not guaranteed by the standard. The questioner is asking how to do it in a way that the behavior *is* guaranteed.
Stephen Canon
"the reason for the requirement of two's complement form is precisely to support this conversion" - that may well be true, but why doesn't the standard just define the conversion from `uint8_t` to `int8_t`, in 7.18? Save everyone a lot of bother. Come to think of it, since the representation is defined I suspect `memcpy(-)
Steve Jessop
Btw, if I hack the gcc code to produce an implementation which does insert checking code into unsigned->signed conversions, do I get any kind of prize? I doubt it's all that difficult (well, if you're familiar with gcc or maybe just with gcc back-ends, which I'm not), so your claim that no such implementation will ever exist sounds excessive. If you claim that no such implementation will ever be used for production code, then you're on to something!
Steve Jessop
@Steve: I'm pretty sure the `memcpy` solution is valid, as is my type-punning-based solution I gave as an answer.
R..
-1 to Alf for writing code that's not valid C (I assume it's some ugly C++ cast notation..?) in the middle of talking about what the C standard requires.
R..
@R: The question was tagged C++. Sorry, I didn't notice the request for C. :-) The relevant guarantees etc. are, however, in C99 (referred to by C++0x).
Alf P. Steinbach
A: 

If you want to avoid the branch you can always do something insane like this:

int selector = 127 - x; // 0 or positive if x <= 127, negative otherwise
selector >>= 8; // arithmetic shift to get -1 or 0 (implementation-defined for negative values)
int wrapped_value = x - 256;

return (x & ~selector) | (wrapped_value & selector); // if selector is 0, use x; otherwise, use the wrapped value
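For reference, here is the same selection wrapped as a complete function, with an exhaustive comparison against the conditional form. It is a sketch with hypothetical names (select_to_int8, check_select_to_int8), and it assumes int is two's complement and that >> on a negative int is an arithmetic shift; C99 leaves both to the implementation, which matters given that the question asks for portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free selection from the answer. Assumes a two's complement int
   and an arithmetic right shift of negative values; both are
   implementation-defined in C99. */
int8_t select_to_int8(uint8_t x)
{
    int selector = 127 - x;      /* >= 0 if x <= 127, negative otherwise */
    selector >>= 8;              /* 0, or -1 (all bits set) */
    int wrapped_value = x - 256; /* the value for the x >= 128 case */
    /* selector == 0 keeps x; selector == -1 keeps wrapped_value. */
    return (x & ~selector) | (wrapped_value & selector);
}

/* Compares against the conditional form for every input. */
void check_select_to_int8(void)
{
    for (int i = 0; i <= UINT8_MAX; ++i) {
        uint8_t x = (uint8_t)i;
        int expected = x < 128 ? x : x - 256;
        assert(select_to_int8(x) == expected);
    }
}
```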
MSN
Potatoswatter already has an answer with no branch, which is nowhere near this complicated.
Ben Voigt
A: 

I don't know if this has any practical value, but here's a different approach that came to mind:

uint8_t input;
int8_t output;
*(uint8_t *)&output = input;

Note that:

  • int8_t is required to be twos complement.
  • Corresponding signed and unsigned types are required to have the same representation for the overlapping part of their ranges, so that a value that's in the range of both the signed and unsigned type can be accessed through either type of pointer.
  • That leaves only one bit, which must be the twos complement sign bit.

The only way I can see that this reasoning might fail to be valid is if CHAR_BIT>8 and the 8-bit integer types are extended integer types with trap bits that somehow flag whether the value is signed or unsigned. However, the following analogous code using char types explicitly could never fail:

unsigned char input;
signed char output;
*(unsigned char *)&output = input;

because char types cannot have padding/trap bits.

A potential variant would be:

return ((union { uint8_t u; int8_t s; }){ input }).s;

or for char types:

return ((union { unsigned char u; signed char s; }){ input }).s;

Edit: As Steve Jessop pointed out in another answer, int8_t and uint8_t are required not to have padding bits if they exist, so their existence implies CHAR_BIT==8. So I'm confident that this approach is valid. With that said, I would still never use uint8_t and always explicitly use unsigned char, in case the implementation implements uint8_t as an equal-size extended integer type, because char types have special privileges with respect to aliasing rules and type punning which make them more desirable.
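A memcpy variant, in the spirit of the one raised in the comments, copies the object representation the same way; it is shown here next to the union form from this answer, wrapped as a function. This is a sketch with hypothetical function names, relying on the same assumptions as the answer: int8_t and uint8_t exist, are one byte wide, and int8_t is two's complement with no padding bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the object representation of a uint8_t into an int8_t. */
int8_t memcpy_to_int8(uint8_t input)
{
    int8_t output;
    memcpy(&output, &input, sizeof output);
    return output;
}

/* The union form from the answer (C99 compound literal). */
int8_t union_to_int8(uint8_t input)
{
    return ((union { uint8_t u; int8_t s; }){ input }).s;
}

/* Both should agree with the two's complement reading of every input. */
void check_representation_copies(void)
{
    for (int i = 0; i <= UINT8_MAX; ++i) {
        uint8_t x = (uint8_t)i;
        int expected = x < 128 ? x : x - 256;
        assert(memcpy_to_int8(x) == expected);
        assert(union_to_int8(x) == expected);
    }
}
```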

R..
Out of interest, since the question is also tagged C++: type-punning is explicitly permitted in C99, but what about C++?
Steve Jessop
I never write C++ code and know very little about the technical details of C++. +1 to your comment for interesting question relevant to OP's question.
R..