How can one portably perform pointer arithmetic with single byte precision?

Keep in mind that:

  • char is not 1 byte on all platforms
  • sizeof(void) == 1 is only available as an extension in GCC
  • While some platforms may impose alignment restrictions on pointer dereferencing, arithmetic may still require a finer granularity than the size of the smallest fundamental POD type
A: 

The C99 standard defines the type uint8_t, which is one byte long. If the compiler doesn't support this type, you could define it using a typedef. Of course you would need a different definition, depending on the platform and/or compiler. Bundle everything in a header file and use it everywhere.

kgiannakakis
"If the compiler doesn't support this type, you could define it using a typedef". Actually you can't. If the compiler has a type that provides the behaviour of uint8_t, then it must define uint8_t in stdint.h. So if it doesn't define it, it follows that there's nothing you could typedef it to yourself that would have the correct semantics. You might be able to get close, though, for example if the implementation had an 8 bit type with padding bits. Assuming a C99 compiler, that is.
Steve Jessop
What about non C99 compilers? Usually 'typedef unsigned char uint8_t;' will give you a byte wide type. Is there something more to uint8_t semantics than being an 8-bit data type?
kgiannakakis
C89 compilers won't necessarily have stdint.h at all, so you can't assume that if they can implement uint8_t, then they will. Actually, I think I was wrong, there aren't any additional requirements for uint8_t, so you won't ever be "close but not quite there". It's int8_t that has the extra requirements: must be 2's complement and have no padding bits. So bad example there on my part, I'll try again: if for instance your compiler has a 16 bit char, then there may not be any 8 bit types at all, and hence nothing you can use as uint8_t. Code that relies on it is not completely portable.
Steve Jessop
A: 

You could reinterpret_cast the pointer to an unsigned integer type, with the only assumption that the target machine uses byte addressing.

Shmoopty
+3  A: 

sizeof(char) is guaranteed to be 1 by the C standard. Even if char uses 9 bits or more.

So you can do:

type *pt;
unsigned char *pc = (unsigned char *)pt;

And use pc for arithmetic. Assigning pc to pt by using the cast above is undefined behavior by the C standard though.

If char is more than 8-bits wide, you can't do byte-precision pointer arithmetic in portable (ANSI/ISO) C. Here, by byte, I mean 8 bits. This is because the fundamental type itself is bigger than 8 bits.
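
As a minimal sketch of the approach above, the following helpers do the arithmetic through an unsigned char pointer. The names byte_offset and byte_distance are hypothetical and only serve the illustration; offsets are counted in C bytes (units of sizeof(char)), which need not be 8 bits wide.

```c
#include <stddef.h>

/* Hypothetical helper: advance any object pointer by n C bytes.
   The result must still point into (or one past) the same object. */
static void *byte_offset(void *base, size_t n)
{
    unsigned char *pc = (unsigned char *)base; /* conversion to a character pointer */
    return pc + n;                             /* arithmetic in units of sizeof(char) == 1 */
}

/* Hypothetical helper: distance in C bytes between two pointers
   into the same object. */
static ptrdiff_t byte_distance(const void *from, const void *to)
{
    return (const unsigned char *)to - (const unsigned char *)from;
}
```

For an array int arr[4], byte_offset(arr, sizeof(int)) gives the address of arr[1], and byte_distance(&arr[0], &arr[3]) gives 3 * sizeof(int).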

Alok
It's actually perfectly permissible under the standard to examine any object as if it were an array of `char`, `unsigned char` or `signed char`. There's several guarantees made in order to ensure this is allowed - like the fact that `char` may not have padding bits.
caf
@caf: Only the `unsigned char` type is guaranteed to have no padding bits. The `signed char` type can have padding bits (and trap representations). The language specification allows reinterpreting objects as arrays of `signed char`, but it is your responsibility to ensure somehow that you won't hit a trap representation for `signed char`. If you really want to be sure of safe reinterpretation, always use an array of `unsigned char`.
AndreyT
You're right of course. I should have been more careful before claiming undefined behavior. Thanks for correcting me.
Alok
@AndreyT: I think caf was talking about *pointers*: My copy of the standard says (section 3.2.2.2): *A pointer to a non-qualified type may be converted to a pointer to the qualified version of the type; the values stored in the original and converted pointers shall compare equal.*
Alok
@Alok: No, it is perfectly clear that caf is talking about reinterpreting any object as an array of [signed/unsigned] char objects.
AndreyT
@AndreyT: I think I need some sleep. If you're actually looking at the values (which is the point I think), then you should use unsigned char. Thanks for correcting me.
Alok
Yes, you're right that `signed char` and `char` can possibly have padding bits - I was wrong about that. I'm *not* so sure about the trap representations, though - the relevant text (in 6.2.6.1 p5) says that a trap representation accessed through an lvalue "that does not have character type" causes undefined behaviour, implying that *if* `char` and `signed char` can have trap representations, then accessing them is not undefined behaviour, which seems a little odd.
caf
Good grief, signed char can have padding bits? So for instance `unsigned char` might be a 9 bit unsigned integer 0 - 511, and `char` an 8 bit signed integer -128 - 127. C++ forbids this: yet another arbitrary difference between the two, but I can see why...
Steve Jessop
I could be wrong about `signed char` having trap representations. I see that it can have padding bits, since the standard is quite specific about only `unsigned char` not having padding bits. As for trap representations - I'm not sure. Note, BTW, that even while C++ says that `signed char` has no padding bits, at the same time it doesn't guarantee that all combinations of bits "represent numbers". Isn't this supposed to mean that `signed char` can have trap representations even in C++?
AndreyT
+1  A: 

Cast the pointer to a uintptr_t. This is an unsigned integer type large enough to hold any pointer to void. Now do your arithmetic on it, then cast the result back to a pointer of the type you want to dereference.

(Note that intptr_t is signed, which is usually NOT what you want! It's safer to stick to uintptr_t unless you have a good reason not to!)
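
A rough sketch of that idea follows. The helper name advance_via_uintptr is made up for this example, and the code assumes a flat, byte-addressed memory model: the standard only guarantees that a void pointer survives a round trip through uintptr_t, not that adding to the integer corresponds to moving the address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: move a pointer forward by n bytes via uintptr_t.
   Assumption: integer addition maps onto address arithmetic, which holds
   on common flat-memory platforms but is implementation-defined. */
static void *advance_via_uintptr(void *p, size_t n)
{
    uintptr_t addr = (uintptr_t)p; /* pointer -> unsigned integer */
    addr += n;                     /* plain integer arithmetic */
    return (void *)addr;           /* integer -> pointer */
}
```

On such a platform, advance_via_uintptr(buf, 5) should compare equal to buf + 5 for a char array buf. Where uintptr_t is not provided (it is an optional type), the unsigned char pointer approach is the fallback.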

Vincent Gable
i've always found `intptr_t` interesting, where does it live in the standard?
Matt Joiner
7.18.1.4 of this draft C99 standard: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf. It's an optional type in stdint.h.
Steve Jessop
+5  A: 

sizeof(char) always returns 1, in both C and C++. A char is always one byte long.

Tim Robinson
great links, thanks
Matt Joiner
+2  A: 

According to the standard char is the smallest addressable chunk of data. You just can't address with greater precision - you would need to do packing/unpacking manually.

sharptooth
+15  A: 

Your assumption is flawed - sizeof(char) is defined to be 1 everywhere.

From the C99 standard (TC3), in section 6.5.3.4 ("The sizeof operator"):

(paragraph 2)

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type.

(paragraph 3)

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

When these are taken together, it becomes clear that in C, whatever size a char is, that size is a "byte" (even if that's more than 8 bits, on some given platform).

A char is therefore the smallest addressable type. If you need to address in units smaller than a char, your only choice is to read a char at a time and use bitwise operators to mask out the parts of the char that you want.
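
To make the byte versus bit distinction concrete, here is a small sketch using CHAR_BIT from limits.h, which gives the number of bits in a C byte. The helper name size_in_bits is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size in C bytes (as yielded by sizeof)
   into a size in bits. CHAR_BIT is at least 8 but may be larger. */
static size_t size_in_bits(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

Whatever the platform, sizeof(char) is 1, so size_in_bits(sizeof(char)) equals CHAR_BIT; only the value of CHAR_BIT varies between implementations.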

caf
The OP didn't say that 'sizeof(char)' can be different from 1. I believe the OP specifically avoided that wording to make people understand that "not 1 byte on all platforms" means "not 1 *machine* byte". This is perfectly possible, even though at the language level `sizeof(char)` would still always be 1.
AndreyT
@caf i'll accept this as the answer if you can provide links to those sections of the C99 standard, thanks
Matt Joiner
AndreyT: Look at the question title.
caf
@Anacrolix, he provided a "link": Section 6.5.3.4. So you're trying to write portable programs in C without having the current ANSI C Standard available, not even a free draft?
Secure
@caf: OK, thanks. I missed the wording in the title.
AndreyT
no the link is for others, and for reference, it will make it a better answer.
Matt Joiner
Anacrolix, I'd be inclined to accept caf's answer with its excerpts from the standard - the C99 standard trumps all other documents on this sort of question.
Tim Robinson
Oh, you meant a *hyperlink* - well, why didn't you say? ;) I've added a hyperlink to a PDF of the TC3 draft.
caf
Actually, @AndreyT, a `char` is specified to be 1 byte on all platforms. The size of this byte, in bits, is implementation defined. However, it will always be at least 8 bits.
Joe D
@Secure: Who said anything about portability?
Matt Joiner
@Matt Joiner: Erm... You in your own original question? You've even tagged it with "portability".
Secure
A: 

I don't understand what you are trying to say with sizeof(void) being 1 in GCC. While type char might theoretically consist of more than 1 underlying machine byte, in C language sizeof(char) is 1 and always exactly 1. In other words, from the point of view of C language, char is always 1 "byte" (C-byte, not machine byte). Once you understand that, you'd also understand that sizeof(void) being 1 in GCC does not help you in any way. In GCC the pointer arithmetic on void * pointers works in exactly the same way as pointer arithmetic on char * pointers, which means that if on some platform char * doesn't work for you, then void * won't work for you either.

If on some platform char objects consist of multiple machine bytes, the only way to access smaller units of memory than a full char object would be to use bitwise operations to "extract" and "modify" the required portions of a complete char object. C language offers no way to directly address anything smaller than char. Once again char is always a C-byte.
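
The "extract" and "modify" steps could look roughly like the sketch below. The helper names get_octet and set_octet are invented for illustration, and the layout (octet 0 is the least significant 8 bits) is an assumption; how octets are actually packed into a wide char depends on the platform.

```c
#include <limits.h>

/* Hypothetical helper: read the 8-bit field number `index` of a char.
   Precondition: (index + 1) * 8 <= CHAR_BIT, so with CHAR_BIT == 8
   only index 0 is valid. */
static unsigned get_octet(unsigned char c, unsigned index)
{
    return ((unsigned)c >> (index * 8u)) & 0xFFu;
}

/* Hypothetical helper: return c with the 8-bit field number `index`
   replaced by the low 8 bits of value. Same precondition as above. */
static unsigned char set_octet(unsigned char c, unsigned index, unsigned value)
{
    unsigned shift = index * 8u;
    unsigned mask = 0xFFu << shift;            /* bits occupied by the field */
    return (unsigned char)(((unsigned)c & ~mask) | ((value & 0xFFu) << shift));
}
```

On an ordinary platform with 8-bit chars these reduce to reading and replacing the whole char; they only become interesting where CHAR_BIT is 16 or more.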

AndreyT