What are the underlying transformations that are necessary to convert data in a little-endian system into network byte order? For 2-byte and 4-byte data there are well-known functions (such as htons, ntohl, etc.) to encapsulate the changes, but what happens to strings of 1-byte data (if anything)?

Also, Wikipedia implies that little-endian is the mirror image of big-endian, but if that were true why would we need specific handling for 2 and 4 byte data?

The essay "On Holy Wars and a Plea for Peace" seems to imply that there are many different flavors of little-endian -- it's an old essay -- does that still apply? Are byte order markers like the ones found at the beginning of Java class files still necessary?

And finally, is 4-byte alignment necessary for network-byte order?

Paul.

+5  A: 

Let's say you have the ASCII text "BigE" in an array b of bytes.

b[0] == 'B'
b[1] == 'i'
b[2] == 'g'
b[3] == 'E'

This is network order for the string as well.

If it was treated as a 32 bit integer, it would be

'B' + ('i' << 8) + ('g' << 16) + ('E' << 24)

on a little endian platform and

'E' + ('g' << 8) + ('i' << 16) + ('B' << 24)

on a big endian platform.

If you convert each 16-bit word separately, you'd get neither of these, but rather

'i' + ('B' << 8) + ('E' << 16) + ('g' << 24)

which is why ntohl and ntohs are both required.

In other words, ntohs swaps bytes within a 16-bit short, and ntohl reverses the order of the four bytes of its 32-bit word.
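The three values above can be checked with a small sketch. The helper names (read_le32, read_be32, swap32, swap16_halves) are made up for illustration and are not part of any standard library; they use only shifts and masks, so they behave the same on any host.

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian value: b[0] is least significant. */
static uint32_t read_le32(const unsigned char b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Interpret four bytes as a big-endian (network order) value: b[0] is most significant. */
static uint32_t read_be32(const unsigned char b[4]) {
    return (uint32_t)b[3] | ((uint32_t)b[2] << 8) |
           ((uint32_t)b[1] << 16) | ((uint32_t)b[0] << 24);
}

/* Reverse all four bytes, which is what ntohl amounts to on a little-endian host. */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Swap the two bytes inside each 16-bit half independently (ntohs applied twice). */
static uint32_t swap16_halves(uint32_t v) {
    return ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8);
}
```

With b holding "BigE", swap32(read_le32(b)) equals read_be32(b), while swap16_halves(read_le32(b)) gives the third, mixed-up value.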

Doug Currie
A: 

Specific handling functions for 2 and 4 byte data take advantage of the fact that there are processor instructions that operate on specific data sizes. Running a 1-byte reversing function four times is certainly less efficient than using wider instructions to perform the same (albeit increased in scale) operations on all four bytes at once.

Sparr
A: 

1-byte data doesn't require any conversion between byte orders (this is an advantage of UTF-8 over UTF-16 and UTF-32 for string encoding).

workmad3
A: 

is 4-byte alignment necessary for network-byte order?

No specific alignment is necessary for bytes going over a network. Your processor may demand a certain alignment in memory, but it's up to you to resolve the discrepancy. The x86 family usually doesn't make such demands.
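One common way to resolve the discrepancy is to copy the bytes into an aligned local variable before converting. A minimal sketch, assuming a POSIX system (for arpa/inet.h) and a hypothetical helper name read_net32:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohl (POSIX; on Windows this lives in winsock2.h) */

/* Hypothetical helper: read a 32-bit network-order field starting at any
   byte offset in a received buffer, whether or not that offset is aligned. */
static uint32_t read_net32(const unsigned char *buf, size_t offset) {
    uint32_t raw;
    memcpy(&raw, buf + offset, sizeof raw);  /* byte copy, no alignment requirement */
    return ntohl(raw);                       /* network order to host order */
}
```

Casting buf + offset to a uint32_t pointer and dereferencing it would be the thing to avoid here, since that is where an alignment-sensitive processor would fault.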

Mark Ransom
A: 

The basic idea is that all multi-byte types have to have the order of their bytes reversed. A four byte integer would have bytes 0 and 3 swapped, and bytes 1 and 2 swapped. A two byte integer would have bytes 0 and 1 swapped. A one byte character does not get swapped.

There are two very important implications of this that non-practitioners and novices don't always realise:

  1. (ASCII) Character strings are not touched.
  2. There is no possible blind algorithm to byte swap generic "data". You have to know the type of all your data and swap each item in the manner required for its type.
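To illustrate both points, here is a rough sketch that serializes a record field by field. The struct layout, the 14-byte wire format and the function name record_to_wire are all invented for this example; it assumes a POSIX system for htons and htonl.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htons, htonl (POSIX) */

/* Hypothetical record mixing three kinds of field. */
struct record {
    uint16_t id;       /* 2 bytes: needs htons */
    uint32_t length;   /* 4 bytes: needs htonl */
    char     name[8];  /* 1-byte characters: copied untouched */
};

/* Write the record into a 14-byte wire buffer, converting each field
   according to its own type. Returns the number of bytes written. */
static size_t record_to_wire(const struct record *r, unsigned char out[14]) {
    uint16_t id  = htons(r->id);
    uint32_t len = htonl(r->length);
    memcpy(out,     &id,     sizeof id);
    memcpy(out + 2, &len,    sizeof len);
    memcpy(out + 6, r->name, sizeof r->name);
    return 14;
}
```

Reversing the 14 bytes as one block, or swapping them in fixed-size groups, would corrupt at least one of the fields, which is why the type of each item has to be known.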
T.E.D.