mmap big endian vs. little endian

If you're using mmap, you're probably concerned about speed and efficiency. You basically have a few choices:

Wrap all your reads and writes with the htonl, htons, ntohl, and ntohs functions. Calling htonl (host to network order) on a little-endian host (x86 Windows, for example) converts the data from little endian to big endian. On a big-endian host it is a no-op. These conversions do have an overhead, which may or may not be significant depending on your operations. AFAIK, this is the approach used by SQLite.
Another option is to always write data in host format, and provide conversion routines for users who need to migrate their data across platforms. Databases usually read and write data in host format, but provide tools like bcp which will write to either ASCII or network byte order.
You can tag the header of your file with a byte order mark. When your program starts, it compares its own byte order with that of the file, and translates the data if they differ. This is often good for simple data formats like UTF-16, but not for formats where you have a number of variable-length types.
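To make the byte order mark option concrete, here is a minimal sketch in Python. The layout (a 2-byte mark, 2 bytes of padding, then unsigned 32-bit values) and the names create_file, detect_prefix, read_u32, and read_all are made up for illustration and not taken from any real format. If you hard-code the prefix to ">" instead of detecting it, you get the first option (always store network byte order).

```python
import mmap
import struct
import sys

# Hypothetical layout: 2-byte byte order mark, 2 padding bytes, then u32 values.
BOM = 0xFEFF
HEADER_SIZE = 4


def create_file(path, values, byteorder=sys.byteorder):
    """Write the mark and the values in the given byte order (host order by default)."""
    prefix = "<" if byteorder == "little" else ">"
    with open(path, "wb") as f:
        f.write(struct.pack(prefix + "Hxx", BOM))
        f.write(struct.pack(prefix + "%dI" % len(values), *values))


def detect_prefix(buf):
    """Return the struct prefix ("<" or ">") that matches the mark in the header."""
    for prefix in ("<", ">"):
        (mark,) = struct.unpack_from(prefix + "H", buf, 0)
        if mark == BOM:
            return prefix
    raise ValueError("missing or corrupt byte order mark")


def read_u32(buf, index, prefix):
    """Read the index-th 32-bit value using the byte order recorded in the file."""
    (value,) = struct.unpack_from(prefix + "I", buf, HEADER_SIZE + 4 * index)
    return value


def read_all(path):
    """Map the file read-only and decode every value after the header."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            prefix = detect_prefix(m)
            count = (len(m) - HEADER_SIZE) // 4
            return [read_u32(m, i, prefix) for i in range(count)]
```

A file written on either kind of host reads back the same way, because the reader picks its decoding from the mark rather than from the machine it runs on. With several variable-length field types you would need a per-field translation like read_u32 for each of them, which is where this approach starts to hurt.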

Additionally, if you store things like length prefixes or file offsets, you may end up with a mixture of 32-bit and 64-bit pointers. A 32-bit platform can't create an mmap view larger than 4 GB, so it's unlikely that you would support files larger than 4 GB there. Programs like rrdtool take this approach, and support much larger file sizes on 64-bit platforms. This means your binary file won't be compatible across platforms if you use the platform pointer size inside your file.
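If you do want the same file to work on both 32-bit and 64-bit builds, the usual way around the pointer size problem is to store fixed-width fields. A rough sketch, assuming a made-up record header of a 32-bit length followed by a 64-bit offset (RECORD_HEADER, pack_header, and unpack_header are illustrative names):

```python
import struct

# Hypothetical record header: u32 payload length, then u64 file offset.
# The "<" prefix selects little-endian with standard sizes and no padding,
# so the layout is the same 12 bytes on 32-bit and 64-bit builds.
RECORD_HEADER = struct.Struct("<IQ")


def pack_header(length, offset):
    """Encode a record header with fixed-width fields."""
    return RECORD_HEADER.pack(length, offset)


def unpack_header(buf, pos=0):
    """Decode (length, offset) from buf starting at pos."""
    return RECORD_HEADER.unpack_from(buf, pos)


# By contrast, the native "P" format is the size of a C pointer on the
# running build, which is exactly what should not end up inside the file.
NATIVE_POINTER_SIZE = struct.calcsize("@P")
```

A 32-bit build can still read these headers. It just can't map a region past what its address space allows, so the limit is in the program and not baked into the file.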

My recommendation is to ignore all byte order issues up front, and design the system to run fast on your platform. If/when you need to move your data to another platform, then choose the easiest/quickest/most appropriate method of doing so. If you start out by trying to create a platform-independent data format, you will generally make mistakes, and have to go back and fix those mistakes later. This is especially problematic when 99% of the data is in the correct byte order, and 1% of it is wrong. At that point, fixing bugs in your data translation code will break existing clients on all platforms.
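When that migration day comes, a one-off conversion tool can be very small. Here is a sketch, assuming the simplest possible case of a file that holds nothing but raw 32-bit unsigned integers (swap_u32 and swap_u32_file are hypothetical names). A real format with mixed field widths needs a field-by-field version of the same idea.

```python
import struct


def swap_u32(data):
    """Return data with the byte order of every 4-byte value reversed."""
    if len(data) % 4:
        raise ValueError("length is not a multiple of 4 bytes")
    count = len(data) // 4
    # Decode as little-endian and re-encode as big-endian. This reverses each
    # 4-byte group, so it converts in either direction on any host.
    values = struct.unpack("<%dI" % count, data)
    return struct.pack(">%dI" % count, *values)


def swap_u32_file(src_path, dst_path):
    """Write a byte-swapped copy of src_path to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(swap_u32(data))
```

Running the tool twice gives back the original bytes, which makes for a cheap sanity check before you ship converted data to the other platform.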

You'll want to have a multi-platform test setup before writing code to support more than one platform.
