views:

282

answers:

5

Purpose

I am writing a small library for a larger project which supplies malloc/realloc/free wrapper-functions as well as a function which can tell you whether or not its parameter (of type void *) corresponds to live (not yet freed) memory allocated and managed by the library's wrapper-functions. Let's refer to this function as isgood_memory.

Internally, the library maintains a hash-table to ensure that the search performed by isgood_memory is reasonably fast. The hash-table maintains pointer values (elements of type void *) to make the search possible. Clearly, values are added and removed from the hash-table to keep it up-to-date with what has been allocated and what has been freed, respectively.

The portability of the library is my biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more.

Question

Since portability is my biggest concern, I couldn't assume that sizeof(void *) == sizeof(X) for the hash-function. Therefore, I have resorted to treating the value byte-by-byte as if it were a string. To accomplish this, the hash function looks a little like:

static size_t hashit(void *ptrval)
{
    size_t i = 0, h = 0;
    union {
        void *ptrval;
        unsigned char string[sizeof(void *)];
    } ptrstr;

    ptrstr.ptrval = ptrval;

    for (; i < sizeof(void *); ++i) {
        size_t byte = ptrstr.string[i];

        /* Crazy operations here... */
    }

    return (h);
}

What portability concerns do any of you have with this particular fragment? Will I encounter any funky alignment issues by accessing ptrval byte-by-byte?

+2  A: 

Looks pretty clean. If you can rely on the <inttypes.h> header from C99 (it is often available elsewhere), then consider using uintptr_t - but if you want to hash the value byte-wise, you end up breaking things down to bytes and there is no real advantage to it.

Jonathan Leffler
The C99 header file that defines uintptr_t is <stdint.h>, not <inttypes.h>; it just happens that <inttypes.h> is specified to include <stdint.h>. <inttypes.h> defines the type imaxdiv_t and several macros for format specifiers to be used with printf/scanf for the various integer types from <stdint.h>.
Adam Rosenfield
It is defined that <inttypes.h> includes <stdint.h>, and it is far more likely that you have <inttypes.h> without <stdint.h> than that you have <stdint.h> and not <inttypes.h>. But yes, you are correct, if you have just <stdint.h>, then you don't need <inttypes.h>.
Jonathan Leffler
+2  A: 

Mostly correct. There's one potential problem, though. you assign

size_t byte = ptrstr.string[i];

*string is defined as char, not unsigned char. On the platform that has signed chars and unsigned size_t, it will give you result that you may or may not expect. Just change your char to unsigned char, that will be cleaner.

Igor Krivokon
Thank you for the suggestion. It's funny...I actually have unsigned char in my code, but left out the unsigned when I typed it here. I corrected the above code-fragment.
Anthony Cuozzo
+2  A: 

You are allowed to access a data type as an array of unsigned char, as you do here. The major portability issue that I see could occur on platforms where the bit-pattern identifying a particular location is not unique - in that case, you might get pointers that compare equal hashing to different locations because the bit patterns were different.

Why could they be different? Well, for one thing, most C data types are allowed to contain padding bits that don't participate in the value. A platform where pointers contained such padding bits could have two pointers that differed only in the padding bits point to the same location. (For example, the OS might use some pointer bits to indicate capabilities of the pointer, not just physical address.) Another example is the far memory model from the early days of DOS, where far pointers consisted of segment:offset, and the adjacent segments overlapped, so that segment:offset could point to the same location as segment+1:offset-x.

All that said, on most platforms in common use today, the bit pattern pointing to a given location is indeed unique. So your code will be widely portable, even though it is unlikely to be strictly conforming.

dewtell
Is it possible for me to compile a reasonably-sized list of such platforms? Are there any defining characteristics of them that would (for instance) help me with a search on google?
Anthony Cuozzo
Do you want a list of platforms where pointer bit patterns are or are not unique? I don't think there are common terms one could use to search, but in general, mainstream workstation-like platforms (running OSs like Windows, Solaris, HP-UX, Linux) tend to have flat address spaces and unique bit patterns. In the embedded world, or with experimental machines, you may get more custom hardware, and things can be different. The C standard allows for such platforms, but I don't know how many such are in use today.
dewtell
(This might be a stupid question, but I need to be sure.) Let's assume that we have two pointer-values (both of type void *), named A and B. Can there ever be a case where A and B have the same underlying bit-pattern, but actually refer to different locations in memory?
Anthony Cuozzo
If you need an absolutely definitive answer, I'm not the right person to ask. You could ask on the newsgroup comp.std.c, where there are a lot more language lawyer types, some of whom served on the committee that drafted the standard. But I'm confident the answer is no, and the answer is going to be based on the part of the standard that defines the values of objects. You can have bits of the object that don't participate in the value, but I don't think there's any basis for a value that requires more than the bits. Too many things like memcpy would break if this was false.
dewtell
+1  A: 

If you don't need the pointer values for some other reason beside keeping track of allocated memory, why not get rid of the hash table altogether and just store a magic number along with the memory allocated as in the example below. The magic number being present alongside the memory allocated indicates that it is still "alive". When freeing the memory you clear the stored magic number before freeing the memory.

#pragma pack(1)
struct sMemHdl
{
   int magic;
   byte firstByte;
};
#pragma pack()

#define MAGIC 0xDEADDEAD
#define MAGIC_SIZE sizeof(((struct sMemHdl *)0)->magic)

void *get_memory( size_t request )
{
   struct sMemHdl *pMemHdl = (struct sMemHdl *)malloc(MAGIC_SIZE + request);
   pMemHdl->magic = MAGIC;
   return (void *)&pMemHdl->firstByte;
}

void free_memory ( void *mem )
{
   if ( isgood_memory(mem) != 0 )
   {
      struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)mem - MAGIC_SIZE);
      pMemHdl->magic = 0;
      free(pMemHdl);
   }
}

int isgood_memory ( void *Mem )
{
   struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)Mem - MAGIC_SIZE);
   if ( pMemHdl->magic == MAGIC )
   {
      return 1; /* mem is good */
   }
   else
   {
      return 0; /* mem already freed */
   }
}

This may be a bit hackish, but I guess I'm in a hackish mood...

Robert
+1  A: 

Accessing variables such integers or pointers as chars or unsigned chars in not a problem from a portability view. But the reverse is not true, because it is hardware dependent. I have one question, why are you hashing a pointer as a string instead of using the pointer itself as a hash value ( using uintptr_t) ?

bill
Is uintptr_t available in C90 (ISO/IEC 9899:1990)?
Anthony Cuozzo
uintptr_t isn't part of standard C90, but is supported by many compilers and is defined in stdint.h
bill
The larger project is pretty strict about conforming to the C90 standard, so I am left with few other options.
Anthony Cuozzo
There is always a facility to assign/cast a pointer to an integer.
bill