
After a quick scan of related questions on SO, I have deduced that there's no function that reports the amount of memory that malloc has allocated to a pointer. I'm trying to replicate some of std::string's basic functionality (mainly dynamic size) using simple char*s in C, and I don't want to call realloc all the time. I guess I'll need to keep track of how much memory has been allocated. To do that, I'm considering creating a typedef'd struct that will contain the string itself and an integer with the amount of memory currently allocated, something like this:

typedef struct {
    char * str;
    int mem;
} my_string_t;

Is that an optimal solution, or can you suggest something that will give better results? Thanks in advance for your help.

+1  A: 

That is how it was done in the Pleistocene, and that's how you should do it today. You are dead on the money: malloc does not offer any portable, supported mechanism to query the size of an allocated block.
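A minimal sketch of the approach the question describes, assuming a size_t capacity field instead of the question's int and a hypothetical helper my_string_reserve() (not from the original post). realloc is called only when the requested size exceeds the tracked capacity, and the capacity doubles each time, so a long run of appends triggers only a logarithmic number of reallocations. The starting size of 16 bytes is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same idea as the question's struct; mem is assumed to be size_t here
   and holds the number of bytes currently allocated for str. */
typedef struct {
    char * str;
    size_t mem;
} my_string_t;

/* Hypothetical helper: make sure s->str can hold at least `needed` bytes
   (including the terminating NUL). Returns 0 on success, -1 if realloc
   fails, in which case s is left unchanged. */
int my_string_reserve(my_string_t *s, size_t needed)
{
    size_t new_mem;
    char *p;

    assert(s != NULL);
    if (needed <= s->mem)
        return 0;                       /* enough room already: no realloc */

    new_mem = s->mem ? s->mem : 16;     /* arbitrary starting capacity */
    while (new_mem < needed) {
        if (new_mem > (size_t)-1 / 2) { /* doubling would overflow */
            new_mem = needed;
            break;
        }
        new_mem *= 2;                   /* geometric growth */
    }

    p = realloc(s->str, new_mem);       /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    s->str = p;
    s->mem = new_mem;
    return 0;
}
```

Starting from { NULL, 0 }, requests for 10, 17 and 100 bytes leave mem at 16, 32 and 128 respectively.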

bmargulies
A: 

A more common way is to wrap malloc (and realloc) and keep a list of sizes and pointers. That way you don't need to change any string functions.
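One way to sketch the wrapper idea, assuming a fixed-size table with linear search to keep it short; the names tracked_malloc, tracked_realloc, tracked_free and tracked_size are hypothetical. Strings stay plain char*, and the size is looked up from the table when needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tracking table: each live block from the wrappers below
   is recorded with its requested size. A real version would use a hash
   table or grow the table; this sketch just gives up when it is full. */
#define MAX_TRACKED 256

static struct { void *ptr; size_t size; } table[MAX_TRACKED];

static int find_slot(const void *ptr)
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++)
        if (table[i].ptr == ptr)
            return i;
    return -1;
}

void *tracked_malloc(size_t size)
{
    int slot = find_slot(NULL);          /* first unused entry */
    void *p;

    if (slot < 0)
        return NULL;                     /* table full */
    p = malloc(size);
    if (p != NULL) {
        table[slot].ptr = p;
        table[slot].size = size;
    }
    return p;
}

/* size 0 is not handled: realloc(p, 0) is implementation-defined */
void *tracked_realloc(void *old, size_t size)
{
    int slot;
    void *p;

    if (old == NULL)
        return tracked_malloc(size);
    slot = find_slot(old);
    assert(slot >= 0);                   /* old must come from these wrappers */
    p = realloc(old, size);
    if (p != NULL) {                     /* on failure old is still valid */
        table[slot].ptr = p;
        table[slot].size = size;
    }
    return p;
}

void tracked_free(void *p)
{
    int slot;

    if (p == NULL)
        return;
    slot = find_slot(p);
    assert(slot >= 0);
    table[slot].ptr = NULL;              /* entry becomes reusable */
    table[slot].size = 0;
    free(p);
}

/* Returns the requested size of a tracked block, or 0 if unknown. */
size_t tracked_size(const void *p)
{
    int slot;

    if (p == NULL)
        return 0;
    slot = find_slot(p);
    return slot < 0 ? 0 : table[slot].size;
}
```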

Martin Beckett
+2  A: 

This is the obvious solution. And while you are at it, you might want to have a struct member that maintains the amount of allocated memory actually in use. This will avoid having to call strlen() all the time, and would enable you to support non null-terminated strings, as the C++ std::string class does.
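A sketch of that suggestion, assuming a hypothetical dyn_string type with separate len (bytes in use) and cap (bytes allocated) fields. Because the length is stored, appending needs no strlen() and the contents may contain '\0' bytes; a terminating NUL is still kept after the last byte so the buffer can be handed to ordinary C string functions. Overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: len = bytes in use, cap = bytes allocated. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} dyn_string;

/* Append n raw bytes (which may include '\0'). Returns 0 on success,
   -1 if realloc fails (s is left unchanged). */
int dyn_string_append(dyn_string *s, const char *bytes, size_t n)
{
    if (s->len + n + 1 > s->cap) {          /* +1 for the trailing NUL */
        size_t new_cap = s->cap ? s->cap : 16;
        char *p;

        while (new_cap < s->len + n + 1)
            new_cap *= 2;
        p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, bytes, n);     /* copies embedded NULs too */
    s->len += n;
    s->data[s->len] = '\0';
    assert(s->len < s->cap);
    return 0;
}
```

After appending the five bytes "ab\0cd", len is 5 while strlen(data) is only 2, which is exactly the case a stored length makes possible.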

anon
This sounds well on the way to writing one's own string library. Which I'm doing, incidentally, but I'm doing it because I sometimes enjoy writing low-level heavily-optimized code, and want it to go to a useful project. I wouldn't write a new string library for every application.
Chris Lutz
I assumed writing a string library was what the OP was asking about.
anon
@Neil Butterworth: Hey, that's a useful idea! Thank you! @Chris Lutz: Yes, I'm writing a mini library, mainly for fun, but I've a specific project that might benefit from it at some point.
mingos
+1  A: 

Write wrapper functions. If you are using malloc, you should do that anyway.

For an example, look in "Writing Solid Code".

pm100
Agreed with both the comment re: wrapper functions and recommending that book.
Heath Hunnicutt
+3  A: 

You will want to allocate the space for both the length and the string in the same block of memory. This may be what you intended with your struct, but you have reserved space for only a pointer to the string.

There must be space allocated to contain the characters of the string.

For example:

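#include <stdlib.h>
#include <string.h>

typedef struct
{
    int num_chars;   /* bytes in string[], including the terminating NUL */
    char string[];   /* C99 flexible array member */
} my_string_t;

my_string_t * alloc_my_string(const char *src)
{
    my_string_t * p = NULL;
    size_t N_chars = strlen(src) + 1;   /* characters plus the NUL */

    /* one allocation holds both the header and the characters */
    p = malloc(sizeof(my_string_t) + N_chars);
    if (p)
    {
         p->num_chars = (int)N_chars;
         strcpy(p->string, src);
    }
    return p;
}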

In my example, you access the string through the string member of my_string_t:

my_string_t * p = alloc_my_string("hello free store.");
if (p)
    printf("String of %d bytes is '%s'\n", p->num_chars, p->string);
free(p);   /* a single free() releases the length and the characters */

Be aware that you obtain the pointer to the string as a consequence of allocating space to store its characters. The resource you are allocating is the storage for the characters; the pointer you obtain is a reference to that storage.

In my example, the memory allocated is laid out sequentially as follows:

+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| 00 | 00 | 00 | 12 | 'h'| 'e'| 'l'| 'l'| 'o'| 20 | 'f'| 'r'| 'e'| 'e'| 20 | 's'| 't'| 'o'| 'r'| 'e'| '.'| 00 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
^^                   ^
||                   |
p|                   |
 p->num_chars        p->string

Notice that the value of p->string is not stored in the allocated memory: p->string is simply the address four bytes past the beginning of the block, immediately after the (presumed 32-bit, four-byte) integer. The diagram shows all bytes in hexadecimal, with the integer's most significant byte first.
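To check that p->string occupies no storage of its own, one can compare it with the block's base address plus offsetof(my_string_t, string). A small sketch; the helper string_member_is_offset() is hypothetical and only demonstrates the layout described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int num_chars;
    char string[];
} my_string_t;

/* p->string is not a stored pointer: the compiler computes it as the
   start of the block plus a fixed offset, so this is true for any p. */
int string_member_is_offset(const my_string_t *p)
{
    return (const char *)p->string
        == (const char *)p + offsetof(my_string_t, string);
}
```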

Your compiler may instead require the older zero-length-array declaration, which is a GCC extension rather than standard C:

typedef struct
{
    int num_chars;
    char string[0];
} my_string_t;

but the version without the zero is the standard C99 flexible array member.

You can accomplish the equivalent thing with no array member as follows:

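#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int num_chars;
} mystr2;

/* the characters start immediately after the header */
char * str_of_mystr2(mystr2 * ms)
{
    return (char *)(ms + 1);
}

mystr2 * alloc_mystr2(const char *src)
{
    mystr2* p = NULL;
    size_t N_chars = strlen(src) + 1;

    /* refuse lengths that cannot be stored in the int member */
    if (N_chars <= INT_MAX && (p = malloc(sizeof(mystr2) + N_chars)) != NULL)
    {
         p->num_chars = (int)N_chars;
         strcpy(str_of_mystr2(p), src);
    }
    return p;
}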

printf("String of %d bytes is '%s'\n", p->num_chars, str_of_mystr2 (p));

In this second example, the address equivalent to p->string is computed by str_of_mystr2(). It is normally the same address as in the first example; any difference comes from trailing padding that your compiler settings add to the end of the struct.

Some would suggest tracking the length in a size_t; I'd point you to an old Dr. Dobb's article on why I disagree. Supporting values greater than INT_MAX is of doubtful value to your program's correctness. By using an int, you can write assert(p->num_chars >= 0); and have that test actually check something. With an unsigned type, you would have to write the equivalent test as something like assert(p->num_chars < UINT_MAX / 2);. As long as you write code that checks run-time data, using a signed type can be useful.

On the other hand, if you are writing a library which handles strings in excess of UINT_MAX / 2 characters, I salute you.
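To illustrate the argument with a hypothetical pair of helpers (not from the original answer, and using size_t rather than unsigned int): the same faulty subtraction yields an obviously negative value in an int, but silently wraps to a huge value in a size_t, where only a heuristic bound such as SIZE_MAX / 2 can flag it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bug: a caller passes end < start. With a signed result
   the mistake is a negative length, which assert(len >= 0) catches. */
int signed_length(int start, int end)
{
    return end - start;
}

/* The same computation in size_t wraps around modulo SIZE_MAX + 1,
   producing a huge "valid-looking" length. */
size_t unsigned_length(size_t start, size_t end)
{
    return end - start;
}
```

With start = 5 and end = 2, signed_length returns -3, while unsigned_length returns SIZE_MAX - 2.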

Heath Hunnicutt
So, all the data contained within the struct should be within the same block of memory? Is that a necessity of some sort, or just a reasonable optimisation?
mingos
It would be acceptable to allocate the block containing the data separately from the block containing the 'metadata', but I want to be clear that in your example you would have to call malloc() twice: once for your struct, and once for the data pointed to by the "str" member of your struct.
Heath Hunnicutt
OK, I understand. Thanks for this great reply!
mingos
My pleasure. Enjoy C. :)
Heath Hunnicutt
OK, after some fun with the compiler, I found out that the flexible array member will not be an option... I'm trying to keep my code strictly ISO C (-pedantic-errors), and this just doesn't work :(
mingos
Use `size_t` for the type of the quantity variable. Quantities are never negative, so `int` will halve your maximum capacity.
Thomas Matthews
I updated my answer for your situation. I thought that using [] was C99-compliant, but rather than mess with it, you can use the direct approach shown above.
Heath Hunnicutt
Or for C89 compilers, `string[1]` would be fine, and then you either live with overallocating slightly or will have to use the `offsetof(my_string_t, string)` instead of `sizeof(my_string_t)`.
jamesdlin
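A sketch of that C89 variant, assuming the same my_string_t layout but with a one-element array (the traditional "struct hack"). The allocation is sized with offsetof() so the spare element is not wasted. Formally, accessing past string[0] is undefined behavior in ISO C, which is why C99 introduced flexible array members; the idiom nevertheless worked on common pre-C99 compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int num_chars;
    char string[1];   /* C89: one element instead of a flexible array */
} my_string_t;

my_string_t *alloc_my_string(const char *src)
{
    size_t n = strlen(src) + 1;                       /* characters + NUL */
    size_t bytes = offsetof(my_string_t, string) + n;
    my_string_t *p;

    if (bytes < sizeof(my_string_t))
        bytes = sizeof(my_string_t);   /* never allocate less than the struct */
    p = malloc(bytes);
    if (p != NULL) {
        p->num_chars = (int)n;
        memcpy(p->string, src, n);     /* copies the NUL as well */
    }
    return p;
}
```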
A: 

I think you could use malloc_usable_size.
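A non-portable sketch, assuming glibc on Linux, where malloc_usable_size() is declared in <malloc.h>. The value it returns can be larger than the size originally requested, so it tells you how much space is available, not how much you asked for; the wrapper name usable_bytes() is hypothetical.

```c
#include <assert.h>
#include <malloc.h>   /* glibc: declares malloc_usable_size() */
#include <stdlib.h>

/* Returns the usable size of a block obtained from malloc/realloc.
   May exceed the requested size; not available on every platform. */
size_t usable_bytes(void *p)
{
    return malloc_usable_size(p);
}
```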

3lectrologos
It isn't cited in the standards section. It seems to be a FreeBSD or Linux extension (possibly also on Mac OS X), and a rather deprecated one at best ("You shouldn't use this for the only reason you might want to use this. Use it to debug, yeah..."). For reference, NetBSD doesn't appreciate it: http://mail-index.netbsd.org/tech-kern/2007/07/20/0003.html. It isn't on OpenBSD either. Windows has its own extension.
jbcreix