ansaurus

Question

Is this hack valid according to standard?

Answer 1

+6 A:

Yes, it's completely valid. And I would strongly encourage doing this when it allows you to avoid unnecessary additional allocations (and the error handling and memory fragmentation they entail). Others may have different opinions.

By the way, if your data isn't void * but something you can access directly, it's even easier (and more efficient because it saves space and avoids the extra indirection) to declare your structure as:

struct foo {
    size_t num_foo;
    type data[];
};

and allocate space for the amount of data you need. The [] syntax is only valid in C99, so for C89-compatibility you should use [1] instead, but this may waste a few bytes.

R.. 2010-07-25 14:52:31

As you know but is probably worth pointing out, the flexible array member at the end of the structure only works when you have one variable length item. The question has two variable length items (`data` and `comment`) and cannot readily use the technique.

Jonathan Leffler 2010-07-25 16:00:10

In that case, store the `comment` as a normal `malloc()`-allocated string.

Donal Fellows 2010-07-25 16:11:14

FWIW, I gave this answer before the comment field was added. With comment added, follow Donal's advice.

R.. 2010-07-25 17:25:02

Answer 2

+1 A:

That is an old game, though the usual form is like

struct foo {
   size_t size
   char data[1]
}

and then allocate the space as big as you want and use array as if it had the desired size.

It is valid, but I would encourage you to find another way if possible: there are lots of chance to screw this up.

dmckee 2010-07-25 14:52:51

This old hack is not *valid*(strictly) according to standard.

Nyan 2010-07-25 14:53:58

No, it is not, but it *will* work with any conforming implementation of `malloc`. ::thinks a little about pathologically segmented memory architectures:: *Almost* any conforming `malloc`.

dmckee 2010-07-25 14:56:58

It is valid as consequences of other requirements of the standard, for example the requirement that if one struct matches the initial part of another, access to those elements through the 2 is compatible. C99 makes it explicitly supported with the flexible array notation `[]` but the old `[1]` method **will** necessarily work too!

R.. 2010-07-25 14:58:26

@dmckee, anything that pathological can't be conformant. The standard requires that the memory returned by `malloc` be accessible as an array of `unsigned char` and that it be suitably aligned for any type.

R.. 2010-07-25 15:02:39

@R: The committee actually discussed that at one point, and decided it really wasn't defined in C89. Accessing beyond the defined size of an array causes undefined behavior. Though it's rare, the compiler *could* do range checking and throw some sort of OS exception (or whatever) when you attempt to access memory after `whatever[0]`.

Jerry Coffin 2010-07-25 16:09:16

@Jerry: It wouldn't be the compiler, but rather the runtime. (The compiler mostly can't tell, since it isn't and shouldn't be that smart.) This sort of thing is of use to purify and efence though, which are effectively non-standard runtimes. You just don't want that normally (since the performance hit is very painful).

Donal Fellows 2010-07-25 16:13:45

@Donal: the compiler generates the code, which executes at run time. Yes, the performance hit is bad enough that in practice nobody does it, but the committee decided that it was allowed anyway.

Jerry Coffin 2010-07-25 16:27:14

@Jerry: But in some cases, the runtime modifies the compiled code when it is loaded to add in the support for bounds checking. (Really. That's exactly how memory access debuggers work.) For my trick, let's discuss the nature of the infinity of angels permitted to dance on the head of a pin, and what ballet school they went to… ;-)

Donal Fellows 2010-07-25 16:45:26

R.. 2010-07-25 17:24:07

@R: it's syntactically valid, but still produces undefined behavior. I wouldn't be too quick about deciding that you're reading the document more accurately than the committee.

Jerry Coffin 2010-07-25 23:00:02

Answer 3

+1 A:

Yes, the general idea of the hack is valid, but at least as I read it, you haven't implemented it quite correctly. This much you've done right:

    f = malloc(num + sizeof(foo) + MAX_COMMENT_SIZE );
    f->data = f + 1;            // is this OK?

But this is wrong:

    f->comment = f + 1 + num;

Since f is foo *, the f+1+num is computed in terms of sizeof(foo) -- i.e., it's equivalent to saying f[1+num] -- it (attempts to) index to the 1+num^th foo in an array. I'm pretty sure that's not what you want. When you allocate the data, you're passing sizeof(foo)+num+MAX_COMMENT_SIZE, so what you're allocating space for is num chars, and what you (presumably) want is to point f->comment to a spot in memory that's num chars after f->data, which would be more like this:

f->comment = (char *)f + sizeof(foo) + num;

Casting f to a char * forces the math to be done in terms of chars instead of foos.

OTOH, since you're always allocating MAX_COMMENT_SIZE for comment, I'd probably simplify things (quite) a bit, and use something like this:

typedef struct foo {
   char comment[MAX_COMMENT_SIZE];
   size_t num_foo;
   char data[1];
}foo;

And then allocate it like:

foo *f = malloc(sizeof(foo) + num-1);
f->num_foo = num;

and it'll work without any pointer manipulation at all. If you have a C99 compiler, you can modify this slightly:

typedef struct foo {
   char comment[MAX_COMMENT_SIZE];
   size_t num_foo;
   char data[];
}foo;

and allocate:

foo *f = malloc(sizeof(foo) + num);
f->num_foo = num;

This has the additional advantage that the standard actually blesses it, though in this case the advantage is pretty minor (I believe the version with data[1] will work with every C89/90 compiler in existence).

Jerry Coffin 2010-07-25 15:53:34

You spotted this at about the same time I did - we composed our answers in parallel.

Jonathan Leffler 2010-07-25 16:01:23

@Jonathan Leffler: Yup -- especially with as verbose an answer as mine, that's almost inevitable...

Jerry Coffin 2010-07-25 16:05:29

Answer 4

+3 A:

The line you question is valid - as others have said.

Interestingly, the next line, which you did not query, is syntactically valid but is not giving you the answer you want (except in the case where num == 0).

typedef struct foo
{
   void *data;
   char *comment;
   size_t num_foo;
} foo;

foo *new_Foo(size_t num, blah blah)
{
    foo *f;
    f = malloc(num + sizeof(foo) + MAX_COMMENT_SIZE );
    f->data = f + 1;            // This is OK
    f->comment = f + 1 + num;   // This is !!BAD!!
    f->num_foo = num;
    ...
    return f;
}

The value of f + 1 is a foo * (implicitly coerced into a void * by the assignment).

The value of f + 1 + num is also a foo *; it points to the num+1^th foo.

What you probably had in mind was:

foo->comment = (char *)f->data + num;

Or:

foo->comment = (char *)(f + 1) + num;

Note that while GCC will allow you to add num to a void pointer, and it will treat it as if sizeof(void) == 1, the C Standard does not give you that permission.

Jonathan Leffler 2010-07-25 15:56:40

Answer 5

A:

Another possible problem might be alignment.

If you simply malloc your f->data, then you can safely e.g. convert your void* to double* and use it to read/write a double (provided that num is sufficiently large). However, in your example you can no longer do that, as f->data might not be properly aligned. For example, to store a double in f->data, you will need to use something like memcpy instead of a simple typecast.

Jukka Suomela 2010-07-25 16:47:02

Using my approach (array at the end) solves the alignment problem automatically. It can also be solved by allocating enough extra space that you can round up `f->data` to the next multiple of `sizeof(double)` or whatever type you need.

R.. 2010-07-25 20:08:54

Answer 6

A:

I'd rather use some function to allocate the data dynamically and free it correctly instead.

Using this trick only saves you the trouble of initializing the data structure, and can lead to very bad problems (see Jerry's comment).

I'd do something like this:

typedef struct foo {
       void *data;
       char *comment;
       size_t num_foo;
    }foo;
foo *alloc_foo( void * data, size_t data_size, const char *comment)
{
   foo *elem = calloc(1,sizeof(foo));
   void *elem_data = calloc(data_size, sizeof(char));
   char *elem_comment = calloc(strlen(comment)+1, sizeof(char));
   elem->data = elem_data;
   elem->comment = elem_comment;

   memcpy(elem_data, data, data_size);
   memcpy(elem_comment, comment, strlen(comment)+1);
   elem->num_foo = data_size + strlen(comment) + 1;
}

void free_foo(foo *f)
{
   if(f->data)
      free(f->data);
   if(f->comment)
      free(f->comment);
   free(f);
}

Note that I did no check on data validity, and my alloc can be optimized (replacing strlen() calls by a stored lenght value).

It seems to me that this behavior is more secure... at the price of a disseminated data chunk maybe.

Gui13 2010-07-27 10:13:30

ansaurus

tags:

views:

answers:

Is this hack valid according to standard?

related questions