views:

95

answers:

5

Hi,

I'm having a bit of a problem with strcat and segmentation faults. The error is as follows:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x00007fff82049f1f in __strcat_chk ()
(gdb) where
#0  0x00007fff82049f1f in __strcat_chk ()
#1  0x0000000100000adf in bloom_operation (bloom=0x100100080, item=0x100000e11 "hello world", operation=1) at bloom_filter.c:81
#2  0x0000000100000c0e in bloom_insert (bloom=0x100100080, to_insert=0x100000e11 "hello world") at bloom_filter.c:99
#3  0x0000000100000ce5 in main () at test.c:6

bloom_operation is as follows:

int bloom_operation(bloom_filter_t *bloom, const char *item, int operation)
{
    int i;    

    for(i = 0; i < bloom->number_of_hash_salts; i++)
    {
        char temp[sizeof(item) + sizeof(bloom->hash_salts[i]) + 2];
        strcat(temp, item);
        strcat(temp, *bloom->hash_salts[i]);

        switch(operation)
        {
            case BLOOM_INSERT:
                bloom->data[hash(temp) % bloom->buckets] = 1;
                break;
            case BLOOM_EXISTS:
                if(!bloom->data[hash(temp) % bloom->buckets]) return 0;
                break;
        }    
    }

    return 1;
}

The line with trouble is the second strcat. The bloom->hash_salts are part of a struct defined as follows:

typedef unsigned const char *hash_function_salt[33];
typedef struct {
    size_t buckets;    
    size_t number_of_hash_salts;
    int bytes_per_bucket;
    unsigned char *data;
    hash_function_salt *hash_salts;
} bloom_filter_t;

And they are initialized here:

bloom_filter_t* bloom_filter_create(size_t buckets, size_t number_of_hash_salts, ...) 
{
    bloom_filter_t *bloom;
    va_list args;
    int i;

    bloom = malloc(sizeof(bloom_filter_t));
    if(bloom == NULL) return NULL;

    // left out stuff here for brevity...

    bloom->hash_salts = calloc(bloom->number_of_hash_salts, sizeof(hash_function_salt));

    va_start(args, number_of_hash_salts);

    for(i = 0; i < number_of_hash_salts; ++i)
        bloom->hash_salts[i] = va_arg(args, hash_function_salt);

    va_end(args);

    // and here...
}

And bloom_filter_create is called as follows:

bloom_filter_create(100, 4, "3301cd0e145c34280951594b05a7f899", "0e7b1b108b3290906660cbcd0a3b3880", "8ad8664f1bb5d88711fd53471839d041", "7af95d27363c1b3bc8c4ccc5fcd20f32");

I'm doing something wrong but I'm really lost as to what. Thanks in advance,

Ben.

+4  A: 

You need to use strlen, not sizeof. item is passed in as a pointer, not an array.

The line:

char temp[sizeof(item) + sizeof(bloom->hash_salts[i]) + 2];

will make temp the 34x the length of a pointer + 2. The size of item is the size of a pointer, and the sizeof(bloom->hash_salts[i]) is currently 33x the size of a pointer.

You need to use strlen for item, so you know the actual number of characters.

Second, bloom->hash_salts[i] is a hash_function_salt, which is an array of 33 pointers to char. It seems like hash_function_salt should be defined as:

since you want it to hold 33 characters, not 33 pointers. You should also remember that when you're passing a string literal to bloom_filter_create, you're passing a pointer. That means to initialize the hash_function_salt array we use memcpy or strcpy. memcpy is faster when we know the exact length (like here):

So we get:

typedef unsigned char hash_function_salt[33];

and in bloom_filter_create:

memcpy(bloom->hash_salts[i], va_arg(args, char*), sizeof(bloom->hash_salts[i]));

Going back to bloom_operation, we get:

char temp[strlen(item) + sizeof(bloom->hash_salts[i])];
strcpy(temp, item);
strcat(temp, bloom->hash_salts[i]);

We use strlen for item since it's a pointer, but sizeof for the hash_function_salt, which is a fixed size array of char. We don't need to add anything, because hash_function_salt already includes room for a NUL. We use strcpy first. strcat is for when you already have a NUL-terminated string (which we don't here). Note that we drop the *. That was a mistake following from your incorrect typedef.

Matthew Flaschen
with the memcpy line, is it possible to make the 33 variable (i.e. based off the definition in the header) will something like sizeof(hash_function_salt) work?
benofsky
@benofsky, yes. I'll fix it to use `sizeof(bloom->hash_salts[i])`. This works because `hash_function_salt` is an array, not a pointer.
Matthew Flaschen
I now have `memcpy(bloom->hash_salts[i], va_arg(args, char*), sizeof(bloom->hash_salts[i]));` and I get the error: `warning: passing argument 1 of ‘__builtin___memcpy_chk’ discards qualifiers from pointer target type`?
benofsky
@benofsky, your typedef included a `const`. I removed it, so you can initialize this way.
Matthew Flaschen
Fantastic, that works, thanks so much! Do I need to call free on each item of hash_salts in my bloom_filter_destroy method?
benofsky
@benofsky, no. Since the elements of `hash_salts` are arrays, not pointers, you only have to free `hash_salts` itself.
Matthew Flaschen
Thanks so much for all your help! I'm new to this C thing! :)
benofsky
+9  A: 

I see a couple of problems:

char temp[sizeof(item) + sizeof(bloom->hash_salts[i]) + 2];

The sizeof(item) will only return 4 (or 8 on a 64-bit platform). You probably need to use strlen() for the actual length. Although I don't think you can declare it on the stack like that with strlen (although I think maybe I saw someone indicate that it was possible with newer versions of gcc - I may be out to lunch on that).

The other problem is that the temp array is not initialized. So the first strcat may not write to the beginning of the array. It needs to have a NULL (0) put in the first element before calling strcat.

It may already be in the code that was snipped out, but I didn't see that you initialized the number_of_hash_salts member in the structure.

Mark Wilkins
Yeah, you can declare it on the stack like that with `strlen` and recent gcc (recent = 3.x, iirc). It's a C99 feature. I don't know if MSVC ever picked that up, though.
Zack
if that line is now: char temp[strlen(item) + strlen(*bloom->hash_salts[i]) + 2]; - I get a segfault on the second strlen (strlen(*bloom->hash_salts[i]), any ideas?
benofsky
@Zack, Thanks for that information. I thought I had seen that. It would be a cool feature. I would like to see that in MSVC (most of what I write has to run on both Linux and Win32, so I can't use it now).
Mark Wilkins
@benofsky, Did you initialize temp? `temp[0] = 0;`
Mark Wilkins
variable length arrays are C99. GCC supports them in C90 and C++ mode. The last time I've checked VLA weren't supported in MSVC. A more portable solution would be to use alloca()
Luther Blissett
@mark, I've just added that in but gdb is showing that it fails when the variable is declared not at the first strcat. The sefault is on strlen(*bloom->hash_salts[i]).
benofsky
Yeah, you must initialize it. strcat cats after the first 0-character, so there must be one to start.
Luther Blissett
Yeah, I'm aware but it's failing *before* the first strcat. :)
benofsky
I would use strcpy instead of the first strcat. That way you don't have to waste time initializing temp.
aaaa bbbb
good idea @aaaa bbbb
benofsky
@benofsky - Ah - sorry. I didn't read your previous comment carefully enough. It appears there are problems initializing the hash_salts array as well. I don't think you are are passing the character arrays around quite as expected. I'll try to update my answer a bit more.
Mark Wilkins
@benofsky - I started to add that information, but I think Matthew covered it already. The hash_function_salt typedef probably needs to have the * removed from it.
Mark Wilkins
yes that was it, thanks so much for all your help!
benofsky
+2  A: 

Your array size calculation for temp uses sizeof(bloom->hash_salts[i]) (which is just the size of the pointer), but then you dereference the pointer and try to copy the entire string into temp.

Jim Lewis
Actually, `bloom->hash_salts[i]` is an array of char *, so it has the size of 33 pointers. It is incorrectly defined though. It should be an array of char.
Matthew Flaschen
@Matthew: Wow, nice catch.
Jim Lewis
+1  A: 

Are you sure that the hash_function_salt type is defined correctly? You may have too many *'s:

(gdb) ptype bloom
type = struct {
    size_t buckets;
    size_t number_of_hash_salts;
    int bytes_per_bucket;
    unsigned char *data;
    hash_function_salt *hash_salts;
} *
(gdb) ptype bloom->hash_salts
type = const unsigned char **)[33]
(gdb) ptype bloom->hash_salts[0]
type = const unsigned char *[33]
(gdb) ptype *bloom->hash_salts[0]
type = const unsigned char *
(gdb) 
eruciform
or will that resolve to a string and you meant strlen() on it and item, rather than sizeof() ?
eruciform
+2  A: 

First, as everyone has said, you've sized temp based on the sizes of two pointers, not the lengths of the strings. You've now fixed that, and report that the symptom has moved to the call to strlen().

This is showing a more subtle bug.

You've initialized the array bloom->hash_salts[] from pointers returned by va_arg(). Those pointers will have a limited lifetime. They may not even outlast the call to va_end(), but they almost certainly do not outlive the call to bloom_filter_create(). Later, in bloom_filter_operation(), they point to arbitrary places and you are doomed to some kind of interesting failure.

Edit: Resolving this this requires that the pointers stored in the hash_salts array have sufficient lifetime. One way to deal with that is to allocate storage for them, copying them out of the varargs array, for example:

// fragment from bloom_filter_create()
bloom->hash_salts = calloc(bloom->number_of_hash_salts, sizeof(hash_function_salt));
va_start(args, number_of_hash_salts);
for(i = 0; i < number_of_hash_salts; ++i)
        bloom->hash_salts[i] = strdup(va_arg(args, hash_function_salt));
va_end(args);

Later, you would need to loop over hash_salts and call free() on each element before freeing the array of pointers itself.

Another approach that would require more overhead to initialize, but less effort to free would be to allocate the array of pointers along with enough space for all of the strings in a single allocation. Then copy the strings and fill in the pointers. Its a lot of code to get right for a very small advantage.

RBerteig
I'm pretty sure this is the deeper problem (see my comments above on mark's answer) because using strlen I still get a segfault. Any idea on how to resolve this?
benofsky
I'd use `strdup()` to allocate a copy of each string... see my updated answer.
RBerteig
Of course, if in the larger context you really didn't want an array of pointers to strings, then I'm fixing a moot problem here. Mark and Mathew have answers that better address what you are actually trying to do, rather than the mistakes in what you actually did ;-)
RBerteig
Fixing the typedef problem which Matthew pointed out, made it work for me. If I do not make these changes you've described (or the memcpy ones detailed by matthew), will I run into problems later? (it seems to work :-/)
benofsky
What you have to do is know the lifetime of any data referenced by a pointer. Specifically, the data has to be in memory at least as long as the pointer is assumed valid. The result of va_arg is only pointing to valid memory during the execution of a single function since a function's arguments may disappear from memory when the function returns. "Seems to work" is good enough for a quick and dirty program that will never be run again. Otherwise, you have to know if the pointers are to live data or you *will* have bugs later and they will be *very* difficult to cure.
RBerteig
Ok, that's what I thought :). I'm assuming Matthew's memcpy method also works in fixing this bug?
benofsky
His solution is still likely better than mine for this specific case since you know something from the problem domain about the data you are storing. Both have the property that you are allocating a place to save a copy of the information that came in as arguments to `bloom_filter_create()` so that the salt strings have a storage lifetime long enough to use them later.
RBerteig