views:

92

answers:

4

I wrote the following function to split the given full path into directory, filename and extension.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct path_info {
    char *directory;
    char *filename;
    char *extension;
};

#ifdef WIN32
const char directory_separator[] = "\\";
#else
const char directory_separator[] = "/";
#endif

struct path_info* splitpath(const char *full_path)
{
    size_t length = strlen(full_path);
    struct path_info *p = (struct path_info*) malloc(sizeof(struct path_info) + length + 3);  /* Extra space for padding and shifting */
    if(p)
    {
        char *path = (char *) &p[1];    /* copy of the path */
        char *end = &path[length + 1];  
        char *extension;
        char *last_separator;

        /* copy the path */
        strcpy(path, full_path);
        *end = '\0';
        p->directory = end;
        p->extension = end;
        p->filename  = path;

        last_separator = strrchr(path, directory_separator[0]);    /* Finding the last directory separator */
        if(last_separator) {
            memmove(last_separator + 1, last_separator, strlen(last_separator));  /* inserting a directory separator where null terminator will be inserted */
            p->directory = path;
            *(++last_separator) = '\0';      /* Truncate the directory path */
            p->filename = ++last_separator;  /* Taking the remaining as file name */
        }

        /* Finding the extension starts from second character. This allows handling filenames 
           starts with '.' like '.emacs'.*/
        extension = strrchr(&p->filename[1], '.');
        if(extension) {

            /* shifting the bytes to preserve the extension */
            memmove(extension + 1, extension, strlen(extension));   /* problem happens here */
            p->extension = extension + 1;

            *extension = '\0';  /* Truncates the file name */
        }
    }
    return p;
}


int main(void)
{
    struct path_info *p = splitpath("C:\\my documents\\some.txt");
    printf("Directory : %s\n", p->directory);
    printf("Filename : %s\n", p->filename);
    printf("Extension : %s\n", p->extension);
    return 0;
}

This works well for the given input on GCC. But it fails on MSVC leaving some garbage data on extension variable. I have added a comment on the place where things go wrong. I am not understanding why memmove is behaving differently on MSVC? I have used memmove in two places and strange part is that the first one works fine.

Any help would be appreciated.

+6  A: 
dreamlax
+1. could also zero the end of the buffer any time earlier
sje397
It worked. But I am more interested to know why this worked fine on GCC?
Appu
When code has undefined behavior, it's bad to ask "why this worked". Accept that it could have done `system("format /y c:");` and count your blessings.
R..
@Appu: When a string is not properly terminated, different behaviour can be expected from different implementations. GCC generated different machine code that worked just because at that particular point in execution there was another null character there, but this must not have been the case with MSVC.
dreamlax
@Appu, probably you are just lucky that GCC moved it to some zeroed out RAM
gnibbler
The same fix should be included for the `memmove()` of `last_separator`.
caf
+1  A: 

You aren't including null character while memmove(). Try this:

        memmove(last_separator + 1, last_separator, strlen(last_separator)+1); 
        memmove(extension + 1, extension, strlen(extension)+1);*/

EDIT: And here is a bit better way of the same thing that you are doing. This doesn't involve memmoves. But ofcourse you would need separate memory allocation(I am using strdup()). Nulls are also taken care of in the same memory space which was allocated by strdup().

    struct path_info* splitpath1(const char *full_path)
{
    char * path = strdup(full_path);
    char * fileWithExt = strtok((char *)strrchr(path,'\\'),"\\");

    struct path_info *p = (struct path_info*) malloc(sizeof(struct path_info));  /* Extra space for padding and shifting */
    p->filename = strtok(fileWithExt, ".");
    p->extension = strtok(NULL, ".");

    strtok((char *)strchr(path,'\\'),"\\");
    p->directory = path;

    return p;
}
bits
+2  A: 

Your second memmove writes over the terminating '\0' byte. You could move strlen(extension)+1 bytes to solve that problem. I suspect that on GCC you got lucky and there happened to be an additional '\0' byte in the next memory location.

Daniel Stutzbach
+2  A: 

Pretty sure this has nothing to do with memmove but rather the rest of your string logic, which is a mess and very inefficient. Instead of copying at the beginning, why not just identify the 3 parts of your string and their corresponding lengths, then copy them into the destination buffer at the right offsets?

Or if you just need to use the results with printf, don't even make a copy! Just identify the lengths and do something like this:

printf("Directory: %.*s\n", dir_len, full_pathname);
printf("Filename: %.s*\n", name_len, full_pathname+name_start);
printf("Extension: %.*s\n", ext_len, full_pathname+ext_start);

The same works if you're using snprintf to format the text for display in UI elements...

R..
Care to give a reason why you voted -1?
R..
It definitely has to do with memmove. Those memmove statements aren't considering the null character. Which means it should use strlen()+1.
bits
I understand. I will try to refactor this. Thank you very much.
Appu
I don't claim that you couldn't fix the code while still using memmove like that, just that it's a really ugly approach which is difficult to read, difficult to debug, several times slower than it should be, and using dynamic memory allocation unnecessarily to store duplicate information.
R..
And it is not me who down voted.
Appu
If you do factor your code this way, you can always use the start/length information to construct the structure you were building before. `memcpy(ext, full_pathname+ext_start, ext_len); ext[ext_len]=0;` etc. Or you can use `snprintf` to make the copies to avoid a separate null-termination step.
R..
I looked again at the code. I think I am doing what you are saying. All I am doing is finding the required positions and inserting a `\0` there. Then return pointers pointing to the start of each part. To do this, I am making a copy of original string. I am not sure how I can factor this again. Can you help with some code samples?
Appu
In my answer here, I'm offering an approach where you never modify the original string at all. You just compute the starting offset and length of each component. The `%.*s` format specifier to `printf` takes care of ensuring that at most the specified number of bytes are printed.
R..
Ok. My ultimate intention is not just printing. I will be doing some operations with the result.
Appu
Well you can still factor the code this way: first compute offsets/lengths in the original string, then copy the parts you need (using memcpy and manual null-termination or snprintf with the format strings I suggested) to a new buffer. By the way, computing the offsets/lengths becomes just a couple lines of code if you use a regular expression, but if you don't already have a regex library linked in your might not want to bring in the heavyweights. :-)
R..