views:

427

answers:

5

Possible Duplicate:
Parsing text in C

Say I have written to a text file in this format:

key1/value1
key2/value2
akey/withavalue
anotherkey/withanothervalue

I have a linked list like:

struct Node
{
    char *key;
    char *value;
    struct Node *next;
};

to hold the values. How would I read key1 and value1? I was thinking of putting line by line in a buffer and using strtok(buffer, '/'). Would that work? What other ways could work, maybe a bit faster or less prone to error? Please include a code sample if you can!

+1  A: 

You could also use fscanf to parse the input lines directly into keys and values:

char key[80], value[80];
fscanf (pFile, "%s/%s", key, value);

However, the drawback of this approach is that you need to allocate big enough buffers for the keys and values in advance (or use a temp buffer, then copy its value into its final destination, allocated with the right size). With strtok you can check the length of each key and value, then allocate a buffer of exactly the right size to store it.

Update: As commenters pointed out, another (potentially more serious) drawback of fscanf is that it doesn't work for strings containing whitespace.

Péter Török
How would you achieve that if key/value are generic strings?
Let_Me_Be
@Let_Me_Be, what do you mean by "generic string"?
Péter Török
@Péter: perhaps strings which could contain spaces. scanf's %s format matches a sequence of non-whitespace characters.
Juliano
@Juliano, hmmm... good point, I haven't considered that.
Péter Török
It should be fine since the input will not have any spaces either. I will be sure to allocate the struct appropriately. So the above code should do it?
Mohit Deshpande
I haven't tested it myself, so I offer no guarantee :-) but FWIW it should work with your strings.
Péter Török
A: 

If you wouldn't mind having aboth the key and the walue in one memory block (two strings), you can just read into a buffer, find the '/' change it into '\0' and point the value pointer just behind the '\0' character. Just don't forget to call free() only on the key value (this will free both the key and the value).

Let_Me_Be
I need the key and the value in different strings. But good idea of using the null-terminator '\0'
Mohit Deshpande
A: 

cycle through lines:
1) find '/'
2) devide pair key/value at the position '/'
3) fill key and value
in code:

char* p;
if(p = strchr(str,'/')
{
  *p++ = 0;
  key = strdup(str);
  value = strdup(p);
}
oraz
A: 

if you say hold the value 27878 / 989797 like that

char *p; fscanf("%s/%s",p,(p+1));

gcc
Are you sure this is an answer to the question above?
Péter Török
+3  A: 

Since your problem is a very good candidate for optimizing memory fragmentation, here is an implementation that uses some simple arcane magic to allocate all strings and the structure itself in a single piece of memory.

When destroying the node, you need only a single call to free(), to the node itself.

struct Node *list = NULL, **nextp = &list;
char buffer[1024];

while (fgets(buffer, sizeof buffer, file) != NULL) {
    struct Node *node;

    node = malloc(sizeof(struct Node) + strlen(buffer) + 1);
    node->key = strtok(strcpy((char*)(node+1), buffer), "/\r\n");
    node->value = strtok(NULL, "\r\n");
    node->next = NULL;
    *nextp = node;
    nextp = &node->next;
}

Explanation:

With 20 comments and one unexplained downvote, I think that the code needs some explanation, specially with regards to the tricks employed:

  1. Building a linked list:

    struct Node *list = NULL, **nextp = &list;
    ...
    *nextp = node;
    nextp = &node->next;
    

    This is a trick to create a linked list iteratively in forward order without having to special-case the head of the list. It uses a pointer-to-pointer to the next node. First the nextp pointer points to the list head pointer; in the first iteration, the list head is set through this pointer-to-pointer and then nextp is moved to the next pointer of that node. Subsequent iterations fill the next pointer of the last node.

  2. Single allocation:

    node = malloc(sizeof(struct Node) + strlen(buffer) + 1);
    node->key = ... strcpy((char*)(node+1), buffer) ...
    

    We have to deal with three pointers: the node itself, the key string and the value string. This usually would require three separate allocations (malloc, calloc, strdup...), and consequently free separate releases (free). Instead, in this case, the spaces of the tree elements are summed in sizeof(struct Node) + strlen(buffer) + 1 and passed to a single malloc call, which returns a single block of memory. The beginning of this block of memory is assigned to node, the structure itself. The additional memory (strlen(buffer)+1) comes right after the node, and it's address is obtained using pointer arithmetic using node+1. It is used to make a copy of the entire string read from the file ("key/value\n").

    Since malloc is called a single time for each node, a single allocation is made. It means that you don't need to call free(node->key) and free(node->value). In fact, it won't work at all. Just a single free(node) will take care of deallocating the structure and both strings in one block.

  3. Line parsing:

    node->key = strtok(strcpy((char*)(node+1), buffer), "/\r\n");
    node->value = strtok(NULL, "\r\n");
    

    The first call to strtok returns the pointer to the beginning of the buffer itself. It looks for a '/' (additionally for end-of-line markers) and breaks the string there with a NUL character. So the "key/value\n" is broken in "key" and "value\n" with a NUL character in between, and a pointer to the first is returned and stored in node->key. The second call to strtok will work upon the remaining "value\n", strip the end-of-line marker and returning a pointer to "value", which is stored in node->value.

I hope this cleans all questions about the above solution... it is too much for a closed question. The complete test code is here.

Juliano
This is very good I think. One bug though: If the final line of the file does not have '\n', the token will not be found and strtok returns NULL. Hence, the program will fail to add the final legit value part.
drewk
`strtok` will parse the final value (the right side of the `/`) part just with two calls. No need for the second `\n` token. Just call strtok with the same `/` token twice with NULL ptr the second call. Bug fixed...
drewk
Also: shouldn't `malloc` be `calloc` if you are trying to extend the memory? You have no guarantee of contiguous memory between calls to `malloc`...
drewk
@drewk: It should work even if the last line doesn't have '\n', it will just consider the rest of the file until EOF as the last token. `strtok` will only return NULL if a '/' wasn't found in the previous call. The OP doesn't say anything that the second part doesn't have any '/'. strtok'ing a '\n' has the benefit that it erases the newline for us.
Juliano
@drewk: The difference between `calloc()` and `malloc()` is that the former cleans the memory with zeros. A single call to any of these functions will return a single continuous block of memory, which is what I expect. I do not care for continuous memory between different calls. There is no standard C function that makes this guarantee.
Juliano
@Juliano: I agree and see the benefit -- equivalent to a Perl `chomp`. However, I still contend that if the last line of the file has "key"/"value" but no "\n" at the end, the strtok will return NULL and fail to add the final value to the nodes and you will have a NULL ptr. Have you tested it? It works fine if there is a "\n" on the final line but there is the other case of no terminating "\n"
drewk
@Juliano: I meant `realloc` not `calloc` sorry for the typo. I was referring to the single call to `free()` Needs to be contiguous memory, no? Otherwise the ptrs within the struct are not freed.
drewk
@Juliano: You could just add `strtok([first ptr, then NULL], "/\n")` to the two calls to `strtok()`, no?
drewk
@drewk Yes, I tested it. I'm pretty sure because I use strtok A LOT. Also, the behavior is documented in the C standard and the glibc manpage. Do the test yourself and tell me your results.
Juliano
@drewk: to the first call, maybe... to the second, it depends if the user expects '/' to appear in the second field. In this solution, I'm making no assumptions that '/' won't appear in the second field. So if the user has a line like "a/b/c\n", it will parse as key="a" value="b/c".
Juliano
@drewk: ref: realloc, in this case, I want each node to be one allocation, different nodes are different allocations. The trick is to put one node and its strings in a single allocation.
Juliano
You should probably add that `struct Node *next;` must go *before* `char* key` in the struct due to `(node+1)` in the above code (not as in the OP's question).
J.F. Sebastian
@J.F.: I don't think so. `(node+1)` is used to get a pointer to the end of the space reserved to the Node structure, where it will store the strings. The order of next and key doesn't matter.
Juliano
Yay! A downvote? Care to explain why?
Juliano
Will the above code work???
Mohit Deshpande
@Mohit: It sure does... try it!
Juliano
OK -- I tested it, and it does indeed work whether or not there is a `\n` after the last line. The reason is not that the second call to `strtok(NULL,"\n")` is using the the `\n` as a delimiter; a call to `strtok(ptr,"/")` initially sets the delimiter and subsequent calls delimitate on those characters but does not fail if not found. It returns the next field. I am not convinced that all memory is released by a single call to free() however...
drewk
This does not work for me, all I get is key/value key/key key/value Literally those strings. Not the actual keys and values
Mohit Deshpande
@Mohit: here, it compiles cleanly, runs and returns proper results: http://pastebin.com/cDg6L6zt
Juliano
@drewk, there is only one malloc() per node, so you need one free() per node. Check it, no surprise here. The normal way would be to allocate the node and the two strings separately, and that would mean that `free(node->key); free(node->value); free(node);` would be required to properly release the node. The magic is that the code allocates a single block to the node and its two strings. Perhaps you thought that it was a single allocation for the whole list... it is not.
Juliano
@Mohit, @drewk: I made a big expansion to the answer, explaining the tricks in more detail. Check and see if everything is clear now.
Juliano
@Juliano: My initial comment was erroneous. I've confused your code with a common C idiom `struct {... char buf[1]; }` where the variable length item can appear only as the last item of the structure.
J.F. Sebastian