How do I safely parse a tab-delimited string? For example: test\tbla-bla-bla\t2332

A: 

You can use any regex library or even the GLib GScanner, see here and here for more information.

Tarantula
+1  A: 

Using strtok() from string.h:

#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "test\tbla-bla-bla\t2332";
    char * pch;
    pch = strtok (str, "\t");   /* split on tabs only; a space is not a delimiter here */
    while (pch != NULL)
    {
        printf ("%s\n",pch);
        pch = strtok (NULL, "\t");
    }
    return 0;
}
Ruel
+1  A: 

strtok() is a standard function for parsing strings with arbitrary delimiters. It is, however, not thread-safe. Your C library of choice might have a thread-safe variant.

Another standard-compliant way (just wrote this up, it is not tested):

#include <string.h>
#include <stdio.h>

int main()
{
    char string[] = "foo\tbar\tbaz";
    char * start = string;
    char * end;
    while ( ( end = strchr( start, '\t' ) ) != NULL )
    {
        // %.*s prints at most as many characters as the int argument
        // preceding the string says (the token is not zero-terminated!)
        printf( "%.*s\n", (int)( end - start ), start );
        start = end + 1;
    }
    // start points to the last token, which is zero-terminated
    printf( "%s\n", start );
    return 0;
}
DevSolar
the format specifier should read `%.*s`
Christoph
@Christoph: Correct, of course. Confused %c with `scanf()`, where you need it to read in spaces, and you're also right about the width / precision mixup. Thanks for pointing it out.
DevSolar
+2  A: 

Use strtok_r instead of strtok (if it is available). It has similar usage, except it is reentrant, and it does not modify the string like strtok does. [Edit: Actually, I misspoke. As Christoph points out, strtok_r does replace the delimiters by '\0'. So, you should operate on a copy of the string if you want to preserve the original string. But it is preferable to strtok because it is reentrant and thread safe]

strtok modifies your original string: it replaces each delimiter with '\0'. If the string happens to be a constant stored in read-only memory (some compilers do that with string literals), you may get an access violation.

Ziffusion
afaik `strtok_r()` works like `strtok()` - ie it will modify the string, replacing the separator with zeros! the difference between the functions is that `strtok_r()` doesn't use an internal `static` variable, but a user-supplied one to store its state
Christoph
You are correct! I missed that. So, you would need to operate on a copy of the string. But strtok_r is still preferable because it is reentrant.
Ziffusion
A: 

Yet another version; this one separates the logic into a new function:

#include <stdio.h>

static _Bool next_token(const char **start, const char **end)
{
    if(!*end) *end = *start;    // first call
    else if(!**end)             // check for terminating zero
        return 0;
    else *start = ++*end;       // skip tab

    // advance to terminating zero or next tab
    while(**end && **end != '\t')
        ++*end;

    return 1;
}

int main(void)
{
    const char *string = "foo\tbar\tbaz";

    const char *start = string;
    const char *end = NULL; // NULL value indicates first call

    while(next_token(&start, &end))
    {
        // print substring [start, end); the precision argument must be an int
        printf("%.*s\n", (int)(end - start), start);
    }

    return 0;
}
Christoph