ansaurus

Question

Using C/C++ to efficiently de-serialize a string comprised of floats, tokens and blank lines

Answer 1

+1 A:

This is a bit crude and untested, but the general idea is to try parsing each line and see what's there:

while (!feof (stdin))
{
    char buf [100];
    if (!fgets (buf, sizeof buf, stdin))
        break;  // end of file or error

    // skip leading whitespace
    char *cp = buf;
    while (isspace (*cp))
         ++cp;

    if (*cp == '\000')  // blank line?
    {
        do_whatever_for_a_blank_line ();
        continue;
    }

    // try reading a float
    double v1, v2;
    char *ep = NULL;
    v1 = strtod (cp, &ep);
    if (ep == cp)   // if nothing parsed
    {
        do_whatever_for_a_text_token (cp);
        continue;
    }

    cp = ep;        // resume scanning just past the first number
    while (isspace ((unsigned char) *cp))
       ++cp;
    ep = NULL;
    v2 = strtod (cp, &ep);
    if (ep == cp)   // if no float parsed
    {
         handle_single_floating_value (v1);
         continue;
    }
    handle_two_floats (v1, v2);  
 }

wallyk 2010-01-14 05:23:52

comments: `fgets` returns `char *`, not `int`. Most of the time, `while(!feof(fp)) { ... }` is wrong in C: http://c-faq.com/stdio/feof.html.

Alok 2010-01-14 05:28:42

Looking at your code more, with `fgets()` return value fixed, you don't have the error mentioned in my link above. Still, I would move the `fgets()` itself in the condition part of `while`. (I can't edit my last comment, hence a new one.)

Alok 2010-01-14 05:37:53

Quite right. I've fixed `fgets()` accordingly. While `feof()` has problems, combining it with `fgets` as shown above works very well.

wallyk 2010-01-14 06:19:49

Answer 2

+5 A:

Using C, I would do something like this (untested):

#include <stdio.h>

#define MAX 128

char buf[MAX];
while (fgets(buf, sizeof buf, fp) != NULL) {
    double d1, d2;
    if (buf[0] == '\n') {
        /* saw blank line */
    } else if (sscanf(buf, "%lf%lf", &d1, &d2) != 2) {
        /* buf has the next text token, including '\n' */
    } else {
        /* use the two doubles, d1, and d2 */
    }
}

The check for blank line is first because it's relatively inexpensive. Depending upon your needs:

1. you might need to increase/change MAX,
2. you may need to check if buf ends with a newline; if it doesn't, then the line was too long (go to 1 or 3 in that case),
3. you might need a function that reads full lines from a file, using malloc() and realloc() to dynamically allocate the buffer (see this for more),
4. you might want to take care of special cases such as a single floating-point value on a line (which I assume is not going to happen). sscanf() returns the number of input items successfully matched and assigned.
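
For the point about reading full lines with malloc() and realloc(), here is a minimal sketch of what such a function could look like. The name read_full_line and the starting capacity of 128 are assumptions made for illustration, not part of the original answer. It keeps calling fgets() and doubling the buffer until a newline shows up or the input ends.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc()ed string (including the '\n' if one was read)
 * that the caller must free(), or NULL at end of file or on error. */
char *read_full_line(FILE *fp)
{
    size_t cap = 128;   /* arbitrary starting size */
    size_t len = 0;
    char *buf, *tmp;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;             /* got a complete line */

        /* No newline yet: the buffer was too small (or the input
         * ended without one), so grow it and keep reading. */
        tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len > 0)
        return buf;                 /* last line had no trailing newline */
    free(buf);
    return NULL;
}
```

A caller would loop with something like `while ((line = read_full_line(fp)) != NULL) { ...; free(line); }`. On POSIX systems, getline() does a similar job and may be preferable where it is available.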

I am also assuming that blank lines are really blank (just the newline character by itself). If not, you will need to skip leading white-space. isspace() in ctype.h is useful in that case.

fp is a valid FILE * object returned by fopen().

Alok 2010-01-14 05:24:00

5. You might want to detect malformed input (e.g. "1.0 1.0foo"). (If you want to use `sscanf` instead of `strtod`, one could use `"%lf%lf%c"` as the format string and verify either that no character was obtained or that it's a newline.)

jamesdlin 2010-01-14 07:37:08
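
The `"%lf%lf%c"` idea from the comment above could be sketched roughly as follows. The enum and the function name parse_two_floats are assumptions introduced for illustration. The line is accepted only if sscanf() converts exactly two numbers, or two numbers followed by a newline.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes, not from the original answers. */
enum line_kind { LINE_TWO_FLOATS, LINE_MALFORMED, LINE_NOT_TWO_FLOATS };

/* Try to read two doubles from line and reject trailing junk
 * such as "1.0 1.0foo". */
enum line_kind parse_two_floats(const char *line, double *d1, double *d2)
{
    char extra;
    int n;

    assert(line != NULL && d1 != NULL && d2 != NULL);
    n = sscanf(line, "%lf%lf%c", d1, d2, &extra);

    if (n == 2 || (n == 3 && extra == '\n'))
        return LINE_TWO_FLOATS;     /* nothing, or just a newline, follows */
    if (n == 3)
        return LINE_MALFORMED;      /* something else follows the numbers */
    return LINE_NOT_TWO_FLOATS;     /* fewer than two numbers were read */
}
```

As written, trailing spaces before the newline also count as malformed. Using `" %c"` in the format and treating only `n == 3` as an error would be a more lenient variant.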

Answer 3

+4 A:

Wow, I don't write many parsers in C any more.

This has been tested on the OP's input:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
  scan_blank, scan_label, scan_float
} tokens;

double f1, f2;

char line[512], string_token[sizeof line];

tokens scan(void) {
  char *s;
  for(s = line; *s; ++s) {
    switch(*s) {
      case ' ':
      case '\t':
        continue;
      case '\n':
        return scan_blank;
      case '0': case '1': case '2': case '3': case '4':
      case '5': case '6': case '7': case '8': case '9':
      case '.': case '-':
        sscanf(line, " %lf %lf", &f1, &f2);
        return scan_float;
      default:
        sscanf(line, " %s", string_token);
        return scan_label;
    }
    abort();
  }
  abort();
}

int main(void) {
  int n;
  for(n = 1;; ++n) {
    if (fgets(line, sizeof line, stdin) == NULL)
      return 0;
    printf("%2d %-40.*s", n, (int)strlen(line)-1, line);
    switch(scan()) {
      case scan_blank:
        printf("blank\n");
        break;
      case scan_label:
        printf("label [%s]\n", string_token);
        break;
      case scan_float:
        printf("floats [%lf %lf]\n", f1, f2);
        break;
    }
  }
}

DigitalRoss 2010-01-14 05:43:33
