I'd like to scan variables that form vectors from a white-space-delimited text file, and the stumbling block (all too often for me) is a lack of elegance.

Currently my scanning code requires the size of the vector to be given as the first element in the file:

7 : 1 3 6 8 -9 .123 1.1

Which bothers me because the '7' could be determined by inspecting the white space.

I've tried various forms of fscanf(), strtok() etc., but they all seem brute-forcish. Without resorting to lex/yacc (not available), could someone suggest something more elegant than the following?

typedef struct vector_tag
{
    int Length;
    double * value;
} vector;

vector v;

char buf[BIG_ENOUGH], key[BIG_ENOUGH], val[BIG_ENOUGH];

void scan_vector(FILE * fh)
{
    int i, length;
    double * data;
    char * tok;

    do {
        /* fgets() returns NULL at end of file or on a read error */
        if (NULL == fgets(buf, sizeof buf, fh)) return;
    } while (2 != sscanf(buf,"%[^:]:%[^\n\r]",key,val));

    length      =
    v.Length    = strtol(key,NULL,10);
    data        =
    v.value     = malloc(length * sizeof(double));

    tok = strtok(val, " "); /* I'd prefer tokenizing on whitespace */
    for (i = 0; i < v.Length && tok != NULL; i++) { /* stop early if the line is short */
        * data++ = strtod(tok,NULL);
        tok = strtok(NULL, " "); /* Again, tokenize on whitespace */
    }
}

Solution: Thanks to the checked answer, I implemented:

static int scan_vector(FILE * fh, vector * v)
{
    if (1 == fscanf(fh,"%d:",& v->Length))
    {
        int         i;

        v->value    = malloc(v->Length * sizeof(double));

        assert (NULL != v->value);

        for (i = 0; i < v->Length; i++)
        {
            if (fscanf(fh,"%lf",v->value + i) != 1) return(0);
        } 
        return(1);
    } 
    return(0);
} /* scan_vector() */
A: 

If you use realloc() you can always ask for more memory if you don't allocate enough with the initial malloc(). A common strategy is to allocate an arbitrary n items to start. Whenever you run out of space, you double n and resize the buffer.

Alternatively, you could use a linked list instead of an array. Linked lists handle insertions and appends better than arrays, but you give up the ability to access items by index.
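
For illustration, here is a minimal sketch of the linked-list alternative. It is not code from this answer: the names dlist, dlist_append, dlist_to_array and dlist_free are made up for the example. Appending never moves existing data, and one copy at the end restores access by index.

```c
#include <stdlib.h>

/* Hypothetical singly linked list of doubles; all names are illustrative. */
typedef struct node {
    double value;
    struct node *next;
} node;

typedef struct {
    node *head;
    node *tail;     /* kept so that appending is O(1) */
    size_t length;
} dlist;

/* Append x to the end of the list. Returns 0 on success, -1 if malloc fails. */
static int dlist_append(dlist *l, double x)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = x;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    l->length++;
    return 0;
}

/* Copy the values into one contiguous array so they can be indexed.
   Returns NULL for an empty list or if malloc fails; the caller frees it. */
static double *dlist_to_array(const dlist *l)
{
    double *a;
    size_t i = 0;
    const node *n;

    if (l->length == 0)
        return NULL;
    a = malloc(l->length * sizeof *a);
    if (a == NULL)
        return NULL;
    for (n = l->head; n != NULL; n = n->next)
        a[i++] = n->value;
    return a;
}

/* Release every node and reset the list to empty. */
static void dlist_free(dlist *l)
{
    node *n = l->head;
    while (n != NULL) {
        node *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->length = 0;
}
```

The price is one malloc() per value, so for short numeric vectors the realloc() approach is usually the simpler choice.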

John Kugelman
It's the scanning of the file, more than the memory management, that I'm after; although I would of course need temporary space for scanning if I didn't know a priori how many variables I'd need to scan.
Jamie
OK, I'll stop being coy. I was hinting (oops) that you could read the numbers in one at a time and realloc() as needed. That lets you read in each line in one pass without needing the length marker.
John Kugelman
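
A rough sketch of the realloc() idea discussed above, under some assumptions: the input is one line of text already read into a string (for example with fgets()), and the names dvec, dvec_push and dvec_parse_line are invented for this example rather than taken from the thread. Numbers are converted one at a time with strtod(), and the buffer doubles whenever it fills up, so no length marker is needed.

```c
#include <stdlib.h>

/* Hypothetical growable vector; the names are illustrative only. */
typedef struct {
    size_t length;    /* number of values stored */
    size_t capacity;  /* number of values allocated */
    double *value;
} dvec;

/* Append x, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if realloc fails (old buffer stays valid). */
static int dvec_push(dvec *v, double x)
{
    if (v->length == v->capacity) {
        size_t ncap = v->capacity ? v->capacity * 2 : 8;
        double *p = realloc(v->value, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        v->value = p;
        v->capacity = ncap;
    }
    v->value[v->length++] = x;
    return 0;
}

/* Parse every number in one line of text; no length prefix is required.
   Returns 0 on success, -1 on a malformed token or allocation failure. */
static int dvec_parse_line(const char *line, dvec *v)
{
    const char *p = line;
    char *end;

    v->length = 0;
    v->capacity = 0;
    v->value = NULL;

    for (;;) {
        double x = strtod(p, &end);  /* strtod skips leading white space */
        if (end == p)
            break;                   /* nothing more could be converted */
        if (dvec_push(v, x) != 0)
            goto fail;
        p = end;
    }

    /* Anything left other than white space is a malformed token. */
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    if (*p != '\0')
        goto fail;
    return 0;

fail:
    free(v->value);
    v->value = NULL;
    v->length = v->capacity = 0;
    return -1;
}
```

The caller would read each line into a buffer, pass it to dvec_parse_line(), and free(v.value) when done. A line longer than the read buffer still needs separate handling.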
A: 

How big can your vectors be?
One way to go is:

  • scan a line into a local buffer (I presume each line holds one vector's data)
  • scan over that local buffer to count the white space delimiters (quite easy to code)
  • then make the correct allocation
  • and initialize the vector

As you observe, the dimension '7' need not be part of the input.
You just need one local buffer large enough for the longest possible line.
And some error handling for it :-)
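
The steps above might look roughly like the sketch below. It assumes the line is already in a buffer, and it counts tokens rather than individual delimiter characters so that runs of spaces or tabs do not inflate the count. The names count_tokens and parse_counted are hypothetical, not from this answer.

```c
#include <ctype.h>
#include <stdlib.h>

/* First pass: count the white-space-separated tokens in a line. */
static size_t count_tokens(const char *s)
{
    size_t n = 0;
    int in_token = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;   /* first character of a new token */
            n++;
        }
    }
    return n;
}

/* Second pass: allocate exactly n doubles and convert each token.
   Returns a malloc'd array and stores the count in *n_out, or returns
   NULL (with *n_out == 0) on an empty line, a bad token, or malloc failure. */
static double *parse_counted(const char *line, size_t *n_out)
{
    size_t n = count_tokens(line);
    size_t i;
    const char *p = line;
    double *values;

    *n_out = 0;
    if (n == 0)
        return NULL;
    values = malloc(n * sizeof *values);
    if (values == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        char *end;
        values[i] = strtod(p, &end);
        /* Reject tokens that are not numbers or have trailing junk. */
        if (end == p || (*end != '\0' && !isspace((unsigned char)*end))) {
            free(values);
            return NULL;
        }
        p = end;
    }
    *n_out = n;
    return values;
}
```

The allocation is exact, at the cost of walking the line twice; the buffer holding the line still has to be large enough for the longest line expected.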

nik
+1  A: 

what's wrong with something like:

int scan_vector(FILE *fh)
{
    char pad[2];
    int i;
    /* pad is already a char *, so it is passed without & */
    if (fscanf(fh,"%d %1[:]", &v.Length, pad) != 2)
        return -1;
    v.value = malloc(v.Length * sizeof(double));
    if (v.value == NULL)
        return -1;
    for (i = 0; i < v.Length; i++) {
        if (fscanf(fh, "%lf", &v.value[i]) != 1)
            return -1;
    }
    return 0;
}

This attempts to read the vector with fscanf, and returns -1 as an error code if there was a problem.

If you want to do something much more complex than this, you're probably better off using flex at least (if not bison as well).

Chris Dodd
Easy marks. It was right before me, thanks for penetrating my question.
Jamie
(Aside from the 'scanf' -> 'fscanf' typo) +1
Jamie
A: 

Here's a version which doesn't need the vector's size as first entry in the file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>  /* needed for errno, used below */

#define LINE_MAX 256
#define VECTOR_SIZE_MAX 32

struct vector
{
    size_t size;
    double *values;
};

// returns 1 on error
_Bool scan_vector(FILE *file, struct vector *v)
{
    char buffer[LINE_MAX];
    if(!fgets(buffer, sizeof(buffer), file))
        return 1;

    double values[VECTOR_SIZE_MAX];

    size_t size = 0;
    errno = 0;

    for(char *head = buffer, *tail = NULL;; ++size, head = tail)
    {
        while(isspace((unsigned char)*head)) ++head;
        if(!*head) break;

        if(size >= VECTOR_SIZE_MAX)
            return 1;

        values[size] = strtod(head, &tail);
        if(errno || head == tail)
            return 1;
    }

    v->size = size;
    v->values = malloc(sizeof(double) * size);
    if(!v->values) return 1;

    memcpy(v->values, values, sizeof(double) * size);

    return 0;
}

int main(void)
{
    struct vector v;
    while(!scan_vector(stdin, &v))
    {
        printf("value count: %u\n", (unsigned)v.size);
        free(v.values);
    }

    return 0;
}

The maximum line size and number of entries are fixed, partly for performance reasons and partly out of laziness.

Christoph