Hi

I'm parsing some CSV data in C for the purposes of a Ruby extension. In order to pull out the data from each row I'm using sscanf as follows:

  char* line = RSTRING_PTR(arg);
  double price;
  double volume_remaining;
  unsigned int type_id, range, order_id, volume_entered, minimum_volume, duration, station_id, region_id, solar_system_id, jumps;
  char* issued;
  char* bid;
  printf("I got %s\n",line);
  int res = sscanf(line, "%lf,%lf,%u,%u,%u,%u,%u,%s,%s,%u,%u,%u,%u,%u", &price, &volume_remaining, &type_id, &range, &order_id, &volume_entered, &minimum_volume, bid, issued, &duration, &station_id, &region_id, &solar_system_id, &jumps);
  printf("I matched %d values\n", res);
  printf("I have price %f, vol_rem %f, type_id %d, range %d, order_id %d, vol_ent %d, min_vol %d, issued %s, bid %s, duration %d, station_id %d, region_id %d, solar_system_id %d, jumps %d, source %s \n",price, volume_remaining, type_id, range, order_id, volume_entered, minimum_volume, issued, bid, duration, station_id, region_id, solar_system_id, jumps, source); // and hash build follows below

Running it produces this:

I got 728499.93,437.0,2032,32767,1132932560,588,1,False,2009-05-24 19:52:08.000,90,60003760,10000002,30000142,0
I matched 7 values
I have price 728499.930000, vol_rem 437.000000, type_id 2032, range 32767, order_id 1132932560, vol_ent 588, min_vol 1, issued (null), bid (null), duration -1210229476, station_id 3001, region_id 3001, solar_system_id 1, jumps -1210299816

Note the null strings. Basically, it seems like sscanf is tripping on these for some reason. I can't figure out why even having read the docs thoroughly. Any ideas?

+2  A: 

Your character pointers are uninitialized, so they point to an arbitrary region of memory. You must allocate a buffer for sscanf() to write to, and it must be big enough. (You're lucky that didn't segfault.) That second part is the hard part -- scanf() might not be the right tool for the job here.

Thanatos
Ah, OK, that makes sense. The strings are of a prescribed, fixed length. What is the best way to go about allocating space for the pointers?
James Harrison
Changed my definition to char issued[30],bid[10]; and the matcher to %[^,] instead of %s. Not perfect, but does the job and works.
James Harrison
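A minimal sketch of the fix described in the comment above, assuming the buffer sizes James Harrison mentions (issued[30], bid[10]). The struct and the function name parse_order are hypothetical, introduced only for illustration; the field names mirror the variables in the question. Each string conversion carries a maximum width one less than its buffer size, so sscanf cannot write past the end, and the return value is checked against the expected 14 conversions.

```c
#include <stdio.h>

/* Hypothetical record type; field names follow the question's variables. */
struct order {
    double price, volume_remaining;
    unsigned int type_id, range, order_id, volume_entered, minimum_volume;
    unsigned int duration, station_id, region_id, solar_system_id, jumps;
    char bid[10];     /* e.g. "False" */
    char issued[30];  /* e.g. "2009-05-24 19:52:08.000" */
};

/* Returns 1 if all 14 fields were converted, 0 otherwise. */
int parse_order(const char *line, struct order *o)
{
    int matched = sscanf(line,
        /* %9[^,] stores at most 9 characters plus the terminating null,
           %29[^,] at most 29 plus the null. */
        "%lf,%lf,%u,%u,%u,%u,%u,%9[^,],%29[^,],%u,%u,%u,%u,%u",
        &o->price, &o->volume_remaining, &o->type_id, &o->range,
        &o->order_id, &o->volume_entered, &o->minimum_volume,
        o->bid, o->issued,
        &o->duration, &o->station_id, &o->region_id,
        &o->solar_system_id, &o->jumps);
    return matched == 14;
}
```

Passing the sample line from the question to parse_order should fill every field. A line with too few fields, or with a string field longer than its buffer allows, makes the function return 0 rather than leaving values silently uninitialized.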
Yet again a classic scanf problem. IMHO scanf is the devil's function (along with realloc...) and should NOT be used unless there really is no alternative.
AAT
scanf() can be useful, when used properly. I do not understand your comments on realloc().
Thanatos
+1  A: 

%s matches a run of non-whitespace characters, so it does not stop at a comma. What you probably want instead of %s is %255[^,], which matches every character other than a comma. The 255, which is optional, is the maximum field width: the most characters sscanf will store for that field, not counting the terminating null byte.

Peter Kovacs
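To illustrate the field width, here is a small sketch. The helper name first_field and the 8-byte buffer are assumptions made for the example, not part of the question. In the standard format syntax the width sits between the % and the conversion, so a buffer of 8 bytes pairs with %7[^,]. The %n directive records how many characters were consumed and does not count toward sscanf's return value.

```c
#include <stdio.h>

/* Copies the first comma-delimited field of `line` into `out`, which must
   hold at least 8 bytes. Returns the number of characters stored, or -1 if
   the field is empty or the input is exhausted. */
int first_field(const char *line, char out[8])
{
    int n = 0;
    /* At most 7 characters are stored, leaving room for the null byte. */
    if (sscanf(line, "%7[^,]%n", out, &n) != 1)
        return -1;
    return n;
}
```

A field longer than 7 characters is cut off at the width instead of overflowing the buffer, so the caller can compare the returned length with the width to detect truncation.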
Good advice, but not immediately the source of the trouble.
Jonathan Leffler
Yes, although it seems like the sscanf would barf on the date, and wouldn't match anything beyond that since it's expecting a `,` instead of a space.
Peter Kovacs
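The point made in this exchange can be checked with a cut-down version of the question's format. This is a sketch only: the two function names and the 64-byte buffers are hypothetical, and the input is shortened to four fields (minimum_volume, bid, issued, duration). With %s the first string swallows everything up to the space inside the date, and the literal comma in the format then fails to match that space, so scanning stops early. With %[^,] each string ends at the next comma.

```c
#include <stdio.h>

/* Parses "<min_vol>,<bid>,<issued>,<duration>" using %s for the strings.
   Returns the number of fields sscanf converted. */
int scan_with_s(const char *line, char bid[64], char issued[64],
                unsigned int *duration)
{
    unsigned int min_vol;
    return sscanf(line, "%u,%63s,%63s,%u", &min_vol, bid, issued, duration);
}

/* Same layout, but the strings are read with a scanset that stops at ','. */
int scan_with_set(const char *line, char bid[64], char issued[64],
                  unsigned int *duration)
{
    unsigned int min_vol;
    return sscanf(line, "%u,%63[^,],%63[^,],%u",
                  &min_vol, bid, issued, duration);
}
```

Comparing the two return values on the same line shows where the %s version gives up: it converts only the leading number and one over-long string, while the scanset version converts all four fields.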
A: 

I agree with Thanatos. As a first step, you need to allocate memory for issued and bid; you might do:

char issued[1024]; char bid[1024];

Alex Black