I need to parse a file and do some processing on its contents. The file is a text file, and the data is variable-length data of the form "PP1004181350D001002003..........". A timestamp follows each PP, so 1004181350 is 2010-04-18 13:50. Each D is followed by a data point made of three separate values, each three digits long, so D001002003 has the three coordinates 001, 002 and 003.

Now I need to parse this data from a file: I need to store each timestamp in an array and the corresponding data in arrays that have as many rows as there are data points, with one array for each of the three coordinates. The end result might look like

TimeStamp[1] = "135000", low[1] = "001", medium[1] = "002", high[1] = "003"
TimeStamp[2] = "135015", low[2] = "010", medium[2] = "012", high[2] = "013"
TimeStamp[3] = "135030", low[3] = "051", medium[3] = "052", high[3] = "043"
....

The question is how do I go about doing this in C? How do I go through this string looking for these patterns and storing the values in the corresponding arrays for further processing?

Note: the seconds value in the timestamp is added by us, since it is known that each data point comes 15 seconds after the previous one.

A: 

Simply Parsing? Here it is!!


UPDATE: Check out KillianDS's code in the other answer. That's even better!

  • [STEP 1] Search for \n (or CR+LF)

  • [STEP 2] Starting from the first character on the line, you know the number of characters each data field occupies. Read that many characters from the file.

Repeat for all fields.
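
The steps above can be sketched as follows. This is only a minimal sketch, assuming a line has already been read into memory (for example with fgets) and that each data record looks like D001002003. The names take_field and split_data_record are made up for illustration.

```c
#include <string.h>

#define FIELD_WIDTH 3   /* each coordinate is three digits wide */

/* Copy `width` characters starting at src into dst and terminate dst.
   dst must have room for width + 1 bytes.
   Returns the position just past the field. */
static const char *take_field(const char *src, char *dst, size_t width)
{
    memcpy(dst, src, width);
    dst[width] = '\0';
    return src + width;
}

/* Split one data record such as "D001002003" into its three fields.
   Returns 1 on success, 0 if the record lacks the 'D' marker or is too short. */
int split_data_record(const char *rec, char low[4], char medium[4], char high[4])
{
    if (rec[0] != 'D' || strlen(rec) < 1 + 3 * FIELD_WIDTH)
        return 0;

    const char *p = rec + 1;                 /* skip the 'D' marker */
    p = take_field(p, low, FIELD_WIDTH);
    p = take_field(p, medium, FIELD_WIDTH);
    take_field(p, high, FIELD_WIDTH);
    return 1;
}
```

Calling split_data_record("D001002003", low, medium, high) with three char[4] buffers leaves "001", "002" and "003" in them. The length check comes first so that a truncated record is rejected instead of being read past its end.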

CVS-2600Hertz
that should be `\n`, not `/n`
Hasturkun
And nowhere did he state that newlines are present.
KillianDS
@Hasturkun Thanks. My typo. Corrected it!
CVS-2600Hertz
@CVS26: This isn't a place to advertise your blog. Please don't include links in an answer unless they directly answer the question. You're free to promote your site on your user profile page.
Bill the Lizard
Thank you for kindly pointing that out to me. Sorry for any inconvenience I have caused. I have stopped linking from now on. Just my signature. I hope that's OK. Thank you.
CVS-2600Hertz
A: 

As long as your patterns aren't variable length, you could simply use fscanf. If you need something more complex, you might try PCRE, but for this case I think sscanf will suffice.
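
To illustrate the sscanf route, here is a rough sketch that assumes one whole "PP..." block is already in a string. The function name parse_block and the array parameters are my own, not from the question. It uses %n, which reports how many characters have been consumed so far, to step through the string.

```c
#include <stdio.h>

/* Parse one "PP<YYMMDD><HHMM>D<lll><mmm><hhh>..." block held in memory.
   Stores HHMM in *hhmm and up to max_points triples in the arrays.
   Returns the number of data points read, or -1 if the header is malformed. */
int parse_block(const char *s, int *hhmm,
                int low[], int medium[], int high[], int max_points)
{
    int used = 0;   /* characters consumed, filled in by %n */

    /* %n does not count towards the return value, so 1 means HHMM was read. */
    if (sscanf(s, "PP%*6d%4d%n", hhmm, &used) != 1)
        return -1;
    s += used;

    int count = 0;
    while (count < max_points &&
           sscanf(s, "D%3d%3d%3d%n",
                  &low[count], &medium[count], &high[count], &used) == 3) {
        s += used;   /* move on to the next "D..." record */
        count++;
    }
    return count;
}
```

For the input "PP1004181350D001002003D010012013" this stores 1350 in *hhmm and returns 2, with the triples (1, 2, 3) and (10, 12, 13) in the arrays.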

splicer
+2  A: 

edit: updated to follow your specs.

While your file seems to be variable length, your data isn't, so you could use fscanf and do something like this:

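/* timestamp, low, medium and high are ints; file is an open FILE*. */
while (fscanf(file, "PP%*6d%4d", &timestamp) == 1)   /* skip YYMMDD, read HHMM */
{
    /* Compare against the expected count: fscanf returns EOF (non-zero) at end of input. */
    for (int i = 0; fscanf(file, "D%3d%3d%3d", &low, &medium, &high) == 3; i++)
    {
        /* HHMMSS for this sample; only valid while i < 4, i.e. within one minute. */
        int hhmmss = timestamp * 100 + i * 15;
        //Do something with variables (e.g. convert to string, store in arrays, ...)
    }
}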

Note that this reads the data into integers (timestamp, low, medium and high are ints). A string version looks like this (timestamp, low, medium and high are char arrays):

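/* timestamp is char[7]; low, medium and high are char[4]. Zero-initialise them
   so they stay '\0'-terminated: %c does not append a terminator. */
char first[]  = {'0', '1', '3', '4'};   /* tens digit of the seconds: 00, 15, 30, 45 */
char second[] = {'0', '5'};             /* units digit of the seconds */

while (fscanf(file, "PP%*6d%4c", timestamp) == 1)
{
    for (int i = 0; fscanf(file, "D%3c%3c%3c", low, medium, high) == 3; i++)
    {
        timestamp[4] = first[i % 4];
        timestamp[5] = second[i % 2];
        //timestamp now reads HHMMSS; copy the four strings into your arrays here
    }
}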

edit: some more explanation of the format string. With %*6d I mean: look for up to 6 digits and discard them (* means: do not store in a variable). %4d and %4c consume the same four digits in this context (as 1 digit is one char), and both store the result in the corresponding variable: %4d as an int, %4c as four characters in a char array, with no terminating '\0' added.
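
A small sketch of the difference between the two conversions, using sscanf on a string so it is easy to try out. The helper names are made up, and the header is assumed to have the form PP followed by ten digits.

```c
#include <stdio.h>

/* Read the HHMM part of a "PP<YYMMDD><HHMM>" header as a number.
   Returns 1 on success, 0 otherwise. */
int header_time_as_int(const char *header, int *hhmm)
{
    /* %*6d consumes the six date digits without storing them. */
    return sscanf(header, "PP%*6d%4d", hhmm) == 1;
}

/* Read the same four characters as text.
   %4c stores the characters but no '\0', so the terminator is written by hand.
   Returns 1 on success, 0 otherwise. */
int header_time_as_text(const char *header, char hhmm[5])
{
    if (sscanf(header, "PP%*6d%4c", hhmm) != 1)
        return 0;
    hhmm[4] = '\0';
    return 1;
}
```

With the header "PP1004180905" the int version gives 905, while the text version gives "0905", so the leading zero only survives in the string form.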

KillianDS
how do you suggest the timestamp, low, medium and high arrays be declared and handled?
sfactor
Well, in the first code they're ints, no further declaration necessary. In the second code timestamp is a char[7], the others char[4]. The one extra char is for '\0', which you should not forget to set (e.g. timestamp[6] = '\0'). Handling depends on what you want and have available. In C++ I'd combine them in a map or vector.
KillianDS
Thanks a lot for this. There is one more thing I would like to ask: the timestamp is in a time format, so how do I handle that? Can I have it as a "time value"? Simply having it as an integer won't work.
sfactor
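
One possible way to get a real time value is the standard struct tm from <time.h>. The sketch below is only an illustration: the function name stamp_to_tm is made up, and it assumes the two-digit year means 20YY, as in the question's example (1004181350 is 2010-04-18 13:50).

```c
#include <stdio.h>
#include <time.h>

/* Fill *out from a "YYMMDDHHMM" stamp plus a seconds offset (0, 15, 30 or 45).
   Assumes YY means 20YY. Returns 1 on success, 0 if the digits do not parse. */
int stamp_to_tm(const char *stamp, int seconds, struct tm *out)
{
    int yy, mon, day, hour, min;

    if (sscanf(stamp, "%2d%2d%2d%2d%2d", &yy, &mon, &day, &hour, &min) != 5)
        return 0;

    struct tm t = {0};
    t.tm_year  = 100 + yy;   /* struct tm counts years from 1900 */
    t.tm_mon   = mon - 1;    /* struct tm counts months from 0 */
    t.tm_mday  = day;
    t.tm_hour  = hour;
    t.tm_min   = min;
    t.tm_sec   = seconds;
    t.tm_isdst = -1;         /* let mktime decide whether DST applies */
    *out = t;
    return 1;
}
```

Passing the filled struct to mktime gives a time_t in local time, and two such values can be compared with difftime.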
A: 

I wouldn't recommend using fscanf directly on input data, because it is very sensitive to the incoming data: if one byte is wrong and the input suddenly doesn't match the format specifier, parsing goes out of step, and in the worst case you could get a memory overwrite.

It is better either to read with fgetc and parse the data as it comes in, or to read into a buffer (fread) and process it from there.
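
As a rough sketch of the buffer approach, assuming the data has already been read into memory (for example with fread) and that each record is a D followed by nine digits. The function names are made up for illustration. Every byte is checked before it is used, so one bad byte makes the record fail cleanly.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert exactly `width` digits starting at buf[pos] to an int.
   Returns 1 on success, 0 if the buffer is too short or a byte is not a digit. */
static int read_fixed_int(const char *buf, size_t len, size_t pos,
                          size_t width, int *out)
{
    if (pos > len || width > len - pos)
        return 0;

    int value = 0;
    for (size_t i = 0; i < width; i++) {
        unsigned char c = (unsigned char)buf[pos + i];
        if (!isdigit(c))
            return 0;
        value = value * 10 + (c - '0');
    }
    *out = value;
    return 1;
}

/* Parse one "D<lll><mmm><hhh>" record starting at buf[pos].
   Returns the number of bytes consumed (10), or 0 if the record is malformed. */
size_t parse_data_record(const char *buf, size_t len, size_t pos,
                         int *low, int *medium, int *high)
{
    if (pos >= len || buf[pos] != 'D')
        return 0;
    if (!read_fixed_int(buf, len, pos + 1, 3, low) ||
        !read_fixed_int(buf, len, pos + 4, 3, medium) ||
        !read_fixed_int(buf, len, pos + 7, 3, high))
        return 0;
    return 10;
}
```

The caller adds the returned count to pos to reach the next record, and can stop or resynchronise when 0 comes back.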

Anders K.