Q: 

parsing an sstream

Hi,

I am parsing a file which contains both strings and numerical values. I'd like to process the file field by field, with fields delimited by a space or an end-of-line character. The ifstream::getline() operation only allows a single delimiting character. What I currently do is call getline with ' ' as the delimiter, and then manually go back to the previous position in the stream if a '\n' has been encountered:

 ifstream ifs ( filename , ifstream::in );
 streampos pos;
 while (ifs.good())
 {  
  char curField[255];  

  pos = ifs.tellg();
  ifs.getline(curField, 255, ' '); 
  string s(curField);
  if (s.find("\n")!=string::npos) 
  {   
   ifs.seekg(pos); 
   ifs.getline(curField, 255, '\n'); 
   s = string(curField);
  }

 // process the field contained in the string s...
 }

However, seekg seems to position the stream one character too late (I thus miss the first character of each field before each line break). I know there are other ways to code such a parser, such as scanning line by line, but I'd really like to understand why this particular piece of code fails...

Thank you very much!

A: 

There may be a look-ahead/push-back character in the input stream. IIRC, the seek/tell functions are not aware of this.

Loadmaster
There don't seem to be any such characters. Here is an example which fails: it goes back to character #96 instead of #95:
 [87] 48 '0' char
 [88] 46 '.' char
 [89] 49 '1' char
 [90] 50 '2' char
 [91] 55 '7' char
 [92] 57 '9' char
 [93] 52 '4' char
 [94] 32 ' ' char
 [95] 48 '0' char
 [96] 46 '.' char
 [97] 48 '0' char
 [98] 48 '0' char
 [99] 52 '4' char
 [100] 52 '4' char
 [101] 55 '7' char
 [102] 52 '4' char
 [103] 54 '6' char
 [104] 55 '7' char
 [105] 10 '␊' char
 [106] 32 ' ' char
WhitAngl
A: 

As Loadmaster said, there may be unaccounted-for characters, or this could just be an off-by-one error.

But this just has to be said... you can replace this:

 ifstream ifs ( filename , ifstream::in );
 streampos pos;
 while (ifs.good())
 {  
  char curField[255];  

  pos = ifs.tellg();
  ifs.getline(curField, 255, ' '); 
  string s(curField);
  if (s.find("\n")!=string::npos) 
  {   
   ifs.seekg(pos); 
   ifs.getline(curField, 255, '\n'); 
   s = string(curField);
  }

 // process the field contained in the string s...
 }

With this:

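 ifstream ifs ( filename , ifstream::in );
 string s;
 // operator>> skips leading whitespace (spaces and newlines alike);
 // testing the extraction itself avoids processing a stale field at end-of-file
 while (ifs >> s)
 {  
   // process the field contained in the string s...
 }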

This gives you the behavior you want, since operator>> skips any whitespace (spaces and newlines alike) before each field.
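
As a minimal, self-contained sketch of the same idea, the extraction loop can be wrapped in a function and tried on an in-memory stream. The name readFields and the use of a std::vector to collect the fields are assumptions made for illustration; they are not part of the original code.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collects every
// whitespace-delimited field from any input stream.
std::vector<std::string> readFields(std::istream& in)
{
    std::vector<std::string> fields;
    std::string s;
    // operator>> skips leading whitespace, so ' ' and '\n' both act as
    // delimiters. Using the extraction as the loop condition stops the
    // loop as soon as no further field can be read.
    while (in >> s)
        fields.push_back(s);
    return fields;
}
```

Passing a std::istringstream built from "0.12794 0.00447467\n abc" should yield three fields, with the line break treated like any other separator. An std::ifstream can be passed the same way, since both derive from std::istream.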

JoshD
mmm... although there don't seem to be any such unaccounted-for characters, and it thus doesn't completely answer the question, I'll accept the answer since it is sooo much simpler than my stuff and it works well :p Thanks!!
WhitAngl
Actually I'm still interested in why my code doesn't work, since I'm using it elsewhere as well, and in that case I need to detect the line break itself and treat it separately... [and also because I'm very curious why it doesn't work as it is!]
WhitAngl
@WhitAngl: I suppose you've read the documentation for tellg, right? I'm not sure what the problem is. What do you have to do differently for a \n character?
JoshD
I have the documentation open right here, and of course I read it: "The get pointer determines the next location in the input sequence to be read by the next input operation". The first getline(..., ' ') returns the right character sequence (thus possibly ending with \n), but the second getline(..., '\n') starts one character off, even though seekg has reset the reading position to the one which worked for the first getline(..., ' '). I thus still don't understand. In my other scenario, I reset a counter at each '\n'... Thanks!
WhitAngl
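
For the scenario raised in the comments above, where each line break has to be detected and handled separately, one possible approach that avoids tellg/seekg entirely is to read whole lines with std::getline and then split each line with a std::istringstream. This is only a sketch of the "line by line" alternative mentioned in the question; the function name readFieldsByLine and the nested-vector return type are assumptions made for illustration. It does not explain why the original seekg call lands one character late.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns the fields grouped
// by line, so the caller knows where each line break occurred.
std::vector<std::vector<std::string>> readFieldsByLine(std::istream& in)
{
    std::vector<std::vector<std::string>> lines;
    std::string line;
    while (std::getline(in, line))        // reads one line, discards the '\n'
    {
        std::istringstream fieldStream(line);
        std::vector<std::string> fields;
        std::string field;
        while (fieldStream >> field)      // splits the line on whitespace
            fields.push_back(field);
        lines.push_back(fields);          // one entry per line read
    }
    return lines;
}
```

Any per-line action, such as resetting a counter, can then be performed once per outer entry. An empty line produces an empty inner vector, so line breaks are not lost.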