ansaurus

Question

Answer 1

A:

If I were you I would start again from scratch. I would:

use std::string instead of character arrays for your data
read a line at a time from the file using std::getline
parse each line using a stringstream
avoid mixing formatted and unformatted input
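
A minimal sketch of the steps above, assuming each record sits on its own line and words are separated by whitespace. The helper names splitWords and readRecords are made up for illustration; they are not from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line into whitespace-separated words
// by running formatted extraction over a stringstream.
std::vector<std::string> splitWords(const std::string& line) {
    std::istringstream words(line);
    std::vector<std::string> out;
    std::string word;
    while (words >> word) {
        out.push_back(word);
    }
    return out;
}

// Hypothetical helper: read the stream one line at a time with std::getline
// and return one vector of words per non-empty line.
std::vector<std::vector<std::string>> readRecords(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            records.push_back(splitWords(line));
        }
    }
    return records;
}
```

Because the file itself is only ever read with std::getline, formatted and unformatted input are never mixed on the same stream; the >> extraction happens on the per-line stringstream.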

anon 2009-05-06 20:02:47

The reason this is tricky is that names sometimes have 3, 4 or even 5 "words". Street names and cities are similarly variable.

dicroce 2009-05-06 20:05:46

I assumed this was a learning exercise where the questioner controlled the input data. If not, he should obviously investigate formats such as XML or CSV for input.

anon 2009-05-06 20:09:08

Using a fixed-size field format is the easiest way to do this. If you use a field separator character, you need to worry about escaping the separator within a field; the alternatives are encoding field sizes into the format or going with a heavyweight, well-defined format like XML.
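
As a rough illustration of the fixed-size idea, the sketch below cuts a field out of a line by column position and strips the space padding. The helper names and the column widths used with them are assumptions, not the questioner's actual layout.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strip the trailing spaces used to pad a field.
std::string trimRight(const std::string& s) {
    const std::size_t last = s.find_last_not_of(' ');
    return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Return the field that starts at column `start` and is `width` characters
// wide, without its padding. substr clamps the length for short lines but
// throws if `start` is past the end, so that case is handled first.
std::string fixedField(const std::string& line, std::size_t start, std::size_t width) {
    if (start >= line.size()) {
        return std::string();
    }
    return trimRight(line.substr(start, width));
}
```

With a layout of, say, 10 columns for each name, fixedField(line, 0, 10) and fixedField(line, 10, 10) would give the first and last name regardless of how many words each contains.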

Martin York 2009-05-06 21:21:35

Answer 2

A:

My approach to this would be the following:

1) Read each line into a null-terminated buffer.
2) Use a split() function that you're going to have to write. It should take a string and a separator as input and return a list. The separator in this case is ' '.
3) Iterate over the list carefully. Are there never middle names? What about one-word or three-word street names?

Since many of these columns vary in their number of words, and you have no separator other than whitespace, this may prove a fairly tough task. If you NEVER have middle names, you could assume the first two columns are first and last name. You know for sure what the last two are. Everything between them could be assigned to a single address field.
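
A sketch of that approach, under the stated assumption that there are never middle names: the first two tokens are the name, the last two are taken to be state and zip code, and everything in between is joined into one address field. The Record struct and both function names are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record layout, for illustration only.
struct Record {
    std::string firstName, lastName, address, state, zipCode;
};

// Split `line` on `sep`, dropping the empty tokens produced by repeated separators.
std::vector<std::string> split(const std::string& line, char sep = ' ') {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string token;
    while (std::getline(in, token, sep)) {
        if (!token.empty()) {
            tokens.push_back(token);
        }
    }
    return tokens;
}

// Returns false if the line has too few tokens to fill every field.
bool parseRecord(const std::string& line, Record& out) {
    const std::vector<std::string> t = split(line);
    if (t.size() < 5) {
        return false;
    }
    out.firstName = t[0];
    out.lastName = t[1];
    out.state = t[t.size() - 2];
    out.zipCode = t[t.size() - 1];
    out.address.clear();
    // Everything between the name and the last two tokens becomes the address.
    for (std::size_t i = 2; i + 2 < t.size(); ++i) {
        if (!out.address.empty()) {
            out.address += ' ';
        }
        out.address += t[i];
    }
    return true;
}
```

Note that the city ends up inside the address field here, because whitespace alone gives no way to tell where the street ends and the city begins.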

dicroce 2009-05-06 20:03:43

Answer 3

+2 A:

The problem is that istream::get() breaks for streetAddress, which has spaces in it.

One way is to tokenize the input line first into, say, a vector of strings and then, depending on the number of tokens, convert these to the appropriate fields of your CustomerType:

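vector<string> tokenize(const string& line, char delim = ' ') {
      vector<string> tokens;
      size_t spos = 0, epos = string::npos;
      // search from spos so that each pass finds the next delimiter
      while ((epos = line.find(delim, spos)) != string::npos) {
          tokens.push_back(line.substr(spos, epos - spos));
          spos = epos + 1;   // step past the delimiter
      }
      tokens.push_back(line.substr(spos));   // the last token has no trailing delimiter
      return tokens;
}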

I'd rather write a stream extraction operator for CustomerType:

struct CustomerType  {
   friend istream& operator>>(istream& i, CustomerType& c);
   string firstName, lastName, ...;
   // ...
};

istream& operator>>(istream& i, CustomerType& c) {       
   i >> c.firstName >> c.lastName;
   string s1, s2, s3;
   i >> s1 >> s2 >> s3;
   c.streetAddress = s1 + ' ' + s2 + ' ' + s3;   // keep the spaces between the three words
   i >> c.city >> c.state >> c.zipCode;
   return i;
}

dirkgently 2009-05-06 20:03:54

I agree that's generally niftier, but do we want to present the poor guy with operator overloading right off? :-)

Charlie Martin 2009-05-06 20:10:42

This is what I'd suggest. Only issue I can think of is that there may be a variable number of items in the address field, but if so then a different character would be needed to split up fields (tab or | would be my ideas) and then you could just use get() with a different separator :)
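
A sketch of the separator idea, assuming the file can be rewritten with | between fields. It uses std::getline with a delimiter argument rather than get(), since getline also consumes the delimiter. The Customer struct and readCustomer are hypothetical; the field names follow the answer above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type; field names follow the answer above.
struct Customer {
    std::string firstName, lastName, streetAddress, city, state, zipCode;
};

// Read one '|'-separated record from `in`.
// Returns false at end of input or if the line has too few fields.
bool readCustomer(std::istream& in, Customer& c) {
    std::string line;
    if (!std::getline(in, line)) {
        return false;
    }
    std::istringstream fields(line);
    // Each getline stops at the next '|', so fields may contain spaces.
    return std::getline(fields, c.firstName, '|')
        && std::getline(fields, c.lastName, '|')
        && std::getline(fields, c.streetAddress, '|')
        && std::getline(fields, c.city, '|')
        && std::getline(fields, c.state, '|')
        && std::getline(fields, c.zipCode);
}
```

This does nothing about a | appearing inside a field; that would still need escaping or a different separator.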

workmad3 2009-05-06 20:10:58

I was thinking that get() reads until the count is reached, the end of the line, or the delimiter (which is not specified here), so it would not break on whitespace, whereas the >> operator always breaks on whitespace.
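
The difference can be checked with a small sketch like the one below. The two function names are made up, and an istringstream stands in for the input file.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Read with istream::get(char*, count): extracts at most count - 1 characters
// and stops early at '\n' (the default delimiter), not at spaces.
// `count` must be at least 1.
std::string readWithGet(const std::string& input, std::streamsize count = 64) {
    std::istringstream in(input);
    std::vector<char> buf(static_cast<std::size_t>(count), '\0');
    in.get(buf.data(), count);
    return std::string(buf.data());
}

// Read with operator>>: skips leading whitespace, then stops at the next whitespace.
std::string readWithExtraction(const std::string& input) {
    std::istringstream in(input);
    std::string word;
    in >> word;
    return word;
}
```

For the input "12 Main St" followed by a newline, readWithGet returns the whole line while readWithExtraction returns only "12". With a small count, get() cuts the field short instead, which is why the count has to match the buffer size.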

crashmstr 2009-05-06 20:11:54

And now the problem is that he will probably get buffer overruns.

anon 2009-05-06 20:12:36

@Charlie Martin: There's another answer floating where someone introduced another (apparent) newbie to Boost::Tokenizer. What'd you know of my troubles to keep my fingers off of typing that ;-)

dirkgently 2009-05-06 20:13:31

I bet. Gotta read up on Boost.

Charlie Martin 2009-05-06 22:26:51

Answer 4

+1 A:

You're getting 8 characters for State, which includes all your zipcode, and is larger than your field.

It'd also be tempting to use the skipws manipulator:

infile >> skipws >> CT_Struct.firstName
       >> CT_Struct.lastName 
       >> ... ;

(Update: that's what I get for doing that from memory. This is more closely approximating correct.)

Charlie Martin 2009-05-06 20:04:34

c++, using get and >> for ifstream