I have a file with data listed as follows:

0,       2,    10
10,       8,    10
10,       10,   10
10,       16,   10
15,       10,   16
17,       10,   16

I want to be able to input the file and split it into three arrays, in the process trimming all excess spaces and converting each element to integers.

For some reason I can't find an easy way to do this in C++. The only success I've had is reading each line into an array, stripping out all the spaces with a regex, and then splitting it up. The whole process took a good 20-30 lines of code, and it's a pain to modify for, say, another separator (e.g. a space).

This is the python equivalent of what I would like to have in C++:

f = open('input_hard.dat')
lines =  f.readlines()
f.close()

#declarations
inint, inbase, outbase = [], [], []

#input parsing
for line in lines:
    bits = line.split(',')
    inint.append(int(bits[0].strip()))
    inbase.append(int(bits[1].strip()))
    outbase.append(int(bits[2].strip()))

The ease of doing this in Python is one of the reasons I moved to it in the first place. However, I now need to do this in C++, and I would hate to have to use my ugly 20-30 line code.

Any help would be appreciated, thanks!

+2  A: 

Something like:

vector<int> inint;
vector<int> inbase;
vector<int> outbase;
char buf[256];  // line buffer; fh is an already opened FILE*
while (fgets(buf, sizeof buf, fh)) {
   char *tok = strtok(buf, ", ");
   inint.push_back(atoi(tok));
   tok = strtok(NULL, ", ");
   inbase.push_back(atoi(tok));
   tok = strtok(NULL, ", ");
   outbase.push_back(atoi(tok));
}

Except with error checking.
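
As a rough idea of what that error checking could look like, here is a minimal sketch that swaps atoi/strtok for strtol, which reports where parsing stopped and keeps no hidden state. The helper name parse_triple and the three-element output array are assumptions made for illustration, not part of the answer above.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parses "a, b, c" into out[0..2].
// Returns false if a number is missing, out of range, or a comma is absent.
// Anything after the third number (such as the newline) is ignored.
bool parse_triple(const char* line, int out[3]) {
    const char* p = line;
    for (int i = 0; i < 3; ++i) {
        char* end = 0;
        errno = 0;
        long v = std::strtol(p, &end, 10);   // skips leading whitespace itself
        if (end == p || errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return false;                    // no digits found, or value too large
        out[i] = static_cast<int>(v);
        p = end;
        while (*p == ' ' || *p == '\t')      // allow spaces before the separator
            ++p;
        if (i < 2) {
            if (*p != ',')
                return false;                // separator missing
            ++p;
        }
    }
    return true;
}
```

Inside the fgets loop you would call parse_triple(buf, vals) and only push the three values when it returns true.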

MattSmith
I would avoid such a "C-ish" solution for, well, aesthetics... but more importantly in this case because strtok has some serious thread-safety issues. Correct code though!
MattyT
+1  A: 

std::getline allows you to read a line of text, and you can use a string stream to parse the individual line:

string buf;
getline(cin, buf); 
stringstream par(buf);

char buf2[512];
par.getline(buf2, 512, ','); /* Reads the first token, up to the first comma. */

Once you have the line of text in the string, you can use any parsing function you want: sscanf(buf.c_str(), "%d,%d,%d", &i1, &i2, &i3), atoi on the substring that holds the integer, or some other method.
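
To make the getline plus string stream idea concrete, here is a small sketch that splits one line on a separator of your choice, which also covers the "another separator" case from the question. The function name split_ints is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on `sep` and converts each field to int.
// Fields that do not start with a number (including empty ones) are skipped.
std::vector<int> split_ints(const std::string& line, char sep) {
    std::vector<int> values;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, sep)) {   // one field per separator
        std::istringstream conv(field);
        int value;
        if (conv >> value)                       // operator>> skips the leading spaces
            values.push_back(value);
    }
    return values;
}
```

Calling split_ints(buf, ',') on each line read with getline gives you the three values to append to your arrays; passing ' ' instead handles space-separated input.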

You can also ignore unwanted characters in the input stream, if you know they're there:

if (cin.peek() == ',')
    cin.ignore(1, ',');
cin >> nextInt;
Raymond Martineau
+1  A: 

If you don't mind using the Boost libraries...

#include <istream>
#include <string>
#include <vector>
#include <boost/lexical_cast.hpp>
#include <boost/regex.hpp>

std::vector<int> ParseFile(std::istream& in) {
    const boost::regex cItemPattern(" *([0-9]+),?");
    std::vector<int> return_value;

    std::string line;
    while (std::getline(in, line)) {
        std::string::const_iterator b=line.begin(), e=line.end();
        boost::smatch match;
        while (b!=e && boost::regex_search(b, e, match, cItemPattern)) {
            return_value.push_back(boost::lexical_cast<int>(match[1].str()));
            b=match[0].second;
        }
    }

    return return_value;
}

That pulls the lines from the stream, then uses the Boost.Regex library (with a capture group) to extract each number from the lines. It automatically ignores anything that isn't a non-negative integer (the pattern only matches runs of digits), though that can be changed if you wish.

It's still about twenty lines with the #includes, but you can use it to extract essentially anything from the file's lines. This is a trivial example; I'm using pretty much identical code to extract tags and optional values from a database field, and the only major difference is the regular expression.

EDIT: Oops, you wanted three separate vectors. Try this slight modification instead:

const boost::regex cItemPattern(" *([0-9]+), *([0-9]+), *([0-9]+)");
std::vector<int> vector1, vector2, vector3;

std::string line;
while (std::getline(in, line)) {
    std::string::const_iterator b=line.begin(), e=line.end();
    boost::smatch match;
    while (b!=e && boost::regex_search(b, e, match, cItemPattern)) {
        vector1.push_back(boost::lexical_cast<int>(match[1].str()));
        vector2.push_back(boost::lexical_cast<int>(match[2].str()));
        vector3.push_back(boost::lexical_cast<int>(match[3].str()));
        b=match[0].second;
    }
}
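
If Boost is not an option, the same three-capture idea can be sketched with std::regex, assuming a C++11 compiler. This is a substitute for Boost.Regex rather than the code above, and the function name parse_columns is invented for the example. It matches the whole line instead of searching within it, and skips lines that don't fit.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: fills three vectors from lines shaped like "a, b, c".
// Returns how many lines matched; other lines are skipped.
// Note: std::stoi throws std::out_of_range if a field does not fit in an int.
std::size_t parse_columns(std::istream& in,
                          std::vector<int>& col1,
                          std::vector<int>& col2,
                          std::vector<int>& col3) {
    static const std::regex row(
        "\\s*([0-9]+)\\s*,\\s*([0-9]+)\\s*,\\s*([0-9]+)\\s*");
    std::size_t matched = 0;
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (!std::regex_match(line, m, row))
            continue;                          // not three comma-separated integers
        col1.push_back(std::stoi(m[1].str()));
        col2.push_back(std::stoi(m[2].str()));
        col3.push_back(std::stoi(m[3].str()));
        ++matched;
    }
    return matched;
}
```

Comparing the return value with the number of lines in the file tells you whether anything was skipped.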
Head Geek
+2  A: 

There's no real need to use boost in this example as streams will do the trick nicely:

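#include <fstream>
#include <vector>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1;  // expects the input file name as the first argument

    ifstream file(argv[1]);

    const unsigned maxIgnore = 10;
    const int delim = ',';
    int x,y,z;

    vector<int> vecx, vecy, vecz;

    // Test the stream after each read so that a failed read at end of
    // file does not push a stale triple. ignore() skips up to maxIgnore
    // characters, stopping after the delimiter.
    while (file >> x
           && file.ignore(maxIgnore, delim) >> y
           && file.ignore(maxIgnore, delim) >> z)
    {
        vecx.push_back(x);
        vecy.push_back(y);
        vecz.push_back(z);
    }
}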

Though if I were going to use boost I'd prefer the simplicity of tokenizer to regex... :)

MattyT
+1  A: 

There is really nothing wrong with fscanf, which is probably the fastest solution in this case. And it's as short and readable as the Python code:

FILE *fp = fopen("file.dat", "r");
int x, y, z;
std::vector<int> vx, vy, vz;

while (fscanf(fp, "%d, %d, %d", &x, &y, &z) == 3) {
  vx.push_back(x);
  vy.push_back(y);
  vz.push_back(z);
}
fclose(fp);
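
One thing the loop above can't tell you is whether it stopped at the end of the file or at a malformed line. Here is a sketch of one way to distinguish the two, using the return value of the last fscanf call. The wrapper name read_triples is an assumption for this example, and the format string adds spaces before the commas so that "1 , 2 , 3" is accepted too.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: reads "a, b, c" triples from an open FILE*.
// Returns true if reading stopped cleanly at end of file, false if it
// stopped at a malformed or incomplete record (or on a read error).
bool read_triples(std::FILE* fp,
                  std::vector<int>& vx,
                  std::vector<int>& vy,
                  std::vector<int>& vz) {
    int x, y, z;
    int n;
    while ((n = std::fscanf(fp, "%d , %d , %d", &x, &y, &z)) == 3) {
        vx.push_back(x);
        vy.push_back(y);
        vz.push_back(z);
    }
    // fscanf returns EOF only when input ran out before the first conversion;
    // 0, 1 or 2 means a record was started but could not be completed.
    return n == EOF && !std::ferror(fp);
}
```

A false return means the vectors hold only the records read before the problem line.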
ididak
A: 

Why not the same code as in Python? :)

std::ifstream file("input_hard.dat");
std::vector<int> inint, inbase, outbase;

int val1, val2, val3;
char delim;

// Loop on the extraction itself so that a failed read at end of file
// does not push a stale triple.
while (file >> val1 >> delim >> val2 >> delim >> val3){
 inint.push_back(val1);
 inbase.push_back(val2);
 outbase.push_back(val3);
}
da_m_n
A: 

If you want to be able to scale to harder input formats, you should consider Spirit, Boost's parser combinator library.

This page has an example that does almost what you need (with reals and a single vector, though).

David Pierre