ansaurus

Question

How to read in a data file of unknown dimensions in C/C++

Answer 1

+8 A:

How about:

Open the file.
Count the number of rows and columns.
Close the file.
Allocate the memory needed.
Open the file again.
Fill the array with data.

Every .obj (3D model file) loader I've seen uses this method. :)

knight666 2010-02-08 11:04:22

You can use `fseek`/`fstream::seekg` to reset the cursor to 0 without re-opening the file.

KennyTM 2010-02-08 12:04:56
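The count-then-fill steps above, combined with the rewind suggested in the comment, could be sketched roughly as follows. This is only an illustration: the `Table` struct and `read_two_pass` function are hypothetical names, and the sketch assumes whitespace-separated numbers with one row per line, padding short rows with zeros.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: a dense row-major table sized up front.
struct Table {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;  // rows * cols entries, row-major
};

// Two-pass read: pass 1 counts rows and the widest row, pass 2 fills.
Table read_two_pass(std::istream& in) {
    Table t;
    t.rows = 0;
    t.cols = 0;

    // Pass 1: count dimensions.
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::size_t n = 0;
        double value;
        while (fields >> value) ++n;
        if (n == 0) continue;  // skip blank or non-numeric (e.g. comment) lines
        ++t.rows;
        if (n > t.cols) t.cols = n;
    }

    // Allocate once, zero-filled so short rows end up padded.
    t.values.assign(t.rows * t.cols, 0.0);

    // Rewind: clear the EOF/fail flags first, then seek to the start.
    in.clear();
    in.seekg(0, std::ios::beg);

    // Pass 2: fill, applying the same skip rule as pass 1.
    std::size_t r = 0;
    while (r < t.rows && std::getline(in, line)) {
        std::istringstream fields(line);
        std::size_t c = 0;
        double value;
        while (c < t.cols && fields >> value) {
            t.values[r * t.cols + c] = value;
            ++c;
        }
        if (c > 0) ++r;
    }
    return t;
}
```

Note the `clear()` before `seekg`: once the first pass hits end of file, the stream's error flags stay set and the seek would otherwise fail.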

Answer 2

A:

Do you need a rectangular or a ragged matrix? If the latter, create a structure like this:

 std::vector< std::vector<double> > data;

Now read one line at a time into a:

 std::vector<double> d;

and add the vector to the ragged matrix:

 data.push_back( d );

All data structures involved are dynamic, and will grow as required.
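Put together, the idea might look something like the sketch below. The function name `read_ragged` and the `Ragged` typedef are made up for illustration, and the sketch assumes whitespace-separated numbers with one row per line.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::vector<double> > Ragged;

// Reads one row per line; rows may have different lengths.
Ragged read_ragged(std::istream& in) {
    Ragged data;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> d;
        double value;
        while (fields >> value) d.push_back(value);
        data.push_back(d);  // an empty line yields an empty row
    }
    return data;
}
```

Taking a `std::istream&` rather than a filename means the same function works on a `std::ifstream` or, for testing, a `std::istringstream`.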

anon 2010-02-08 11:15:57

Answer 3

+10 A:

Create table as vector of vectors:

std::vector<std::vector<double> > table;

Inside an infinite (while (true)) loop:

Read line:

std::string line;
std::getline(ifs, line);

If something went wrong (probably EOF), exit the loop:

if(!ifs) 
    break;

Skip that line if it's a comment:

if(!line.empty() && line[0] == '#')
    continue;

Read row contents into vector (parse the line just read, not the rest of the file stream):

std::vector<double> row;
std::istringstream iss(line);
std::copy(std::istream_iterator<double>(iss),
          std::istream_iterator<double>(),
          std::back_inserter(row));

Add row to table:

table.push_back(row);

Once you're out of the loop, "table" contains the data:

table.size() is the number of rows
table[i] is row i
table[i].size() is the number of cols. in row i
table[i][j] is the element at the j-th col. of row i
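Because each row carries its own size, nothing guarantees the table is rectangular. A small check along these lines could be run after loading; `is_rectangular` is a hypothetical helper name, not part of the answer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: true when every row has the same number of columns.
// An empty table is treated as rectangular.
bool is_rectangular(const std::vector<std::vector<double> >& table) {
    for (std::size_t i = 1; i < table.size(); ++i)
        if (table[i].size() != table[0].size()) return false;
    return true;
}
```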

Manuel 2010-02-08 11:17:01

upvoted this as it helped the most

Simon Walker 2010-02-08 12:35:35

Answer 4

A:

Figured out a way to do this. Thanks go mostly to Manuel as it was the most informative answer.

std::vector< std::vector<double> > readIn2dData(const char* filename)
{
    /* Function takes a char* filename argument and returns a 
     * 2d dynamic array containing the data
     */

    std::vector< std::vector<double> > table; 
    std::ifstream ifs;

    /*  open file  */
    ifs.open(filename);

    while (true)
    {
        std::string line;
        double buf;
        getline(ifs, line);

        std::istringstream ss(line);

        if (!ifs)
            // mainly catch EOF
            break;

        if (line.empty() || line[0] == '#')
            // catch empty lines or comment lines
            continue;


        std::vector<double> row;

        while (ss >> buf)
            row.push_back(buf);


        table.push_back(row);


    }

    ifs.close();

    return table;
}

Basically, create a vector of vectors. The only difficulty was splitting by whitespace, which is taken care of by the stringstream object. This may not be the most efficient way of doing it, but it certainly works in the short term!

Also I'm looking for a replacement for the atof function (it isn't actually deprecated, but it can't report conversion errors), but never mind. Just needs some memory leak checking (it shouldn't have any, since most of the objects are std objects) and I'm done.

Thanks for all your help

Simon Walker 2010-02-08 12:40:00

Why use atof? What is wrong with `ifstream is(file); float f; is >> f;`

graham.reeds 2010-02-08 13:10:58

Cheers, just changed it. Much cleaner.

Simon Walker 2010-02-08 14:17:28
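Stream extraction is what the thread settled on. If per-token error reporting is ever wanted instead, `std::strtod` is one standard alternative to `atof`. A rough sketch, with `parse_double` being a hypothetical helper name:

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a whole token as a double.
// Returns false on empty input, trailing junk, or out-of-range values.
bool parse_double(const std::string& token, double& out) {
    const char* begin = token.c_str();
    char* end = 0;
    errno = 0;
    const double value = std::strtod(begin, &end);
    if (end == begin) return false;     // nothing was converted
    if (*end != '\0') return false;     // trailing characters after the number
    if (errno == ERANGE) return false;  // overflow or underflow
    out = value;
    return true;
}
```

Unlike `atof`, this distinguishes a genuine `0.0` from a failed conversion.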

Answer 5

A:

I've seen your answer, and while it's not bad, I don't think it's ideal either. At least as I understand your original question, the first comment basically specifies how many columns you'll have in each of the remaining rows. For example, the one you've given ("1 4 6 28") contains four numbers, which can be interpreted as saying each succeeding line will contain 4 numbers.

Assuming that's correct, I'd use that data to optimize reading the data. In particular, after that, (again, as I understand it) the file just contains row after row of numbers. That being the case, I'd put all the numbers together into a single vector, and use the number of columns from the header to index into the rest:

class matrix { 
    std::vector<double> data;
    int columns;
public:
    // a matrix is 2D, with fixed number of columns, and arbitrary number of rows.
    matrix(int cols) : columns(cols) {}

    // just read raw data from stream into vector:    
    std::istream &read(std::istream &stream) { 
        std::copy(std::istream_iterator<double>(stream), 
                  std::istream_iterator<double>(), 
                  std::back_inserter(data));
        return stream;
    }

    // Do 2D addressing by converting rows/columns to a linear address.
    // If you want to check subscripts, use vector.at(x) instead of vector[x].
    double operator()(size_t row, size_t col) const { 
        return data[row*columns+col];
    }
};

This is all pretty straightforward -- the matrix knows how many columns it has, so you can do x,y indexing into the matrix, even though it stores all its data in a single vector. Reading the data from the stream just means copying that data from the stream into the vector. To deal with the header, and simplify creating a matrix from the data in a stream, we can use a simple function like this:

matrix read_data(std::string name) { 
    // read one line from the stream.
    std::ifstream in(name.c_str());
    std::string line;
    std::getline(in, line);

    // break that up into space-separated groups:
    std::istringstream temp(line);
    std::vector<std::string> counter;
    std::copy(std::istream_iterator<std::string>(temp), 
              std::istream_iterator<std::string>(),
              std::back_inserter(counter));

    // the number of columns is the number of groups, -1 for the leading '#'.
    matrix m(counter.size()-1);

    // Read the remaining data into the matrix.
    m.read(in);
    return m;
}

As it's written right now, this depends on your compiler implementing the "Named Return Value Optimization" (NRVO). Without that, the compiler will copy the entire matrix (probably a couple of times) when it's returned from the function. With the optimization, the compiler pre-allocates space for a matrix, and has read_data() generate the matrix in place.
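If relying on NRVO is a concern, one option is to have the caller own the object and fill it in place. The sketch below is a self-contained variation on the class above, not the answer's code: `flat_matrix`, `load`, `rows` and `cols` are hypothetical names, and it assumes the first line is a header whose leading '#' is a separate whitespace-delimited token followed by one token per column.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the matrix class: flat storage, fixed column count.
class flat_matrix {
    std::vector<double> data;
    std::size_t columns;
public:
    flat_matrix() : columns(0) {}

    std::size_t cols() const { return columns; }
    std::size_t rows() const { return columns ? data.size() / columns : 0; }

    // 2D addressing converted to a linear index (no bounds checking).
    double operator()(std::size_t row, std::size_t col) const {
        return data[row * columns + col];
    }

    // Fills this object in place, so no matrix is returned by value.
    bool load(std::istream& in) {
        std::string header;
        if (!std::getline(in, header)) return false;

        // Count whitespace-separated tokens in the header line.
        std::istringstream tokens(header);
        std::string token;
        std::size_t count = 0;
        while (tokens >> token) ++count;
        if (count < 2) return false;  // need '#' plus at least one column
        columns = count - 1;          // -1 for the leading '#'

        // Everything after the header is read as one flat run of numbers.
        data.assign(std::istream_iterator<double>(in),
                    std::istream_iterator<double>());
        return true;
    }
};
```

A caller would then write `flat_matrix m; m.load(file);` and index with `m(row, col)`.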

Jerry Coffin 2010-02-08 15:43:34

Had to change a couple of things to get this to work:

return data[row*cols+col]; -> return data[row*columns+col];
std::getline(line, in); -> std::getline(in, line);

It's good, but I feel I understand my answer better.

Simon Walker 2010-02-10 11:26:19

@Simon: Quite true -- the code wasn't tested, so a couple of bugs isn't a big surprise. Thanks for pointing them out -- I'll fix those in the code.

Jerry Coffin 2010-02-10 14:27:18
