I recently (4 days ago) started learning C++, coming from a C / Java background. To learn a new language, I usually start by re-implementing different classical algorithms, in as language-specific a way as I can.

I've come to this code; it's a DFS (Depth First Search) on an undirected graph. From what I've read, it's best to pass parameters by reference in C++. Unfortunately, I can't quite grasp the concept of a reference. Every time I need a reference, I get confused and think in terms of pointers. In my current code, I use pass by value.

Here is the code (probably isn't Cppthonic as it should):

#include <algorithm>
#include <iostream>
#include <fstream>
#include <string>
#include <stack>
#include <vector>

using namespace std;

template <class T>
void utilShow(T elem);

template <class T>
void utilShow(T elem){
    cout << elem << " ";
}

vector< vector<short> > getMatrixFromFile(string fName);
void showMatrix(vector< vector<short> > mat);
vector<unsigned int> DFS(vector< vector<short> > mat);

/* Reads matrix from file (fName) */
vector< vector<short> > getMatrixFromFile(string fName)
{
    unsigned int mDim;
    ifstream in(fName.c_str());
    in >> mDim;
    vector< vector<short> > mat(mDim, vector<short>(mDim));
    for(int i = 0; i < mDim; ++i) {
        for(int j = 0; j < mDim; ++j) {
            in >> mat[i][j];
        }
    }
    return mat;
}

/* Output matrix to stdout */
void showMatrix(vector< vector<short> > mat){
    vector< vector<short> >::iterator row;
    for(row = mat.begin(); row < mat.end(); ++row){
        for_each((*row).begin(), (*row).end(), utilShow<short>);
        cout << endl;
    }
}

/* DFS */
vector<unsigned int> DFS(vector< vector<short> > mat){
    // Gives the order for DFS when visiting
    stack<unsigned int> nodeStack;
    // Tracks the visited nodes
    vector<bool> visited(mat.size(), false);
    vector<unsigned int> result;
    nodeStack.push(0);
    visited[0] = true;
    while(!nodeStack.empty()) {
        unsigned int cIdx = nodeStack.top();
        nodeStack.pop();
        result.push_back(cIdx);
        for(int i = 0; i < mat.size(); ++i) {
            if(1 == mat[cIdx][i] && !visited[i]) {
                nodeStack.push(i);
                visited[i] = true;
            }
        }
    }
    return result;
}

int main()
{
    vector< vector<short> > mat;
    mat = getMatrixFromFile("Ex04.in");
    vector<unsigned int> dfsResult = DFS(mat);

    cout << "Adjancency Matrix: " << endl;
    showMatrix(mat);

    cout << endl << "DFS: " << endl;
    for_each(dfsResult.begin(), dfsResult.end(), utilShow<unsigned int>);

    return (0);
}

Can you please give me some hints on how to use references, with reference to this code?

Is my current programming style compatible with the constructs of C++?

Is there a standard alternative to vector and type** for two-dimensional arrays in C++?

LATER EDIT:

OK, I've analyzed your answers (thanks all), and I've rewritten the code in a more OOP manner. I also now understand what a reference is and where to use it. It's somewhat similar to a const pointer, except that a pointer of that type can hold NULL.

This is my latest code:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <ostream>
#include <stack>
#include <string>
#include <vector>

using namespace std;

template <class T> void showUtil(T elem);

/**
* Wrapper around a graph
**/
template <class T>
class SGraph
{
private:
    size_t nodes;
    vector<T> pmatrix;
public:
    SGraph(): nodes(0), pmatrix(0) { }
    SGraph(size_t nodes): nodes(nodes), pmatrix(nodes * nodes) { }
    // Initialize graph from file name
    SGraph(string &file_name);
    void resize(size_t new_size);
    void print();
    void DFS(vector<size_t> &results, size_t start_node);
    // Used to retrieve indexes.
    T & operator()(size_t row, size_t col) {
        return pmatrix[row * nodes + col];
    }
};

template <class T>
SGraph<T>::SGraph(string &file_name)
{
    ifstream in(file_name.c_str());
    in >> nodes;
    pmatrix = vector<T>(nodes * nodes);
    for(int i = 0; i < nodes; ++i) {
        for(int j = 0; j < nodes; ++j) {
            in >> pmatrix[i*nodes+j];
        }
    }
}

template <class T>
void SGraph<T>::resize(size_t new_size)
{
    this->pmatrix.resize(new_size * new_size);
}

template <class T>
void SGraph<T>::print()
{
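    // Walk all nodes * nodes cells of the flat storage
    for(size_t i = 0; i < nodes * nodes; ++i){
        cout << pmatrix[i] << " ";
        // End the line after the last element of each row
        if((i + 1) % nodes == 0){
            cout << endl;
        }
    }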
}

template <class T>
void SGraph<T>::DFS(vector<size_t> &results, size_t start_node)
{
    stack<size_t> nodeStack;
    // One flag per node (not per matrix cell)
    vector<bool> visited(nodes, false);
    nodeStack.push(start_node);
    visited[start_node] = true;
    while(!nodeStack.empty()){
        size_t cIdx = nodeStack.top();
        nodeStack.pop();
        results.push_back(cIdx);
        for(int i = 0; i < nodes; ++i){
            if(pmatrix[nodes*cIdx + i] && !visited[i]){
                nodeStack.push(i);
                visited[i] = 1;
            }
        }
    }
}

template <class T>
void showUtil(T elem){
    cout << elem << " ";
}

int main(int argc, char *argv[])
{
    string file_name = "Ex04.in";
    vector<size_t> dfs_results;

    SGraph<short> g(file_name);
    g.DFS(dfs_results, 0);

    for_each(dfs_results.begin(), dfs_results.end(), showUtil<size_t>);

    return (0);
}
+1  A: 
void utilShow(T& elem);
vector< vector<short> > getMatrixFromFile(const string& fName);
void showMatrix(vector< vector<short> >& mat);
vector<unsigned int> DFS(vector< vector<short> >& mat);

Those are the ones I could figure out. Also, if you don't change (or intend to change) the state of an object inside the function body, pass it as const.

I wouldn't ask you to include all the C++ constructs in your first try, but to pick them up gradually so that you don't overwhelm yourself. Vector is the most used STL container. Which container to use depends on your needs, not on a whim to prefer one over another.

One brief description of containers. http://msdn.microsoft.com/en-us/library/1fe2x6kt%28VS.80%29.aspx

@Jerry Thanks for editing. Vector isn't overused, but it is used more because of its simplicity for simple objects, rather than large monolithic class objects. It resembles a C-style array, but isn't one, and comes with a lot of extra algorithms. Two more that are used quite frequently are maps and lists. That may be because the places where I work need these containers more than other places do.

DumbCoder
Is vector overused? (PS: Please use the code tag.)
Andrei Ciobanu
+2  A: 

To pass by reference, you'd typically change this:

vector<unsigned int> DFS(vector< vector<short> > mat){

to:

vector<unsigned int> DFS(vector<vector<short>> const &mat) { 

Technically, this is passing a const reference, but that's what you normally want to use when/if you're not planning to modify the original object.
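
As a rough sketch of what that change looks like for the whole function (the alias AdjMatrix and the name dfsOrder are mine, not from the question, and I'm assuming a square matrix), the body stays the same and only the parameter type changes:

```cpp
#include <cstddef>
#include <stack>
#include <vector>

typedef std::vector<std::vector<short> > AdjMatrix;  // hypothetical alias

// Iterative DFS starting at node 0. `mat` is only read, so it is taken
// by const reference and the caller's matrix is never copied.
std::vector<std::size_t> dfsOrder(const AdjMatrix &mat) {
    std::vector<std::size_t> result;
    if (mat.empty()) return result;           // nothing to visit

    std::vector<bool> visited(mat.size(), false);
    std::stack<std::size_t> nodeStack;
    nodeStack.push(0);
    visited[0] = true;

    while (!nodeStack.empty()) {
        std::size_t current = nodeStack.top();
        nodeStack.pop();
        result.push_back(current);
        // Push every unvisited neighbour of the current node
        for (std::size_t i = 0; i < mat.size(); ++i) {
            if (mat[current][i] == 1 && !visited[i]) {
                nodeStack.push(i);
                visited[i] = true;
            }
        }
    }
    return result;
}
```

Callers don't change at all: dfsOrder(mat) looks exactly like the by-value call, and the compiler rejects any accidental write to mat inside the function.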

On another note, I'd probably change this:

for_each((*row).begin(), (*row).end(), utilShow<short>);

to something like:

std::copy(row->begin(), row->end(), std::ostream_iterator<short>(std::cout, " "));

Likewise:

for_each(dfsResult.begin(), dfsResult.end(), utilShow<unsigned int>);

would become:

std::copy(dfsResult.begin(), dfsResult.end(),
          std::ostream_iterator<unsigned int>(std::cout, " "));

(...which looks like it would obviate utilShow entirely).
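
For completeness, here is a small self-contained sketch of that std::copy call. Note that std::ostream_iterator lives in <iterator>, which the original code doesn't include. The helper names printAll and toText are hypothetical, and I write to a generic std::ostream so the same code works for std::cout or a string buffer:

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Writes every element, each followed by a space, to `out`
void printAll(const std::vector<unsigned int> &values, std::ostream &out) {
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<unsigned int>(out, " "));
}

// Same call, but captured in a string instead of going to the console
std::string toText(const std::vector<unsigned int> &values) {
    std::ostringstream buffer;
    printAll(values, buffer);
    return buffer.str();
}
```

Calling printAll(dfsResult, std::cout) would then replace the for_each line. The second argument of ostream_iterator is a delimiter written after every element, so the output ends with a trailing space.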

As far as 2D matrices go, unless you need a ragged matrix (where different rows can be different lengths), you typically use a simple front-end to handle indexing in a single vector:

template <class T>
class matrix { 
    std::vector<T> data_;
    size_t columns_;
public:
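    // Initializers listed in declaration order (data_ first, then columns_)
    matrix(size_t rows, size_t columns) : data_(rows * columns), columns_(columns) {}

    // Maps (row, column) onto the flat, row-major storage
    T &operator()(size_t row, size_t column) { return data_[row * columns_ + column]; }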
};

Note that this uses operator() for indexing, so instead of m[x][y], you'd use m(x,y), about like in BASIC or Fortran. You can overload operator[] in a way that allows you to use that notation if you prefer, but it's a fair amount of extra work with (IMO) little real benefit.
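
One thing you will probably want once you start passing the matrix around by const reference is a const overload of operator(). The following is only a sketch along the lines of the class above; the name Matrix, the rows()/columns() accessors, and the sumRow helper are my additions for illustration:

```cpp
#include <cstddef>
#include <vector>

template <class T>
class Matrix {
    std::vector<T> data_;     // row-major storage, rows * columns elements
    std::size_t rows_;
    std::size_t columns_;
public:
    Matrix(std::size_t rows, std::size_t columns)
        : data_(rows * columns), rows_(rows), columns_(columns) {}

    std::size_t rows() const { return rows_; }
    std::size_t columns() const { return columns_; }

    // Writable access: m(r, c) = value;
    T &operator()(std::size_t row, std::size_t column) {
        return data_[row * columns_ + column];
    }
    // Read-only access, used when the matrix is reached through a const reference
    const T &operator()(std::size_t row, std::size_t column) const {
        return data_[row * columns_ + column];
    }
};

// Takes the matrix by const reference, so only the const overload is called
template <class T>
T sumRow(const Matrix<T> &m, std::size_t row) {
    T total = T();
    for (std::size_t c = 0; c < m.columns(); ++c) total += m(row, c);
    return total;
}
```

Without the const overload, sumRow would not compile, because the non-const operator() cannot be called on a const Matrix.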

Jerry Coffin
Thanks for the idea with the matrix template. It will surely simplify the syntax a lot.
Andrei Ciobanu
+7  A: 

For 4 days into C++, you're doing a great job. You're already using standard containers, algorithms, and writing your own function templates. The most sorely lacking thing I see is exactly in reference to your question: the need to pass by reference/const reference.

Any time you pass/return a C++ object by value, you are invoking a deep copy of its contents. This isn't cheap at all, especially for something like your matrix class.

First let's look at showMatrix. The purpose of this function is to output the contents of a matrix. Does it need a copy? No. Does it need to change anything in the matrix? No, its purpose is just to display it. Thus we want to pass the matrix by const reference.

typedef vector<short> Row;
typedef vector<Row> SquareMatrix;
void showMatrix(const SquareMatrix& mat);

[Note: I used some typedefs to make this easier to read and write. I recommend it when you have a lot of template parametrization].

Now let's look at getMatrixFromFile:

SquareMatrix getMatrixFromFile(string fName);

Returning SquareMatrix by value here could be expensive (depending on whether your compiler applies return value optimization to this case), and so is passing in a string by value. With C++0x, move semantics (built on rvalue references) make the return cheap even when that optimization is not applied: the returned vector's buffer is moved out instead of being copied. The function still returns by value; returning SquareMatrix&& to a local object would leave a dangling reference. I also changed the string to be passed by const reference, for the same reason as in showMatrix (we don't need a copy of the file name):

SquareMatrix getMatrixFromFile(const string& fName);

However, if you don't have a compiler with these features, then a common compromise is to pass in a matrix by reference and let the function fill it in:

void getMatrixFromFile(const string& fName, SquareMatrix& out_matrix);

This doesn't provide as convenient a syntax for the client (they now have to write two lines of code instead of one), but it avoids the deep-copying overhead consistently. There is also MOJO to address this, but that will become obsolete with C++0x.
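
To make the two calling styles concrete, here is a rough sketch of both. I'm assuming the same file format as in the question (dimension first, then the cells), and I read from a generic std::istream rather than a file name so it can be tried with a string; the name readMatrix and the bool return are my own choices, not part of the original code:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

typedef std::vector<short> Row;
typedef std::vector<Row> SquareMatrix;

// Out-parameter style: the caller owns the matrix, the function fills it in.
// Returns false (leaving out_matrix untouched) if the input is incomplete.
bool readMatrix(std::istream &in, SquareMatrix &out_matrix) {
    std::size_t dim = 0;
    if (!(in >> dim)) return false;
    SquareMatrix tmp(dim, Row(dim));
    for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t j = 0; j < dim; ++j)
            if (!(in >> tmp[i][j])) return false;
    out_matrix.swap(tmp);   // constant-time swap, no element-wise copy
    return true;
}

// Return-by-value style: one line for the caller; efficiency relies on
// (N)RVO or, with C++0x, on the vector being moved out.
SquareMatrix readMatrix(std::istream &in) {
    SquareMatrix result;
    readMatrix(in, result);
    return result;
}
```

With the first form the client writes SquareMatrix m; readMatrix(in, m); with the second, SquareMatrix m = readMatrix(in);. A side benefit of the out-parameter form is that the return value is free to report failure.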

A simple rule of thumb: if you have any user-defined type (not a plain old data type) and you want to pass it to a function:

  1. pass by const reference if the function only needs to read from it.
  2. pass by reference if the function needs to modify the original.
  3. pass by value only if the function needs a copy to modify.

There are exceptions, e.g. a cheap UDT (user-defined type) that is cheaper to copy than to pass by const reference, but stick to this rule for now and you'll be on your way to writing safe, efficient C++ code that doesn't waste precious clock cycles on unnecessary copies (a common bane of poorly written C++ programs).
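
The three cases of the rule of thumb, sketched with made-up helper functions (total, doubleAll and sortedCopy are illustrative names, not from the question):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 1. Only reads: const reference, no copy is made
long total(const std::vector<int> &values) {
    long sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) sum += values[i];
    return sum;
}

// 2. Modifies the caller's object: non-const reference
void doubleAll(std::vector<int> &values) {
    for (std::size_t i = 0; i < values.size(); ++i) values[i] *= 2;
}

// 3. Needs its own copy to modify: by value (the copy happens at the call)
std::vector<int> sortedCopy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}
```

After sortedCopy(v) the caller's v is unchanged, whereas doubleAll(v) changes v in place.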

Absolutely solid. +1
Platinum Azure
Any reasonably modern compiler will include both anonymous and named return value optimization (RVO and NRVO) so there's rarely a reason to worry about returning large values (i.e., the compiler will eliminate the extra copying).
Jerry Coffin
@Jerry I might sound as though I'm bordering on the side of premature optimization, but I've found too many hotspots in profiler sessions of our system (using MSVC 2008 and GCC 4.1) caused by return by value (particularly tiny strings in commonly accessed functions) to really dependably trust RVO. It seems I'm not the only one, however, since otherwise there would have been no need for C++0x rvalue references if all modern compilers applied RVO reliably.
Nevertheless, I'm editing the post to mention RVO for the getMatrixFromFile case. Thanks for pointing it out.
SquareMatrix* getMatrixFromFile(const string
Andrei Ciobanu
@Andrei avoid such solutions at all cost. That would imply that you allocate memory in getMatrixFromFile and then pass on the responsibility to the client calling that function to manually free the memory. If you're interested in doing things like that, @see boost::shared_ptr or std::shared_ptr (if you are using a compiler with TR1/C++0x features).
+1  A: 

References and pointers are closely related. Both are ways of passing parameters without copying the parameter value onto the subroutine's stack frame.

The main difference between them:

  • A pointer p points to an object o.
  • A reference i is an object o. In other words, it's an alias.

To make things more confusing, as far as I know, compilers implement the two in pretty much the same way.

Imagine the function Ptr(const T* t) and Ref(const T& t).

int main() { int a; Ptr(&a); Ref(a); }

In Ptr, t is going to point to the location of a. You can dereference it and get the value of a. If you do &t (take the address of t), you will get the address of the parameter.

In Ref, t is a. You can use t for the value of a. You can get the address of a with &t. It's a little syntactic sugar that C++ gives you.
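
Here is a small sketch of that difference. All the function names are made up for illustration:

```cpp
// Pointer version: t is a separate object holding an address; it may be null
int readThroughPointer(const int *t) {
    return t ? *t : 0;   // must be checked and dereferenced explicitly
}

// Reference version: t is just another name for the caller's variable
int readThroughReference(const int &t) {
    return t;            // used like the object itself
}

// Taking &t on a reference yields the address of the object it aliases
const int *addressOfAlias(const int &t) {
    return &t;
}

// Writing through either one changes the caller's variable
void setViaPointer(int *t, int value) { *t = value; }
void setViaReference(int &t, int value) { t = value; }
```

At the call site the pointer versions need an explicit &a, while the reference versions take a directly; addressOfAlias(a) returns the same address as &a in the caller.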

Both provide a mechanism for passing parameters without copying. In your function (by the way, you don't need the declaration):

template <class T> void utilShow(T elem) { ... }

Every time it gets called, elem will be copied. If T is a large vector, it is copying all the data in the vector. That's pretty inefficient. You don't want to pass the entire vector to the new stack frame, you want to say "hey - new stack frame, use this data". So you can pass by reference. What does that look like?

template <class T> void utilShow(const T &elem) { ... }

elem is const, because it's not changed by the function. It's also going to use the memory for elem that's stored in the caller, rather than copying it down the stack.

Again, for the same reason (to avoid a copy of the parameters), use:

vector< vector<short> > getMatrixFromFile(const string &fName) { ... }
void showMatrix(const vector< vector<short> > &mat) { ... }

The one tricky part is that you might think: "Hey, a reference means no copies! I'm gonna use it all the time! I'm gonna return references from functions!" And that's where your program crashes.

Imagine this:

// Don't do this!
Foo& BrokenReturnRef() {
  Foo f;
  return f;
}

int main() {
  Foo &f = BrokenReturnRef();
  cout << f.bar();
}

Unfortunately, this is broken! While BrokenReturnRef runs, f is in scope and everything is cool. Then you return to main and keep referencing f. The stack frame that held f has gone away, that location is no longer valid, and you're referencing junk memory. In this case, you'll have to return by value (or allocate the object on the heap and return a pointer to it).

The one exception to the rule of "don't return references" is when you know the referenced object will outlive the function call. This is how the STL implements operator[] for its containers.
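
A sketch of the safe case, using a made-up class (Scores is hypothetical): the returned reference points into a member of the object, and that member lives as long as the object does, not just as long as the call:

```cpp
#include <cstddef>
#include <vector>

class Scores {
    std::vector<int> values_;
public:
    explicit Scores(std::size_t n) : values_(n, 0) {}

    // Safe: the element lives inside *this, which outlives this call
    int &at(std::size_t i) { return values_[i]; }

    // Read-only variant for const objects
    const int &at(std::size_t i) const { return values_[i]; }
};
```

So s.at(1) = 42; writes straight into the object's storage. The reference is only valid while s itself is alive (and, for a vector-backed class like this, while the underlying storage is not reallocated).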

Hope that helps! :)

Stephen