views:

366

answers:

11

I fully understand this question has been asked a lot, but I'm asking for a specific variation and my search-foo has given up, as I've only found algorithms that append one existing vector to another, but not one returned to from a function.

I have this function that lists all files in a directory:

vector<string> scanDir( const string& dir )

which may call itself internally (for subdirectories).

I need a short way of appending the returned value to the caller's vector. I have in my mind something like this (but of course it doesn't exist :( ):

vector<string> fileList;
//...
fileList.append( scanDir(subdirname) );

I fear that storing the return value and inserting it in fileList would bring performance badness. What I mean is this:

vector<string> temp( scanDir(subdirname) );
copy( temp.begin(), temp.end(), back_inserter(fileList) );

Thanks!

PS: I'm not forcing myself to using vector, any other container that performs equally well and can prevent the potential large copy operation is fine by me.

+2  A: 

Instead of

vector<string> temp( scanDir(subdirname) );

you can do

vector<string> const& temp = scanDir(subdirname);

and proceed with the copy:

fileList.insert(fileList.end(), temp.begin(), temp.end());
Alexandre C.
Unlikely to make any difference, once copy constructor elision is finished with it. There is no requirement for the first line of code to do anything that the second line of code isn't also required to do.
Steve Jessop
+13  A: 

Why not just pass the vector as an argument? Then every invocation can append to the same vector, without copying. Or create an implementation class which accumulates the elements into a member object.

Philipp
+1. Passing as an argument is what I do in the performance sensitive code.
Dummy00001
cape1232
+1 for simplicity.
Brian
The STL way would be to make `scanDir()` a template and pass an output iterator into it. Nevertheless, the idea is good, so `+1` from me.
sbi
A: 

The recursive function will have to copy everything multiple times, O(depth) to be precise (i.e: everything in the leaf level will be copied again and again until it reaches the root).

The best method would be splitting this into two different functions:

vector<string> scanDir(string path)
{
  vector<string> retval;

  scanDir(path, &retval);

  return retval;
}

static void scanDir(string path, vector<string>& output)
{
  .. scan
  .. append to output 
}
sukru
in your first overload, a complete copy is of retval is guaranteed to be made when returning.
Alexandre C.
@alexandre: how would I avoid that when using this function to initialize a class data member? Wouldn't there be some compiler magic avoiding the copy?
rubenvb
@rubenvb: return value optimization takes place when you return with a constructor. Use a class instead, see MartinStettner's comment.
Alexandre C.
I always thought the return value optimization were specifically for this case. Thus I'm learning a new thing today. Is there a reference to check the details?
sukru
+6  A: 

PS: I'm not forcing myself to using vector, any other container that performs equally well and can prevent the potential large copy operation is fine by me.

Well, if you use a list and call a.splice(a.end(), b); you'll avoid the copy operation completely. A list is generally going to be a linked list rather than an array as is the case with a vector, so this has a lot of performance and usage implications. But splice runs in O(1), so that's a nice benefit.

Brian
http://stackoverflow.com/questions/1905417/array-vs-vector-vs-list
KennyTM
A: 

How about a helper function?

template<class T>
std::vector<T>& VectorAppend(std::vector<T> &target, const std::vector<T> &source)
{
    size_t insertPos = target.size();
    target.resize(target.size() + source.size());
    std::copy(source.begin(), source.end(), target.begin() + insertPos);
    return target;
}
David Gladfelter
What's wrong with a simple `std::copy(source.begin(), source.end(), std::back_inserter(target));` instead? If you're trying to optimize allocation, you could always call `target.reserve(target.size()+source.size());` before-hand.
sbi
+6  A: 

If you're in the position to change scanDir, make it a (template) function accepting an output iterator:

template <class OutIt>
void scanDir(const std::string& dirname, OutIt it) {
  // ...
  // Scan subdir
  scanDir(subdir, it);
  // ...
}

You'll have the additional benefit to be able to fill all sort of data structures like

std::vector<string> vector;
scanDir(dir1, std::back_inserter(vector));
std::set<string> fileset
scanDir(dir1, std::inserter(fileset, fileset.begin()));

etc.

EDIT (see comment ...)

For using this function for class member initialization, you could either call it in the constructor as in

class MyClass {
private:
  std::vector<string> m_fileList;
public:
  MyClass(const std::string& dirname) {
    scanDir(dirname, std::back_inserter(m_fileList);
  }
}

or using a wrapper function

std::vector<string> scanDir(const std::string& dirname) {
  std::vector<string> result;
  scanDir(dirname, std::back_inserter(result);
  return result;
}

class MyClass {
// Same as above..
  MyClass(const std::string& dirname) : m_fileList(scanDir(dirname)) { }
}

I would prefer the first version for performance (and other) reasons ...

MartinStettner
Could I use such a template to initialize a class data member (which holds the file list?)... oh wait, yours combined with sukru's is pretty powerful (second function to create the return value).
rubenvb
Just wrote an edit about this. I would suggest calling the function explicitely in the constructor, I cannot see any drawbacks of this approach ...
MartinStettner
It's generally called `OutputIterator` rather than `InsertIterator`.
Matthieu M.
@Matthieu Merci beaucoup! I've changed this in the response ...
MartinStettner
@Martin: Thanks for the great suggestion. I've got to start thinking more `iterator` than container. After some trouble splitting template declaration and implementation, I've got it right.
rubenvb
A: 

Use std::list and append by using std::list::splice.

From the docs for splice:

The operation does not involve the construction or destruction of any element object and, except for the third version, it is performed in constant time.

cape1232
+1  A: 
vector<string> fileList;
vector<string> temp( scanDir(subdirname) );

fileList.insert(fileList.end(), temp.begin(), temp.end());

I hope that helped you.

virious
insert is a linear time operation. The question specifically said your suggestion was not what he's looking for. "I fear that storing the return value and inserting it in fileList would bring performance badness."
cape1232
"I fear", for me it means that he is not sure if this will cause performance badness. To be honest, I don't think that using insert() in this case will cause any performance issues. Anyways, thanks for the minus -,-.
virious
This answers the title of the question white well. So it should be easy to find this solution, when peole like me come here in search of a solution for just that.
kuester2000
A: 

This might not be the simplest possible solution, but what about doing something equivalent to C#'s StringBuilder?

Make a list<vector<string> > then you can append all of the vectors that you get from your calls to scanDir() to the list.

If you absolutely have to have a single vector at the end, you can then, once make a new vector, allocate it large enough so that it won't need to resize, and assemble your finished product.

Alternatively, you could make a new class (if needed that derives from vector<T>) and internally uses the list<vector<T> > to store the elements. Then you would just make your iterators iterate through the elements in the first list, then when it reaches the end go for the elements in the next list, only returning container::end when you reached the end of the last list.

Andrew Shelansky
This is the most complicated answer I could ever not think of...
rubenvb
A: 
Gabi Purcaru
That first question is wrong. It should be: __Why use a global variable?__ There are other possibilities, after all, just look at the other answers. This indeed looks dirty, and the fact that you have used it does not improve a bit on its dirty look.
sbi
A: 

I know this doesn't answer your question directly, but with regard to your underlying goal, you might want to simply reimplement your function in terms of boost::filesystem. The directory iterator is already recursive so you don't need to make recursive calls of your own. You could simply populate a list in a loop over the iterator. There is an example implementation of ls: http://www.boost.org/doc/libs/1_43_0/libs/filesystem/example/simple_ls.cpp

You also get the added benefit of (theoretical) platform independence, relatively wide adoption (bugs get ferreted out more quickly with more adoption), etc

frankc