tags:

views:

590

answers:

12

Hi all,

I'm fairly new to C++ so this is probably somewhat of a beginner question. It regards the "proper" style for doing something I suspect to be rather common.

I'm writing a function that, in performing its duties, allocates memory on the heap for use by the caller. I'm curious about what a good prototype for this function should look like. Right now I've got:

int f(char** buffer);

To use it, I would write:

char* data;
int data_length = f(&data);
// ...
delete[] data;

However, the fact that I'm passing a pointer to a pointer tips me off that I'm probably doing this the wrong way.

Anyone care to enlighten me?

+2  A: 

Why not do the same way as malloc() - void* malloc( size_t numberOfBytes )? This way the number of bytes is the input parameter and the allocated block address is the return value.

UPD: In comments you say that f() basically performs some action besides allocating memory. In this case using std::vector is a much better way.

void f( std::vector<char>& buffer )
{
    buffer.clear();
    // generate data and add it to the vector
}

the caller will just pass an allocated vector:

 std::vector buffer;
 f( buffer );
 //f.size() now will return the number of elements to work with
sharptooth
Because only the callee knows what the size will be.
Joe Snikeris
@Joe: And how would you want this logic to work? How much memory would you like to be allocated and who would need its size after that?
sharptooth
The amount of memory to be allocated depends on what f() is doing. In this case, f() is taking a screenshot, so only f() knows how big the buffer needs to be.
Joe Snikeris
In the general sense, you are externalising the size of the buffer. The OP in the general does not have the buffer size as a parameter, so you are rewriting the specification of the question to make it fit the answer...
polyglot
sharptooth
There is intrinsic dereference in a pointer type that makes copy cheap as compared to a vector, for instance you can't store these vectors in a map without hitting performance constraints. It is not necessarily always correct. In such a case you would need vector<char> pointers, and you are back where you started.. :)
polyglot
A: 

Just return the pointer:

char * f() {
   return new char[100];
}

Having said that, you probably do not need to mess with explicit allocation like this - instead of arrays of char, use std::string or std::vector<char> instead.

anon
The caller won't know how big the buffer is.
Joe Snikeris
In this case they will, because it is part of the specification of the function. In the general case, that is another reason for using strings or vectors, which carry their size around with them.
anon
+1  A: 

Pass the pointer by reference...

int f(char* &buffer)

However you may wish to consider using reference counted pointers such as boost::shared_array to manage the memory if you are just starting this out.

e.g.

int f(boost::shared_array<char> &buffer)
polyglot
I think this is what I'm looking for. I'm going to think it over a bit more before deciding that this is the answer.
Joe Snikeris
What's the advantage of the `shared_array` vs. `std::vector`?
sbi
In practice, it calls delete [] ptr instead of delete ptr, which is needed in this case because the char* array needs to be deleted in this manner.
polyglot
@polyglot: And `std::vector` wouldn't do this?
sbi
std::vector<> heap allocates, but it copies by value which is a fundamental difference to a reference type. You can't store vectors in a map because of this reason without performance implications. The copy semantics are the fundamental difference.
polyglot
Ok, this was really what I was looking for at the time, but the accepted answer describes the "proper" way to do what I was intending, and since I mentioned "proper" in the question...
Joe Snikeris
@polyglot? What's with the irrational fear of copying? A map doesn't copy its data members around anyway. So yes, you can easily store a vector in a map. Of course there are situations where it'll be a bad idea, but in most cases, (if you don't plan to add/remove/reassign each vector a bajillion times), it would be a perfectly valid choice.
jalf
@jalf, are you saying that map<x,y> does not copy instances of y upon assignments? In this case, for instance, the OP is going to construct a screen sized bitmap in his vector. Suppose he stored it in a map<string, vector<char>>. Do you think inadvertent copies or should not be worried about? That is a lot of unnecessary heap operations.
polyglot
@polyglot: So it's not the deletion of the data that makes the difference (as there is none), but the fact that `shared_array` implements shared ownership. That's quite a difference -- unless you don't copy the `vector`.
sbi
@polyglot: Of course copying a map does copy its content. However, if you're so worried about copying 'vector', why are you thinking about copying 'map'?
sbi
I am saying that a vector is not a dereferenced type! The semantics are fundamentally different to a pointer to memory! What is so difficult to understand about that? Insertion and retrieval will pay significant copy penalties, particularly if the byte buffer is sizeable.
polyglot
+4  A: 

In 'proper' C++ you would return an object that contains the memory allocation somewhere inside of it. Something like a std::vector.

Zan Lynx
A: 

I guess you are trying to allocate a one dimensional array. If so, you don't need to pass a pointer to pointer.

int f(char* &buffer)

should be sufficient. And the usage scenario would be:

char* data;
int data_length = f(data);
// ...
delete[] data;
erelender
This just copies the (in this case: random) original value of `data` into the function, but it won't change the pointer at the caller's side to the memory allocated inside the function.
sbi
This won't work.f() will contain something like: buffer = new char[10];This changes the local copy of buffer not the value of data. I believe data is still uninitialized.
Joe Snikeris
How does that work? The memory allocated in the function f will get lost here. Try it. The problem arises from the fact that you are passing in a pointer (a 32 or 64-bit integer). A value is filled in ovre that passed in char*. However at the end of the stack frame that value is lost. By passing in a pointer to a pointer you are passing the address of this 32 or 64-bit integer value. That way when you fill it in and return you are still pointing at the area of memory that NOW contains the correct pointer.
Goz
That is awful advice! I don't mean to be overtly mean but I lolled.
polyglot
I guess i am tired from work. How did i do that simple mistake? I am really sorry and utterly ashamed.
erelender
Happens to all of us. :)
sbi
A: 

Provided that f does a new[] to match, it will work, but it's not very idiomatic.

Assuming that f fills in the data and is not just a malloc()-alike you would be better wrapping the allocation up as a std::vector<char>

void f(std::vector<char> &buffer)
{
    // compute length
    int len = ...

    std::vector<char> data(len);

    // fill in data
    ...

    buffer.swap(data);
}

EDIT -- remove the spurious * from the signature

Steve Gilham
Why are you passing a pointer to a vector?
sbi
Steve Gilham
A: 

Actually, the smart thing to do would be to put that pointer in a class. That way you have better control over its destruction, and the interface is much less confusing to the user.

class Cookie {
public:
   Cookie () : pointer (new char[100]) {};
   ~Cookie () {
      delete[] pointer;
   }
private:
   char * pointer;

   // Prevent copying. Otherwise we have to make these "smart" to prevent 
   // destruction issues.
   Cookie(const Cookie&);
   Cookie& operator=(const Cookie&);
};
T.E.D.
If you copy this object, it attempts to delete the same piece of memory twice. Urgh.
sbi
Look again. I was in the process of adding that, sbi. :-)
T.E.D.
Ah, in that case I'll remove my downvote. :)
sbi
Dammit, the thing says it's too old to be changed. (Unless you edited it -- what you did! Sorry.)
sbi
+1  A: 

Use RAII (Resource Acquisition Is Initialization) design pattern.

http://en.wikipedia.org/wiki/RAII http://stackoverflow.com/questions/712639/please-help-us-non-c-developers-understand-what-raii-is

alex vasi
A: 

If all f() does with the buffer is to return it (and its length), let it just return the length, and have the caller new it. If f() also does something with the buffer, then do as polyglot suggeted.

Of course, there may be a better design for the problem you want to solve, but for us to suggest anything would require that you provide more context.

Ari
+3  A: 

Your function should not return a naked pointer to some memory. The pointer, after all, can be copied. Then you have the ownership problem: Who actually owns the memory and should delete it? You also have the problem that a naked pointer might point to a single object on the stack, on the heap, or to a static object. It could also point to an array at these places. Given that all you return is a pointer, how are users supposed to know?

What you should do instead is to return an object that manages its resource in an appropriate way. (Look up RAII.) Give the fact that the resource in this case is an array of char, either a std::string or a std::vector seem to be best:

int f(std::vector<char>& buffer);

std::vector<char> buffer;
int result = f(buffer);
sbi
Shouldn't the vector be passed by pointer so it is clear to someone reading: int result = f(buffer);that buffer may be changed?
Joe Snikeris
polyglot
Joe Snikeris
Passing it by reference is the right choice here. It clearly indicates that the vector may get modified (data may be added to it, for example), but the vector does not get replaced, or set to NULL.
jalf
@Joe: but once it is a pointer, you also open op the possibility that it might be null. If I see a function accepting a pointer, I assume it is because it either 1) is an optional argument, which may be NULL, or 2) because the function we're calling is might reassign it. Neither of these are true in this case. And most IDE's offer intellisense to help you figure out the prototype if you're in doubt. And of course, the function should be named so you know what it does, and you'll know that it modifies its argument.
jalf
Suppose f was declared: int f(std::vector<char> buffer);Then f(buffer); has a completely different semantics despite having the same syntax.
Joe Snikeris
Yes, and? ;)There are several factors to help us distinguish the two cases. First, we can look at the function prototype. Second, we can use intellisense to instantly find out, at the call-site, what the argument type is, and third, we can look at the name of the function, and make a shrewd guess based on that. If the function is called GetScreenshotData, then it's pretty clear that it *gets* data. So it must be passed to the argument. It's just not usually a problem. And using pointers instead opens up a whole different set of ambiguities instead. Is it null? Can we change what it points to?
jalf
I'm still not convinced on this pointer vs reference issue. Your first question (Is it null?) is really the only issue. To answer your second question: yes. Passing a pointer implies that what is pointed to can change. Passing by value or const reference (same syntax) implies that it won't change. Making the interface to a function unambiguous at the expense of making the implementation slightly more complicated is worth it every time.
Joe Snikeris
but it's not unambiguous, is my point. You're introducing entirely new semantics that have to be considered (is it null?) You're just trading a common, idiomatic form of syntactic ambiguity for another, imo much nastier, semantic ambiguity. The difference is that the reference question can be easily resolved (read the function prototype, see that it takes a reference), but to resolve the pointer semantics, you have to read the entire function definition (what does it do if the argument is null? Is it allowed to be null?)
jalf
@Joe: This gets often asked by people coming from C to C++. The C++ answer to this is: If you don't know whether `f()` takes it argument by copy/const reference (from the caller's POV pretty much the the same) or per non-const reference, you shouldn't be calling it. Also, picking the right name for the function should get this across, too. While we cannot stop you from doing it the C way (passing pointers), the C++ world is passing objects to be changed per non-const reference.
sbi
I would prefer the vector as a return value. The difference being that if you pass the vector by reference the signature can be either that of a in/out parameter or an out parameter. The signature is not clearly saying that there are no requirements on the passed in vector. How will the function behave when an empty/non-empty vector is being passed in? Will it clear previous memory or just append? When you return a vector, it is clear that the object does not exist before the function call, and as such it can only have that memory assigned inside the method. The interface is clearer.
David Rodríguez - dribeas
@dribeas: Of course, passing the vector in by reference is probably a premature optimization. It might be small enough (or called very seldom) to not to make a difference. Also, simply recompiling this with a C++1x-conforming compiler (supporting rvalue references) will eliminate the overhead.
sbi
A: 

The proper style is probably not to use a char* but a std::vector or a std::string depending on what you are using char* for.

About the problem of passing a parameter to be modified, instead of passing a pointer, pass a reference. In your case:

int f(char*&);

and if you follow the first advice:

int f(std::string&);

or

int f(std::vector<char>&);
AProgrammer
+5  A: 

In C, that would have been more or less legal.

In C++, functions typically shouldn't do that. You should try to use RAII to guarantee memory doesn't get leaked.

And now you might say "how would it leak memory, I call delete[] just there!", but what if an exception is thrown at the // ... lines?

Depending on what exactly the functions are meant to do, you have several options to consider. One obvious one is to replace the array with a vector:

std::vector<char> f();

std::vector<char> data = f();
int data_length = data.size();
// ...
//delete[] data;

and now we no longer need to explicitly delete, because the vector is allocated on the stack, and its destructor is called when it goes out of scope.

I should mention, in response to comments, that the above implies a copy of the vector, which could potentially be expensive. Most compilers will, if the f function is not too complex, optimize that copy away so this will be fine. (and if the function isn't called too often, the overhead won't matter anyway). But if that doesn't happen, you could instead pass an empty array to the f function by reference, and have f store its data in that instead of returning a new vector.

If the performance of returning a copy is unacceptable, another alternative would be to decouple the choice of container entirely, and use iterators instead:

// definition of f
template <typename iter>
void f(iter out);

// use of f
std::vector<char> vec;
f(std::back_inserter(vec));

Now the usual iterator operations can be used (*out to reference or write to the current element, and ++out to move the iterator forward to the next element) -- and more importantly, all the standard algorithms will now work. You could use std::copy to copy the data to the iterator, for example. This is the approach usually chosen by the standard library (ie. it is a good idea;)) when a function has to return a sequence of data.

Another option would be to make your own object taking responsibility for the allocation/deallocation:

struct f { // simplified for the sake of example. In the real world, it should be given a proper copy constructor + assignment operator, or they should be made inaccessible to avoid copying the object
  f(){
    // do whatever the f function was originally meant to do here
    size = ???
    data = new char[size];
  }
  ~f() { delete[] data; }

int size;
char* data;
};

f data;
int data_length = data.size;
// ...
//delete[] data;

And again we no longer need to explicitly delete because the allocation is managed by an object on the stack. The latter is obviously more work, and there's more room for errors, so if the standard vector class (or other standard library components) do the job, prefer them. This example is only if you need something customized to your situation.

The general rule of thumb in C++ is that "if you're writing a delete or delete[] outside a RAII object, you're doing it wrong. If you're writing a new or `new[] outside a RAII object, you're doing it wrong, unless the result is immediately passed to a smart pointer"

jalf
This is terrible advice, return the vector by copy? There is an unnecessary malloc and memcpy in your first snippet. I also think your 2nd example is an example of archaic C++ (i.e. pre-STL), one would not typically write it in this manner if one had a thorough knowledge of stl and boost.
polyglot
The second example is basically a step back, saying we have no clue what the f function was meant to do, because the OP didn't describe that. So perhaps its responsibilities mean that it could be better implemented as its own object. Of course, if the job is simply "return a fixed-size buffer", then the second method is downright silly. But we don't know that, and I simply wanted to show a more general solution. As to returning th evector by value, most compilers implement RVO and NRVO. Although again, that may not work depending on the (unknown) complexity of the f function.
jalf
If you were going to lug out the RVO/NVRO line you should have at least said! :):)
polyglot
ok, added that to my post. ;)
jalf
@jalf: Is the qualification you made one of those things that shouldn't be worried about unless there is a demonstrated need, e.g. profiling shows that this piece of code is slowing things down.I've often wondered about whether or not I should be returning non-fundamental types by copy. On the one hand, it makes the functions easier to use, but on the other, there is the performance issues that polyglot mentions.What should a newbie keep in mind while thinking about these things?
Joe Snikeris
@Joe: don't pass arrays by value in general. Just don't. RVO/NRVO shouldn't be a newbie consideration I don't think. – polyglot 18 secs ago
polyglot
Removed accepted answer. I think the first example should pass in a pointer to the vector.
Joe Snikeris
@Joe: It depends. As you say, it's simpler to just return a copy, and simplicity is always nice. My rule of thumb is that there's no harm in it in simple cases. In more complex cases, a better solution might be to pass in an output iterator the function can write data to, completely detaching the function from the particular container used. Perhaps I should add an example of that.
jalf
I like that idea.
Joe Snikeris
I cannot agree. Passing vectors by value is a code smell. When you have a project that goes to hundreds of thousands of lines, the little things end up burning you if you take shortcuts.
polyglot
Added a quick example of the iterator approach.Btw, in general there's no rush in accepting an answer. It usually pays to wait an hour or three to see what other answers pop up, or how existing ones get modified. Once an answer is accepted, it tends to discourage others from answering (and as in this case, comments might show up convincing you that the answer you accepted is flawed)
jalf
@polyglot: Bullshit. If you have a project of hundreds of thousands of lines, the time wasted on that kind of premature optimizations is what burns you. It is an easy thing to optimize later, if you find that the performance is a problem, and especially in a large project, we know that the vast majority of the code basically does not have a noticeable impact on performance. The usual 90/10 rule applies. If this is among the 90% of the code that isn't called often enough to matter performance-wise, don't bother, and focus your efforts on the 10% that *do* matter.
jalf
If you have hundreds of thousands of lines of code, those 10% are still tens of thousands lines. More than enough to keep you busy even if you don't first waste time on the 90% that don't matter.
jalf
It's also worth pointing out that there are a lot of mechanisms that can eliminate the copy. RVO/NRVO are obvious examples, but given how aggressive C++ compilers are at inlining, that's another obvious way in which the copy may end up getting eliminated. Calling it a "code smell" is just silly.
jalf
Adding to what jalf said: With C++0x^H^H1x, a compiler and std lib implementing rvalue references will make returning non-fundamental types as fast as `swap()`ing them. That's one more reason I'd argue to write the code as it is easiest to read and worry about performance only when and where it is necessary. In order to gain the 1.327% performance gain for not copying non-fundamental types in the 97% of the code where it is performance-irrelevant, I'd wait for my compiler's/std lib's next version to implement this, so I get it for free, instead of wasting so much time on so small a gain.
sbi
@jalf: what in my suggestion was any more lines of code? How was it extra labour? It was just the avoidance of copying vectors unnecessarily. Good rant but unsupported by the facts.
polyglot
@sbi: We aren't in C++0x land yet though are we? Do you actually program for a living? The issue is that extra heap operations are expensive. The buffer the original poster was talking of was in the many megabytes, so you are saying that making an unnecessary reallocation of several megabytes and then copy is good form. It aint.
polyglot
@polyglot: There are compilers today that implement rvalue refs, so in a way we are in C++0x land -- furthermore, his point was that it is trivial to make the rvalue optimization *when it becomes a problem*, and by then, we almost certainly will be in c++0x land.And more lines of code? When you *call* the function, obviously. You have to declare a local vector first, before you can pass it to the function, which is a pain if you wanted to nest the function call in another. Can no longer do that because the function doesn't *return* anything.
jalf
I don't know which question you're reading, but the one here says nothing about the size of the buffer being "many megabytes". And even if it did, it still wouldn't be a problem until profiling *shows* it to be a problem. Most likely the copy will never occur because of RVO. OR the function will get inlined and the copy eliminated. That is why people usually suggest profile before optimizing. You don't know that a copy even takes place.
jalf
@polyglot: As I said, 97% of these cases will add up to 1.327% of the system's overall performance. I haven't seen any statement of the OP regarding the size of the buffer, but if it indeed is that big, than this particular case _might_ fall under the 3% where it matters. (A possible exception could be if this only happens once a week.) But in general I won't worry too much about such details until I've measured that they matter. Spending my time on the 97% that don't matter would prevent me from caring where it matters and would have driven me out of my job (and this industry) a decade ago.
sbi
+1 on the vector part, not so much in the rest of options. Nowadays _all_ compilers implement RVO (unless the internal code has multiple code paths that create different temporaries to return). It is the only optimization with support from the standard.
David Rodríguez - dribeas
The good part of returning a vector is that the semantics are clear in the interface: you are creating and returning a vector, there is no other possibility
David Rodríguez - dribeas