views:

90

answers:

2

I was making sure I knew how to do the op= and copy constructor correctly in order to sort() properly, so I wrote up a test case. After getting it to work, I realized that the op= was hard-copying all the data_.

I figure if I wanted to sort a container with this structure (its elements have heap allocated char buffer arrays), it'd be faster to just swap the pointers around. Is there a way to do that? Would I have to write my own sort/swap function?

#include    <deque>
//#include  <string>
//#include  <utility>
//#include  <cstdlib>
#include    <cstring>
#include    <iostream>
#include    <fstream>
#include    <algorithm> // I use sort(), so why does this still compile when commented out?

#include    <boost/filesystem.hpp>
#include    <boost/foreach.hpp>

using namespace std;

namespace fs = boost::filesystem;

class Page
{
    public:
        // constructor
        Page(const char* path, const char* data, int size) :
            path_(fs::path(path)),
            size_(size),
            rawdata_(new char[size])
        {
//          cout    << "Creating Page..." << endl;
            strncpy(rawdata_, data, size);
//          cout    << "done creating Page..." << endl;
        }
        // copy constructor
        Page(const Page& other) :
            path_(fs::path(other.path())),
            size_(other.size()),
            rawdata_(new char[other.size()])
        {
//          cout    << "Copying Page..." << endl;
            strncpy(data_, other.data(), size_);
//          cout    << "done copying Page..." << endl;
        }
        // destructor
        ~Page() { delete[] data_; }
        // accessors
        const fs::path& path() const { return path_; }
        const char* data() const { return rawdata_; }
        int size() const { return size_; }
        // operators
        Page& operator = (const Page& other) {
            if (this == &other)
                return *this;
            char* newImage = new char[other.size()];
            strncpy(newImage, other.data(), other.size());
            delete[] data_;
            rawdata_ = newImage;
            path_ = fs::path(other.path());
            size_ = other.size();
            return *this;
        }
        bool operator < (const Page& other) const { return path_ < other.path();    }
    private:
        fs::path path_;
        int size_;
        char* rawdata_;
};

class Book
{
    public:
        Book(const char* path) :
            path_(fs::path(path))
        {
            cout    << "Creating Book..." << endl;
            cout    << "pushing back #1" << endl;
            // below, the RawData will be coming from methods like
            // fstream.read(char* buffer, int filesize); or
            // unzReadCurrentFile(unzFile zipFile, char* buffer, int size);
            pages_.push_back(Page("image1.jpg", "firstImageRawData", 17));
            cout    << "pushing back #3" << endl;
            pages_.push_back(Page("image3.jpg", "thirdImageRawData", 17));
            cout    << "pushing back #2" << endl;
            pages_.push_back(Page("image2.jpg", "secondImageRawData", 18));

            cout    << "testing operator <" << endl;
            cout    << pages_[0].path().string() << (pages_[0] < pages_[1]? " < " : " > ") << pages_[1].path().string() << endl;
            cout    << pages_[1].path().string() << (pages_[1] < pages_[2]? " < " : " > ") << pages_[2].path().string() << endl;
            cout    << pages_[0].path().string() << (pages_[0] < pages_[2]? " < " : " > ") << pages_[2].path().string() << endl;

            cout    << "sorting" << endl;
            BOOST_FOREACH (Page p, pages_)
                cout    << p.path().string() << endl;
            sort(pages_.begin(), pages_.end());
            cout << "done sorting\n";
            BOOST_FOREACH (Page p, pages_)
                cout    << p.path().string() << endl;

            cout    << "checking datas" << endl;
            BOOST_FOREACH (Page p, pages_) {
                char data[p.size() + 1];
                strncpy((char*)&data, p.data(), p.size());
                data[p.size()] = '\0';
                cout    << p.path().string() << " " << data << endl;
            }
            cout    << "done Creating Book" << endl;
        }

        const Page& getFirstPage() { return pages_[0]; }
    private:
        deque<Page> pages_;
        fs::path path_;
};

int main() {
    Book* book = new Book("/some/path/");
    // below is an example of where the rawdata is used
    // by a method that has a char* parameter
    ofstream outFile("outimage.jpg");
    outFile.write(book->getFirstPage().data(), book->getFirstPage().size());
}
A: 

I wouldn't use raw char * in this scenario as it's going to be an unnecessary headache. Use std::string instead, which will remove the need for the copy constructor, assignment operator and destructor as the compiler-generated ones will be sufficient.

If you then find that copying the data is still a major bottleneck, you could use a boost::shared_ptr to hold the string if you can live with the additional level of indirection in normal use. That way, the string will not be copied if the containing object is copied and you still get the safety of RAII.

Timo Geusch
It will get rid of the trio entirely, in fact.
GMan
Yes, you're correct - edited post to reflect that.
Timo Geusch
(Kinda repeating myself from a comment from above) Should I be using std::string for file streams? How should I handle an unzip and unrar library that asks for a raw char pointer then?Also, if I wanted to avoid the copying in the sort(), `boost::shared_ptr` can do that? (I've never used it before.) I read a bit about it. It creates a small pointer-like object that I'll be able to put inside my `Page`s that'll continue to point to the right heap even after the `Page` is moved/sorted?
Kache4
`std::string` allows for read access as a C string via the c_str() method. If you're managing an array of chars instead of a string, consider `vector<char>` as mentioned above. `boost::shared_ptr` uses reference counting to determine the lifecycle of the object it points to. The only data that gets copied is what little data shared_ptr uses instead of the whole string so *if* the data copy is a major issue then this approach might improve performance.
Timo Geusch
well I know c_str() can convert strings to c strings, but I'm talking about using methods that have `char*` parameters to fill, like `strncpy(char*, const char*, int)` or `fstream.read(char*, int)`. I'm used to creating a `char[size] buffer;` for those methods to "fill". How would I use a `std::string` without using an intermediate char array?
Kache4
You don't, but you could use a vector<char>
Timo Geusch
Kache4
If you're using a vector<char> as a member in the object, the whole vector will be copied if the object will be copied. It won't be as explicit as calling `memcpy` but the effect will be the same. The reason for using `shared_ptr` in either scenario is that you add a level of indirection to the character array/string and only copy the pointer object to it rather than the whole object.
Timo Geusch
A: 

If using manual char* manipulation isn't part of your criteria for the exercise, you could use std::string and let it handle all the allocation issues for you. The std::swap function used by std::sort is even specialized to call std::string::swap for you, which means that it automatically only swaps the pointers to your string data rather than deep-copying.

If you want to use char* for exercise purposes, probably the easiest way to create a free-standing swap function that takes two Page references and just swaps the internal data pointers around. I believe that as long as sort can see a better match than the standard template, it will call your function instead getting the increased performance.

Finally, to answer your question about the header : Compilers are free to implement headers as actual files that include other headers (even ones that might not normally be expected). Almost certainly your iostream header is including algorithm directly or indirectly. On another compiler your code might fail to compile.

Mark B
Funny thing is, when I learned to do this kinda stuff, I learned it in this way *because* they were the exercises I had to do. So the functions that give me the streams I want, like `myFstream.read(buffer, filesize)` want a `char*` for the first parameter. How would I put a `std::string` there instead, without using an intermediate `char*` array?
Kache4