ansaurus

Question

Extracting, then passing raw data into another class - How to avoid copying twice while maintaining encapsulation?

Answer 1

+2 A:

use std::vector resize member to set the buffer size originally, and then use its buffer directly by using front()'s address.

std::vector<char> v;
v.resize(size);
strcpy(&v.front(), "testing");

Direct buffer access of std::vector is given by: &v.front()

Brian R. Bondy 2010-03-25 00:23:23

You're saying I should make the `Page::vector<char>` public?

Kache4 2010-03-25 00:25:00

Brian R. Bondy 2010-03-25 00:26:35

Note: You can also use vector's constructor instead of resize.

Brian R. Bondy 2010-03-25 00:30:02

Hmm, okay. It seems my only option will seem to lose some of the safety of encapsulation. `Page` will have to trust outside classes to not overflow the buffer, among other things.

Kache4 2010-03-25 00:31:18

Yes if you want to write directly to the buffer for efficiency, you have a little less security in that you have to trust those you give access to the buffer.

Brian R. Bondy 2010-03-25 00:33:40

Kind of related - `deque<Page> Book::pages_;` is populated in a loop: `for() { Page thisPage(params); char* dataGoesHere = thisPage.rawAddr(); readTo(dataGoesHere); pages_.push_back(thisPage); }` Is the `push_back` doing yet another hard copy via stl copy constructor, or is the `vector<char> Page::rawData_;` in the heap the whole time?

Kache4 2010-03-25 00:57:56

@Kache4: I think so but post as a new question

Brian R. Bondy 2010-03-25 01:16:31

Answer 2

A:

You can introduce a third component that would hold all the images. The book would fill it in, the pages would read from it. If you want to restrict access you can close it down and make book and page its friends. If you have repetitions of images (say, each page has a footer and a header, or some pages have a logo, or whatever) you can make that third component a flyweight, thus making it even more efficient than what you strived for.

Make sure that you don't open all the pages when you open the book. That can be expensive. Have each page hold an identifier for its images (perhaps file-paths), and load the images only when you really want to view the page.

wilhelmtell 2010-03-25 00:45:48

This is an interesting idea. All of the raw data is managed and encapsulated in one place, and interacted with from outside through Book and Page. What did you mean by flyweight? I was considering having a low-res and/or thumbnail for the images as well, and I was thinking about how to design for that too.

Kache4 2010-03-25 01:31:20

See http://en.wikipedia.org/wiki/Flyweight_pattern for information about the flyweight design pattern. In essence the idea is to share as much information as possible among a pool of objects (in your case, the pages of a book) in order to save memory. So, for instance, all pages will have a header and a footer but because these are identical among pages the pages will share these objects. If the pages don't own these objects (as is the case when you pull this information out to a third component) then you don't even have the problem of who should destroy this data when you're done with it.

wilhelmtell 2010-03-25 01:59:02

Answer 3

+2 A:

Use std::vector to hold image data is a bad idea. I will use raw pointer or shared_ptr for this purpose. This prevents the buffer is copied twice.

Since you do care memory, holding all image data in memory is also a bad idea to me. A better case is to encapsulate it into a separate class. For example, ImageData. This class contains the row pointer of image data. The class can be initialized with a file path at the first, and the image data is loaded from disk when it is required.

Sherwood Hu 2010-03-25 01:07:47

But then `Book` would have a `stl::deque` of ImageData, and ImageData would then have to define a copy constructor where the data is hard copied anyway, right?

Kache4 2010-03-25 01:28:43

Answer 4

+3 A:

In terms of code changes based on what you have already, the simplest is probably to give Page a setter taking a non-const vector reference or pointer, and swap it with the vector contained in the Page. The caller will be left holding an empty vector, but since the problem is excessive copying, presumably the caller doesn't want to keep the data:

void Book::addPage(ifstream file, streampos size) {
    std::vector<char> vec(size);
    file.read(&vec[0], size);
    pages.push_back(Page()); // pages is a data member
    pages.back().setContent(vec);
}

class Page {
    std::vector<char> content;
public:
    Page() : content(0) {} // create an empty page
    void setContent(std::vector<char> &newcontent) {
        content.swap(newcontent);
    }
};

Some people (for example the Google C++ style guide) want reference parameters to be const, and would want you to pass the newcontent parameter as a pointer, to emphasise that it is non-const:

void setContent(std::vector<char> *newcontent) {
    content.swap(*newcontent);
}

swap is fast - you'd expect it just to exchange the buffer pointers and sizes of the two vector objects.

Alternatively, give Page two different constructors: one for the zip file and one for the regular file, and have it be responsible for reading its own data. This is probably the cleanest, and it allows Page to be immutable, rather than being modified after construction. But actually you might not want that, since as you've noticed in a comment, adding the Page to a container copies the Page. So there's some benefit to being able to modify the Page to add the data after it has been cheaply constructed in the container: it avoids that extra copy without you needing to mess with containers of pointers. Still, the setContent function could just as easily take the file stream / zip file info as take a vector.

You could find or write a stream class which reads from a zipfile, so that Page can be responsible for reading data with just one constructor taking a stream. Or perhaps not a whole stream class, perhaps just an interface you design which reads data from a stream/zip/rar into a specified buffer, and Page can specify its internal vector as the buffer.

Finally, you could "mess with containers of pointers". Make pages a std::vector<boost::shared_ptr<Page> >, then do:

void Book::addPage(ifstream file, streampos size) {
    boost::shared_ptr<Page> page(new Page(file, size));
    pages.push_back(page); // pages is a data member
}

A shared_ptr has a modest overhead relative to just a Page (it makes an extra memory allocation for a small node containing a pointer and a refcount), but is much cheaper to copy. It's in TR1 too, if you have some implementation of that other than Boost.

Steve Jessop 2010-03-25 01:23:54

Oh, nice, I didn't think about using swap. I thought about having Page handle the data reading. The thing with that I'd be constantly opening a zipfile, jumping to and reading 1 image, and closing it. Even worse, the unrar library doesn't support jumping to items.

Kache4 2010-03-25 01:38:46

"I'd be constantly opening a zipfile, jumping to and reading 1 image, and closing it" - not necessarily. Presumably, your `Book` class currently advances its way through the zip/rar reading one file at a time. The `Page` constructor could have the same behaviour, that it "consumes" some data from the object passed to it. Just be careful handling errors - because you've read some data and there's probably no way to put the stream position back to where it started (if you can't jump to a file, you can't seek), you can only offer the weak exception guarantee.

Steve Jessop 2010-03-25 01:49:42

Answer 5

+1 A:

I would have the Page class read its own data directly from the source, and the Book would only read as much of the source that it needed in order to locate each individual page (and to read any data belonging to the Book in general, such as a title).

For example, in the case of the data being stored in a directory, the Book would retrieve the list of files in the directory. For each file, it would pass the filename to a Page constructor which would open the file and load its contents.

As for the case where the book was stored in a zip file, I'm making some guesses as to how the library you're using works. I think you're using Minizip, which I'm not familiar with, but at a glance it looks like opening a file via Minizip gives you a handle. You pass that handle to unzGoToFirstFile() and unzGoToNextFile() to set the active subfile within the zip file (in your case, the active page), and use unzReadCurrentFile() to load the active subfile into a buffer. If that's the case, then your Book class would open the file using Minizip and set it to the first subfile. It would then pass the handle to the zip file to a constructor in Page, which would do the work of reading the subfile from the zip file. The Book would then call unzGoToNextFile() to move to the next subfile, and would create another page by again passing the handle to Page. It would continue doing this until there were no subfiles remaining. It would look something like:

Page::Page(zipFile file)
{
    //  TODO:  Determine the required size of the buffer that will store the data
    unsigned buffer_size;

    data_.resize(buffer_size)

    unzReadCurrentFile(file, &data_[0], buffer_size);
}

void Book::open(const std::string &filename)
{
    zipFile file = unzOpen(filename.c_str());

    int result = unzGoToFirstFile(file);
    while (result == UNZ_OK)
    {
        pages_.push_back(Page(file));
        unzGoToNextFile(file);
    }
}

This is very simplified (and I might be using Minizip totally wrong, so beware), and it also assumes that Book stores a vector of Page objects named pages_, and that Page names its buffer data_.

Josh Townzen 2010-03-25 01:42:42

The thing is, a Book is made of many Pages of screenshots, all within a zip or rar file. Conceptually, the Book should own the zip or rar. However, I think I may be able to pass a reference for the zip/rar handle to each Page and have each Page locate and extract the raw data it needs. Alternatively, I could store a reference to the Book in every page, so that each page can decide on its own when it needs to reach out and grab the zip/rar handle as well.

Kache4 2010-03-29 05:55:38

Yes, that was what I thinking; the Book would open the container (a directory, a ZIP file, a RAR file, etc) and pass that container's handle to a Page constructor, which would then load a single page from the container. I've thought about this some more, though, and that might not be a good idea. Every time you wanted to add a new container type, you would have to update both Book and Page. Both would be responsible for knowing how each container works. It would probably be a better idea to have the Book do all of the loading and pass the loaded data to the Page, per Steve Jessop's answer.

Josh Townzen 2010-03-30 02:36:47

ansaurus

tags:

views:

answers:

Extracting, then passing raw data into another class - How to avoid copying twice while maintaining encapsulation?

related questions