Hi,

I'm trying to join two big files (like the UNIX cat command: cat file1 file2 > final) in C++.

I don't know how to do it, because every method I have tried is very slow (for example, copying the second file into the first one line by line).

What is the best way to do that?

Sorry for being so brief; my English is not very good.

+4  A: 

Open the standard streams in binary mode and copy raw bytes; don't treat the contents as formatted data.


Here is a demo if you want to transfer the data in blocks:

#include <fstream>
#include <vector>

std::size_t fileSize(std::ifstream& file)
{
    std::size_t size;

    file.seekg(0, std::ios::end);
    size = file.tellg();
    file.seekg(0, std::ios::beg);

    return size;
}

int main()
{
    // 1 MB; choose a convenient buffer size.
    const std::size_t blockSize = 1024 * 1024;

    std::vector<char> data(blockSize);
    std::ifstream first("first.txt", std::ios::binary),
       second("second.txt", std::ios::binary);
    std::ofstream result("result.txt", std::ios::binary);
    std::size_t firstSize  = fileSize(first);
    std::size_t secondSize = fileSize(second);

    for(std::size_t block = 0; block < firstSize/blockSize; block++)
    {
        first.read(&data[0], blockSize);
        result.write(&data[0], blockSize);
    }

    std::size_t firstFilerestOfData = firstSize%blockSize;

    if(firstFilerestOfData != 0)
    {
        first.read(&data[0], firstFilerestOfData);
        result.write(&data[0], firstFilerestOfData);
    }

    for(std::size_t block = 0; block < secondSize/blockSize; block++)
    {
        second.read(&data[0], blockSize);
        result.write(&data[0], blockSize);
    }

    std::size_t secondFilerestOfData = secondSize%blockSize;

    if(secondFilerestOfData != 0)
    {
        second.read(&data[0], secondFilerestOfData);
        result.write(&data[0], secondFilerestOfData);
    }

    first.close();
    second.close();
    result.close();

    return 0;
}
AraK
Also, use a large block size (several KB at least).
bdonlan
+2  A: 

Using plain old C++:

#include <fstream>

std::ifstream file1("x", std::ios_base::in | std::ios_base::binary);
std::ofstream file2("y", std::ios_base::app | std::ios_base::binary);
file2 << file1.rdbuf(); // unformatted copy of everything left in file1

The Boost headers claim that copy() is optimized in some cases, though I'm not sure if this counts:

#include <boost/iostreams/copy.hpp>
// The following four overloads of copy_impl() optimize 
// copying in the case that one or both of the two devices
// models Direct (see 
// http://www.boost.org/libs/iostreams/doc/index.html?path=4.1.1.4)

boost::iostreams::copy(file1, file2);

update:

The Boost copy function is compatible with a wide variety of types, so this can be combined with Pavel Minaev's suggestion of using std::filebuf like so:

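std::filebuf file1, file2;

// pubsetbuf() is the public entry point; setbuf() itself is protected.
// Its effect is implementation-defined, and some implementations only
// honor it before open(), so request the buffers first.
file1.pubsetbuf(NULL, 64 * 1024);
file2.pubsetbuf(NULL, 64 * 1024);

file1.open("x", std::ios_base::in | std::ios_base::binary);
file2.open("y", std::ios_base::out | std::ios_base::app | std::ios_base::binary);

boost::iostreams::copy(file1, file2);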

Of course, the optimal buffer size depends on many variables; 64 KB is just a wild guess.

Tim Sylvester
Note that the << and >> operators are primarily meant for formatted I/O and are therefore not as efficient as read and write for binary data.
RC
Normally, yes, but `rdbuf()` returns a `streambuf*`, and I'm pretty sure the relevant `operator<<` overload does a direct copy with no formatting.
Tim Sylvester
A: 

As an alternative, you can read the whole second file into memory and append it with a single write. This may or may not be faster, depending on the file size and the memory available on the machine. If memory is tight, make the buffer smaller and loop over f2.read, grabbing the data in chunks and writing each chunk to f1.

#include <fstream>
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
        ofstream f1("test.txt", ios_base::app | ios_base::binary);
        ifstream f2("test2.txt", ios_base::binary); // binary, so the bytes read match the size from tellg()

        f2.seekg(0,ifstream::end);
        unsigned long size = f2.tellg();
        f2.seekg(0);

        char *contents = new char[size];
        f2.read(contents, size);
        f1.write(contents, size);

        delete[] contents;
        f1.close();
        f2.close();

        return 0; // 0 signals success to the caller
}

RC
+6  A: 

If you're using std::fstream, then don't. It's intended primarily for formatted input/output, and its char-level operations are slower than you'd expect. Instead, use std::filebuf directly. This is in addition to the suggestions in other answers, in particular using a larger buffer size.

Pavel Minaev
+1 for the std::filebuf
AraK