views: 106
answers: 5

Hello everyone,

Many of my programs output huge volumes of data for me to review in Excel. The best way to view all these files is to use a tab-delimited text format. Currently I use this chunk of code to get it done:

ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << " ";
    output << endl;
}

This seems to be a very slow operation. Is there a more efficient way of writing text files like this to the hard drive?

Update:

Taking the two suggestions into account, the new code is this:

ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << "\t";
    output << "\n";
}
output.close();

This writes to the HD at 500 KB/s.

But this version writes to the HD at 50 MB/s:

{
    // Raw binary dump: assumes arrayPointer points at dim * dim doubles.
    output.open(fileName.c_str(), std::ios::binary | std::ios::out);
    output.write(reinterpret_cast<char*>(arrayPointer), std::streamsize(dim * dim * sizeof(double)));
    output.close();
}
+3  A: 

Don't use endl. It flushes the stream buffer, which is potentially very inefficient. Instead:

output << '\n';
anon
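(For illustration: std::endl is defined as inserting '\n' and then flushing the stream, so the two forms below differ only in the flush. A minimal fragment; the file name is made up:)

#include <fstream>

std::ofstream output("out.txt");
output << "row 1" << std::endl;  // inserts '\n', then forces the buffer out to disk
output << "row 2" << '\n';       // inserts '\n' only; the buffer drains when full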
I suppose it's faster, but it's still unreasonably slow compared to a raw write of a double array straight to the hard drive. Checking the hard drive usage, my program writes at a rate of 500 KB/s when outputting tab-delimited files, while when outputting straight binary data it goes to 50 MB/s. Why is it so slow?
Faken
Formatted output will always be considerably slower than raw output, but not normally by a factor of 10x (assuming you got your MB and KB mixed up in your comment) - there is probably something else going on.
anon
I checked again; it jumps around a little, but the disk usage when writing is somewhere around 30-45 MB/s. I'm very sure it's not my CPU to blame, it's a Q6600 @ 2.4 GHz. I have a feeling that the text file is being chopped up into little bits and written to the HD in small chunks rather than one massive buffer. Is there any way to fix that?
Faken
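(One way to experiment with the chunking described in the comment above is to hand the stream a larger buffer before the file is opened. A minimal sketch reusing fileName from the question; whether pubsetbuf honours the request is implementation-defined, and the 1 MB size is only a guess:)

#include <fstream>
#include <vector>

std::vector<char> buf(1 << 20);                  // 1 MB buffer; size is an assumption
std::ofstream output;
output.rdbuf()->pubsetbuf(&buf[0], buf.size());  // must be called before open() to take effect
output.open(fileName.c_str());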
You could try writing everything to a stringstream instead of a file stream and then output the stringstream in one chunk. This will take a lot more memory than using a file stream, and may or may not be faster. Alternatively, scrap C++ stream output and use C streams and fprintf() - this will almost certainly be faster than iostreams.
anon
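(A minimal sketch of the stringstream approach suggested above, reusing dim, arrayPointer and fileName from the question — the whole table is formatted in memory first, then written out in one large chunk:)

#include <fstream>
#include <sstream>

std::ostringstream buffer;
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        buffer << arrayPointer[j * dim + i] << '\t';
    buffer << '\n';
}
std::ofstream output(fileName.c_str());
output << buffer.str();  // one big write instead of many small ones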
A: 
ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << '\t';
    output << endl;
}

Use '\t' instead of " ".

ratnaveer
+1  A: 

Does it have to be written in C++? If not, there are many tools already written in C, e.g. (g)awk (usable on Unix and Windows), that do this kind of file processing really well, even on big files.

awk '{$1=$1}1' OFS="\t" file
ghostdog74
Sorry, I'm not familiar with this. What is it?
Faken
+1 Had the same idea.
Helper Method
He seems to want to speed up the output of a program already written in C++. I don't see how awk can help do that.
anon
@Faken, it's a common *nix tool used for file processing, also ported to Windows through GNU. It's written in C, so you don't have to reinvent the wheel. Of course, only use it if you can afford to in your "project".
ghostdog74
@Neil, I assumed he already has those huge data files generated by his programs.
ghostdog74
Yeah, my project is my thesis. So far there's no code in there that I haven't personally written (I'm a mechanical engineer, not a computer scientist, so I have limited knowledge of coding, and I don't really want to use something I don't understand lest my professor ask me about it).
Faken
A: 

It may be faster to do it this way:

ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << '\t';
    output << '\n';
}
John Boker
+3  A: 

Use C IO; it's a lot faster than C++ IO. I've heard of people in programming contests timing out purely because they used C++ IO and not C IO.

#include <cstdio>

FILE* fout = fopen(fileName.c_str(), "w");

for (int j = 0; j < dim; j++) 
{ 
    for (int i = 0; i < dim; i++) 
        fprintf(fout, "%d\t", arrayPointer[j * dim + i]);
    fprintf(fout, "\n");
} 
fclose(fout);

Just change %d to match the actual element type.
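For the doubles mentioned in the update, for example (the choice of %g is just a suggestion):

fprintf(fout, "%g\t", arrayPointer[j * dim + i]);  // or "%.17g" if a lossless round-trip matters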

JPvdMerwe
Thank you, this is significantly faster.
Faken
I'm actually curious as to how much; I've never tested it myself. Could you please quote some figures?
JPvdMerwe
My hard drive was writing at speeds of about 10 MB/s. This was for doubles, though. With the previous method it was only about 500 KB/s.
Faken
The difference is scary, actually... Though I wonder what a Duff's-device-style approach would do, since it would definitely give you better batching.
JPvdMerwe