I'm trying to find out the best way to read large text files (at least 5 MB) in C++, considering speed and efficiency. Is there a preferred class or function to use, and why?

By the way, I'm running specifically on a UNIX environment.

A: 

The stream classes (ifstream) actually do a good job. Assuming you're not restricted otherwise, make sure to turn off sync_with_stdio (a static member of ios_base); note that it only affects the standard streams (cin, cout and friends), so it matters when you also use those. You can use getline() to read directly into std::strings, though from a performance perspective reading into a fixed buffer (a vector of chars or an old-school char[]) may be faster, at the cost of higher risk and complexity.
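To make the two stream-based options concrete, here is a minimal sketch, not a tuned implementation. The function names `count_lines` and `count_newlines_chunked` and the 64 KiB buffer size are assumptions chosen for illustration; counting lines stands in for whatever per-line processing you actually need.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the file line by line with std::getline.
// Returns the number of lines, or -1 if the file cannot be opened.
long count_lines(const std::string& path) {
    std::ifstream in(path);
    if (!in) return -1;
    std::string line;  // reused on every iteration to avoid reallocations
    long count = 0;
    while (std::getline(in, line)) {
        ++count;  // replace with real per-line processing
    }
    return count;
}

// Hypothetical helper: reads the file in fixed-size chunks and counts '\n'.
// The 64 KiB chunk size is an arbitrary assumption, not a measured optimum.
long count_newlines_chunked(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    std::vector<char> buf(64 * 1024);
    long count = 0;
    // read() fails on the final short chunk, so also check gcount().
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        const std::streamsize got = in.gcount();
        count += static_cast<long>(
            std::count(buf.data(), buf.data() + got, '\n'));
    }
    return count;
}
```

The chunked version never sees line boundaries, so any real parser built on it has to handle lines that straddle two chunks; that is the extra complexity mentioned above.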

You can go the mmap route if you're willing to play games with page size calculations and the like. I'd probably build it out first using the stream classes and see if it's good enough.
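For comparison, a rough sketch of the mmap route on a POSIX system. It maps the whole file from offset 0, which sidesteps the page-size arithmetic (that only comes into play when mapping at a non-zero offset). The function name `count_newlines_mmap` is hypothetical, and counting newlines again stands in for real parsing.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: counts '\n' bytes through a read-only mapping.
// Returns -1 on any error.
long count_newlines_mmap(const char* path) {
    const int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap rejects length 0

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* map = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (map == MAP_FAILED) return -1;

    const char* p = static_cast<const char*>(map);
    const char* const end = p + len;
    long count = 0;
    // memchr is given an explicit length because the mapping is not
    // NUL-terminated.
    while ((p = static_cast<const char*>(
                std::memchr(p, '\n', static_cast<std::size_t>(end - p))))
           != nullptr) {
        ++count;
        ++p;  // continue searching after the newline just found
    }
    munmap(map, len);
    return count;
}
```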

Depending on what you're doing with each line of data, you might start finding your processing routines are the optimization point and not the I/O.

Joe
What is the advantage of ifstream over fread()?
jasonline
Performance-wise, I'd expect them to be roughly the same. In terms of code maintenance, I'd much rather deal with the stream classes.
Joe
A: 

Use old-style file I/O.

fopen the file for binary read
fseek to the end of the file
ftell to find out how many bytes are in the file.
malloc a chunk of memory to hold all of the bytes + 1
set the extra byte at the end of the buffer to NUL.
fread the entire file into memory.
create a vector of const char *
push_back the address of the first byte into the vector.
repeatedly
    strstr - search the memory block for the line-terminator character(s) ("\n" on UNIX).
    put a NUL at the found position
    move past the line-terminator characters
    push_back that address into the vector
until all of the text in the buffer has been processed.

----------------
use the vector to find the strings,
and process as needed.
when done, free the memory block (it came from malloc, so free, not delete)
and the vector should self-destruct.
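The steps above could be sketched roughly as follows. This is an illustration under a few stated assumptions, not the answerer's own code: it swaps malloc for a std::vector<char> so cleanup is automatic, uses strchr instead of strstr because it assumes lines end with a single '\n', and assumes the file contains no embedded NUL bytes. The names `SlurpedFile` and `slurp_lines` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical container: buffer owns the bytes, lines point into it.
// Do not copy it; a copied buffer would leave lines pointing at the original.
struct SlurpedFile {
    std::vector<char> buffer;
    std::vector<const char*> lines;
};

// Reads the whole file into memory and splits it in place on '\n'.
// Returns false if the file cannot be opened or fully read.
bool slurp_lines(const char* path, SlurpedFile& out) {
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return false;

    // Seek to the end to learn the size, then go back to the start.
    if (std::fseek(fp, 0, SEEK_END) != 0) { std::fclose(fp); return false; }
    const long size = std::ftell(fp);
    if (size < 0) { std::fclose(fp); return false; }
    std::rewind(fp);

    const std::size_t n = static_cast<std::size_t>(size);
    out.lines.clear();
    out.buffer.assign(n + 1, '\0');  // the extra byte is the trailing NUL
    const std::size_t got = std::fread(out.buffer.data(), 1, n, fp);
    std::fclose(fp);
    if (got != n) return false;

    char* p = out.buffer.data();
    char* const end = p + n;
    while (p < end) {
        out.lines.push_back(p);            // start of the current line
        char* nl = std::strchr(p, '\n');
        if (!nl) break;                    // last line has no newline
        *nl = '\0';                        // terminate the line in place
        p = nl + 1;                        // move past the newline
    }
    return true;
}
```

After a successful call, each element of `lines` is an ordinary NUL-terminated C string, so the parsing cost is one pass over the buffer with no per-line allocation.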
EvilTeach
How is it better than the stream classes?
jasonline
Old-style file I/O is isomorphic to streams; you can do it either way. It's slurping the entire file in at once and then splitting it into strings that is significant.
EvilTeach
A: 

If you are reading a text file that stores integers, floats, and small strings, my experience is that FILE, fopen, and fscanf are already fast enough, and they also give you the numbers directly. I think memory mapping is the fastest, but it requires you to write your own code to parse the file, which is extra work.
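As a small illustration of getting the numbers directly, here is a sketch that assumes the file holds only whitespace-separated floating-point values. The function name `read_doubles` is hypothetical.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: appends every whitespace-separated number in the
// file to out. Stops at end of file or at the first token that is not a
// number. Returns false only if the file cannot be opened.
bool read_doubles(const char* path, std::vector<double>& out) {
    std::FILE* fp = std::fopen(path, "r");
    if (!fp) return false;
    double value = 0.0;
    // fscanf returns the number of items converted; 1 means success here.
    while (std::fscanf(fp, "%lf", &value) == 1) {
        out.push_back(value);
    }
    std::fclose(fp);
    return true;
}
```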

Yin Zhu