Hi there,

I want to read a file as fast as possible (40k lines) [Edit : the rest is obsolete].

Edit: Andres Jaan Tack suggested a solution based on one thread per file, and I want to be sure I got this (thus this is the fastest way) :

  • One thread per entry file reads it whole and stocks its content in a container associated (-> as many containers as there are entry files)
  • One thread calculates the linear combination of every cell read by the input threads, and stocks the results in the exit container (associated to the output file).
  • One thread writes by block (every 4kB of data, so about 10 lines) the content of the output container.

Should I deduce that I must not use m-mapped files (because the program's on standby waiting for the data) ?

Thanks aforehand.


Mister mystère.

+16  A: 

[Edit: original question asked if launching up to 40,000 threads would speed up file read]

What you suggest would most likely slow down the access due to the overhead of creating threads and context switching. More threads only help if you are

1) computationally bound and you have extra cores that could help with the work

2) blocking and other threads could work while waiting for others to unblock

3) you have a very clever algorithm that makes use of cache behavior

Most likely your speed is bound by disk and/or memory bandwidth not computational limits so a single execution thread would be able to max those out.

+4  A: 

Yes, it's a waste of time. At very best you'll end up with about the same performance. At worst, it might hurt performance from the disk seeking to different parts of the file instead of reading through it consecutively.

Jerry Coffin
Well, at the very best he's on a raid with 40,000 discs that can be accessed in parallel in which case it might save him time. :) But that's just being devil's advocate, I agree with your assessment of the practical situation.
He would also need 40,000 processor cores to process each thread's incoming stream.
John Dibling
+4  A: 

In contrast to other readers I believe that theoretically there can be some benifit, even if you're running on an SP (single-processor) system. However I'd never do this for as much as 40K lines (assuming you talk about normal-sized lines).

They key is Amardeep's answer, where he/she says that creating threads is useful when a thread becomes blocked for some reason.

Now, how do mapped files "work"? When you access a memory page in that region for the first time - the processor generates a page fault. The OS loads the contents of the file (this involves disk access) into the memory page. Then the execution returns to your thread.

I also believe upon page fault the OS fills a bunch of consecutive pages, not just single one.

Now, what's important is that during the page fault processing your thread is suspended. Also during this period the CPU isn't loaded (apart from what other processes may do).

So that if you look at the time scale you see a period of two sections: one where CPU is loaded (here you read the contents of the page and do some processing), and one where CPU is nearly idle and the I/O on the disk is performed.

On the other hand you may create several threads, each one is assigned to read a different portion of the file. You benefit from two effects:

  1. Other thread has a chance to load the CPU (or several CPUs if MP system) when one is blocked by I/O.

  2. Even in case where the processing is very short (hence the CPU is not the bottleneck) - still there's a benefit. It's related to the fact that if you issue several I/O on the same physical device - it has a chance to perform them more efficiently.

For instance, when reading many different sectors from the HD drive you can actually read them all within one disk rotation.


And, of course, I'd never thought to do this for 40K lines. The overhead of creating threads, waiting for them to finish, context switches, logic complification, error/failure handling, and etc.

I'd try to do this for a file of at least tens of MBs.


Wow, I never saw such reactivity and quality, both at the same time. Thanks all of you, especially valdo who made me understand mapped files.

In fact, I want to make a linear combination of several CSV files, read line by line and processed cell by cell. Apparently, it's not interesting to use multithreading at reading, but do you think : - Reading and storing first in arrays, then process them using multithreading before writing the exit file is faster than : - Reading, processing, and writing line by line


The client machine is octo-cored without RAID.

Seem my comment below.
John Dibling
Please edit your original question to clarify rather than entering a new answer, because your answer may be right at the bottom of the page.
I can't, now that I got my account back to working... But I'm really close to a perfect answer.
Mister Mystère
+1  A: 

This is a problem of granularity. You've got a small file, and very little processing to do. One thread can probably gobble the entire file in one time slice and process it in the next. Two threads would be worse than one. You need a much larger task before considering parallelism as a performance solution.

+8  A: 

Your question got a little bit deeper, when you asked further. I'll try to cover all your options...

Reading One File: How many threads?

Use one thread.

If you read straight through a file front-to-back from a single thread, the operating system will not fetch the file in small chunks like you're thinking. Rather, it will prefetch the file ahead of you in huge (exponentially growing) chunks, so you almost never pay a penalty for going to disk. You might wait for the disk a handful of times, but in general it will be like the file was already in memory, and this is even irrespective of mmap.

The OS is very good at this kind of sequential file reading, because it's predictable. When you read a file from multiple threads, you're essentially reading randomly, which is (obviously) less predictable. Prefetchers tend to be much less effective with random reads, in this case probably making the whole application slower instead of faster.

Notice: This is even before you add the cost of setting up the threads and all the rest of it. That costs something, too, but it's basically nothing compared with the cost of more blocking disk accesses.

Reading Multiple Files: How many threads?

Use as many threads as you have files (or some reasonable number).

File prefetching done separately for each open file. Once you start reading multiple files, you should read from several of them in parallel. This works because the disk I/O Scheduler will try to figure out the fastest order in which to read all of them in. Often, there's a disk scheduler both in the OS and on the hard drive itself. Meanwhile, the prefetcher can still do its job.

Reading several files in parallel is always better than reading the files one-by-one. If you did read them one at a time, your disk would idle between prefetches; that's valuable time to read more data into memory! The only way you can go wrong is if you have too little RAM to support many open files; that's not common, anymore.

A word of caution: If you're too overzealous with your multiple file reads, reading one file will start kicking bits of other files out of memory, and you're back to a random-read situation.

Combining n Files into One.

Processing and producing output from multiple threads might work, but it depends how you need to combine them. You'll have to be careful about how you synchronize the threads, in any case, though there are surely some relatively easy lock-less ways to do that.

One thing to look for, though: Don't bother writing the file in small (< 4K) blocks. Collect at least 4K of data at a time before you call write(). Also, since the kernel will lock the file when you write it, don't call write() from all of your threads together; they'll all wait for each other instead of processing more data.

Andres Jaan Tack
This is a wonderful answer. So, let me recap' and stop me if I'm wrong :- One thread per entry file reads it whole and stocks its content in a container associated (-> as many containers as there are entry files)- One thread calculates the linear combination of every cell read by the input threads, and stocks the results in the exit container (associated to the output file).- One thread writes by block (every 4kB of data, so about 10 lines) the content of the output container.Should I deduce that I must not use m-mapped files (because the program's on standby waiting for the data) ?
Mister Mystère
Reading files in parallel is not efficent - there is only one bus between RAM and the hard disk. If each read required a seek, then you're adding overhead to the read operation.
@Skizz You're missing the advantage of the disk scheduler, which will reorder reads to be more efficient than you're imagining. Plus, again, you are getting work done between each thread's own reads rather than waiting for processing to finish.
Andres Jaan Tack
@Mister Mystère: I like that strategy, yeah. The writer-thread, especially, should work very well.Memory-mapped files are really only meant for when you're storing raw memory (e.g. structs with pointers) in a file. The whole point is to avoid explicitly writing tiny changes. Based on what I'm reading, you don't really need that.
Andres Jaan Tack
Alright then, this was intel phase, development phase starts in july, I'll feed. Thanks for your answer, I really appreciate. Hope I'll be able to rise my reputation up by answering, instead of just questioning right.
Mister Mystère
By the way: I was talking specifically about _reading_ a file, here. The processing of the data in the file, after it's read from disk, is a different problem. There it depends what you need to do to the data, whether you use threads or not.
Andres Jaan Tack

I'm thinking like this.

You have 8 cores, so make 8 threads. Let each thread parse one block of the file. So you need to get the device/disk block size. When a block has been parsed by a thread let the thread parse a new one not yet "assigned" to a thread.

Another idea I have would to have 2 threads. A parsing thread and a thread just stepping over the file's disk blocks, ie just by reading the first byte of each block thus forcing the file to be read into memory as fast as possible.

But, this could be made into a contest. Nothing beats doing real live runs! and people will show you! :) find a suitable price!