I am writing a hex editor program and I was thinking about when a user tries to open a very large file (3GB+). I wouldn't want the user to sit around all day for the whole file to load when it already has some data loaded.

So here is my question: would it be possible to have multiple threads read the file (not write) simultaneously, at different places, and then once a certain threshold of data has been read by one of them, that thread displays its data while the others continue to read? Would that offer me a performance improvement? Or would memory bandwidth reduce any speed gain I could get from using multiple threads?

+3  A: 

You probably don't want to use multiple threads. Even on a multi-core CPU, there is still only one path to the disk, so you probably won't get a performance gain (disk access is much slower than memory).

You have a good idea though with loading and displaying little bits at a time. Just do this in one thread. Read roughly the first megabyte, display it, and do the next in the background, etc.
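A minimal sketch of that idea in Python, assuming a hypothetical `on_first_chunk` callback that stands in for whatever code draws the data. The function name, the queue, and the `None` end marker are illustrative choices, not part of any particular GUI toolkit:

```python
import queue
import threading

CHUNK_SIZE = 1024 * 1024  # roughly one megabyte, as suggested above


def load_in_background(path, on_first_chunk, chunk_size=CHUNK_SIZE):
    """Read the first chunk right away, then read the rest on one worker thread.

    Returns (thread, chunks). `chunks` is a queue of (offset, data) pairs
    that ends with a None sentinel once the end of the file is reached.
    """
    chunks = queue.Queue()
    f = open(path, "rb")
    first = f.read(chunk_size)
    on_first_chunk(first)  # hypothetical hook: hand the bytes to the display

    def worker():
        offset = len(first)
        with f:  # the worker owns the file from here on and closes it
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                chunks.put((offset, data))
                offset += len(data)
        chunks.put(None)  # sentinel: nothing more to read

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, chunks
```

Note that this still uses a single reader thread, and that the unbounded queue would eventually hold the whole file; a real editor would probably cap it or drop chunks it no longer needs.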

And you are right that you might want a separate thread for the GUI. This is one of the reasons why BeOS was so incredibly responsive compared to other OSes of the time. It used many different threads for different tasks.

Just don't expect multiple threads reading from disk to help.

Also, you can use aio_read() to do asynchronous IO on Linux. If you're using Windows, just try googling "windows asynchronous io" (I'm not really sure how you do it; I don't use Windows).

Zifre
So maybe have the I/O be a separate thread from the GUI then? Then if the GUI tried to view data that wasn't yet retrieved, I could interrupt the I/O thread and get the requested data.
samoz
@Samoz - That's exactly the right approach - do not block the UI thread on IO.
Michael
+3  A: 

I'm not sure what perf boost you are expecting . . . there is one stream of data coming off of the disk, and having multiple threads read from disk will just increase contention and possibly create a slow-down as the disk head bounces back and forth due to competing requests.

You should look into doing asynchronous IO instead and processing data as soon as it comes in to keep your application appearing responsive.

Michael
I was thinking that the bandwidth would hamper it, but still figured I'd ask.
samoz
+1 for asynchronous IO. I can't believe I didn't think of that. It's definitely the most sane way...
Zifre
What exactly do you mean by asynchronous I/O?
samoz
@samoz: it means you tell the OS to start doing the IO, and you can do other things, and then actually use the data whenever you want once it has loaded.
Zifre
A: 

I think you'd be better off using asynchronous aka non-blocking I/O. That means you can send off a read request, then continue processing, and later on go to pick up the results of the request. Thus, a single thread can overlap processing and I/O. A bit of googling will find API docs for your platform.
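To show the shape of "send off a request, carry on, pick up the result later", here is a small Python sketch. It emulates the pattern with a single worker thread from `concurrent.futures`; it is not OS-level asynchronous I/O, and `start_read` and `read_at` are made-up names for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# One worker is enough: the point is overlapping I/O with other work,
# not reading from several places at once.
_executor = ThreadPoolExecutor(max_workers=1)


def read_at(path, offset, size):
    """Blocking read of up to `size` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def start_read(path, offset, size):
    """Queue the read and return immediately with a Future."""
    return _executor.submit(read_at, path, offset, size)


# Usage sketch (file name is a placeholder):
#   future = start_read("big.bin", 0, 4096)
#   ...keep the UI responsive, do other processing...
#   data = future.result()  # blocks only if the read has not finished yet
```

A native API on your platform would avoid the extra thread, but the calling code ends up looking much the same: request, continue, collect.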

Tom Anderson
+1  A: 

forget about reading the whole file. just read small blocks when the user needs them. it's even easier in a hex editor, since the content doesn't affect the layout.

reading a screenful of data takes milliseconds, so the user won't notice that it's being done on demand as they move around instead of the whole file being read in advance
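a rough Python sketch of the on-demand approach, assuming 16 bytes per row and an already opened binary file object. `read_rows` and `format_rows` are illustrative names, and the output layout is just one common hex dump style:

```python
BYTES_PER_ROW = 16  # assumed row width; layout never depends on content


def read_rows(f, first_row, row_count, bytes_per_row=BYTES_PER_ROW):
    """Read only the bytes needed for the rows currently on screen."""
    f.seek(first_row * bytes_per_row)
    return f.read(row_count * bytes_per_row)


def format_rows(data, base_offset, bytes_per_row=BYTES_PER_ROW):
    """Turn raw bytes into 'offset  hex bytes  printable text' lines."""
    width = bytes_per_row * 3 - 1  # two hex digits per byte plus separators
    lines = []
    for i in range(0, len(data), bytes_per_row):
        row = data[i:i + bytes_per_row]
        hex_part = " ".join(f"{b:02x}" for b in row).ljust(width)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{base_offset + i:08x}  {hex_part}  {text}")
    return lines
```

since row N always starts at byte N * 16, scrolling or jumping is just a seek to the right offset followed by one small read.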

Javier
+4  A: 

For a hex editor, there is no need to read the whole file into memory. The user can only view or modify the data, without inserting or deleting.

You can simply use memory-mapped files. The data will be read automatically when accessed, and only the chunk being displayed will be read. This provides fast scrolling and jumping to any location in the file.
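As an illustration, here is a minimal read-only sketch using Python's standard `mmap` module. The helper names `open_mapped` and `view` are made up for this example, and an empty file would need separate handling since it cannot be mapped:

```python
import mmap


def open_mapped(path):
    """Map the whole file read-only; pages are loaded only when touched."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # 0 = whole file
    return f, mm


def view(mm, offset, size):
    """Return the bytes for one screen; slicing past the end just truncates."""
    return mm[offset:offset + size]


# Usage sketch (file name is a placeholder):
#   f, mm = open_mapped("big.bin")
#   screen = view(mm, 0x1000, 512)
#   mm.close(); f.close()
```

In a 32-bit process a 3GB file may not fit in the address space as a single mapping, in which case mapping a window around the current position is the usual workaround.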

bill
+1  A: 

As @bill said, you'll want to use memory-mapped files. I think you'll find the following tutorials very valuable:

The above tutorials should give you all of the information that you need.

Dustin Campbell