views:

1395

answers:

5

Hi. I was googling for some advise about this and I found some links. The most obvious was this one but in the end what im wondering is how well my code is implemented.

I have basically two classes. One is the Converter and the other is ConverterThread

I create an instance of this Converter class that has a property ThreadNumber that tells me how many threads should be run at the same time (this is read from user) since this application will be used on multi-cpu systems (physically, like 8 cpu) so it is suppossed that this will speed up the import

The Converter instance reads a file that can range from 100mb to 800mb and each line of this file is a tab-delimitted value record that is imported to another destination like a database.

The ConverterThread class simply runs inside the thread (new Thread(ConverterThread.StartThread)) and has event notification so when its work is done it can notify the Converter class and then I can sum up the progress for all these threads and notify the user (in the GUI for example) about how many of these records have been imported and how many bytes have been read.

It seems, however that I'm having some trouble because I get random errors about the file not being able to be read or that the sum of the progress (percentage) went above 100% which is not possible and I think that happens because threads are not being well managed and probably the information returned by the event is malformed (since it "travels" from one thread to another)

Do you have any advise on better practices of implementation of threads so I can accomplish this?

Thanks in advance.

+7  A: 

I read very large files in some of my own code and, I have to tell you, I am skeptical of any claim that adding threads to a read operation would actually improve the overall read performance. In fact, adding threads might actually reduce performance by causing head seeks. It is highly likely that any file operations of this type would be I/O bound, not CPU bound.

Given that the author of the post you referenced never actually provided the 'real' code, his claims that multiple threads will speed up I/O remain untestable by others. Any attempt to improve hard disk read/write performance by adding threads would most certainly be I/O bound, unless he is doing some serious number crunching between reads, or has stumbled upon some happy coincidence having to do with the disk cache, in which case the performance improvement might be unreproduceable on another machine with different hardware characteristics.

Generally, when files of this size are involved, an additional 20% or 30% improvement in performance is not going to matter much, even if it is possible utilizing threads, because such a task would most certainly be considered a background task (not real-time). I use multiple threads for this kind of work, not because it improves read performance on one file, but because multiple files can be processed simultaneously in the background.

Before using threads to do this, I carefully benchmarked the software to see if threads would actually improve overall throughput. The results of the tests (on my development machine) were that using the same number of threads as the number of processor cores produced the maximum possible throughput. But that was processing ONE file per thread.

Robert Harvey
+1, Threads are not the answer here.
Henk Holterman
+8  A: 

Multiple threads reading a file at a time is asking for trouble. I would set up a producer consumer model such that the producer read the lines in the file, perhaps into a buffer, and then handed them out to the consumer threads when they complete processing their current work load. It does mean you have a blocking point where the lines are handed out but if processing takes much longer than reading then it shouldn't be that big of a deal. If reading is the slow part then you really don't need multiple consumers anyway.

stimms
Very well said, particularly the last part.
Josh Einstein
Actually the processing of the data is what takes the most. Indeed, what I'm doing right now is that the main thread reads the file line by line and as each line is consumed a new thread is created passing that line to the thread so it can process that information. As soon as a thread is done i fire up an event that tells me that thread has finished so I can create a new one so I dont create more than those the user pointed (num of threads is configurable)
Gustavo Rubio
A: 

You should try to just have one thread read the file, since multiple threads will likely be bound by the I/O anyway. Then you can feed the lines into a thread-safe queue from which multiple threads can dequeue lines to parse.

You won't be able to tell the progress of any one thread because that thread has no defined amount of work. However, you should be able to track approximate progress by keeping track of how many items (total) have been added to the queue and how many have been taken out. Obviously as your file reader thread puts more lines into the queue your progress will appear to decrease because more lines are available, but presumably you should be able to fill the queue faster than workers can process the lines.

Josh Einstein
A: 

Hello, I am really amazed reading this post. I tried to increase the performance using threading to read a file from different location but no luck. I don't know whether it increase performance or not but i coded with thread pool to read a file and add the records into a list and i think the process is faster. Still it is forbidden to use thread pool while interacting with UI, but i just allow the default threading to read the file. I have the code. If anyone need that please let me know.

Thanks.

Shaoun
A: 

I am developing the web site, It consists of device name list and related Build Button. When one click the Build button the one process will run in server. When more than ten user click the Build button more processes will create at that server will hang. How can send all request from client to single process in server.

Shashi
You need to create a new question so others can see your question and help you. (click on the Ask question button in the top right corner)
Rox