My platform is Windows Vista 32-bit, with Visual C++ 2008 Express.

For example:

If I have a file that contains 4000 bytes, can I have 4 threads read from the file at the same time, with each thread accessing a different section of the file?

Thread 1 reads bytes 0-999, thread 2 reads bytes 1000-1999, etc.

Please give an example in C.

+2  A: 

You can certainly have multiple threads reading from a data structure. Race conditions can potentially occur if any writing is taking place.

To avoid such race conditions you need to define the boundaries that each thread can read. If you have an explicit number of data segments and an explicit number of threads to match them, that is easy.

As for an example in C, you would need to provide some more information, such as the threading library you are using. Attempt it first, then we can help you fix any issues.

Stinomus
I have not written the program yet, but I will try pthreads, since I did some pthread work on Linux. The program will not write anything; it just needs to read from the file.
anru
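Since pthreads came up in the comment above, here is a minimal sketch of the read-only case using portable C file streams. It assumes a pthreads implementation is available (on Windows that would mean something like pthreads-win32). The names `read_job`, `read_section` and `parallel_read` are made up for illustration. Each thread opens its own `FILE *`, seeks to its own offset, and reads into its own part of the output buffer, so nothing is shared and no locking is needed.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Hypothetical per-thread job description (not from the original post). */
typedef struct {
    const char *path;     /* file to read */
    long offset;          /* first byte of this thread's section */
    size_t length;        /* number of bytes to read */
    unsigned char *dest;  /* where this thread stores its bytes */
    size_t got;           /* bytes actually read, written only by this thread */
} read_job;

static void *read_section(void *arg)
{
    read_job *job = (read_job *)arg;
    FILE *fp = fopen(job->path, "rb");  /* each thread opens its own stream */

    job->got = 0;
    if (fp == NULL)
        return NULL;
    if (fseek(fp, job->offset, SEEK_SET) == 0)
        job->got = fread(job->dest, 1, job->length, fp);
    fclose(fp);
    return NULL;
}

/* Reads the first `total` bytes of `path` into `out` with NUM_THREADS threads.
   Returns the number of bytes read, or (size_t)-1 if a thread could not start. */
size_t parallel_read(const char *path, unsigned char *out, size_t total)
{
    pthread_t tid[NUM_THREADS];
    read_job job[NUM_THREADS];
    size_t chunk = total / NUM_THREADS;
    size_t sum = 0;
    int i, started = 0;

    for (i = 0; i < NUM_THREADS; i++) {
        job[i].path = path;
        job[i].offset = (long)(chunk * i);
        /* the last thread also takes any remainder */
        job[i].length = (i == NUM_THREADS - 1) ? total - chunk * i : chunk;
        job[i].dest = out + chunk * i;
        job[i].got = 0;
        if (pthread_create(&tid[i], NULL, read_section, &job[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        sum += job[i].got;
    }
    return started == NUM_THREADS ? sum : (size_t)-1;
}
```

For the 4000-byte file in the question, `parallel_read("test.bin", buf, 4000)` with a 4000-byte `buf` would give each thread a 1000-byte section (the file name is only a placeholder).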
A: 

You need a way to sync those threads. There are different ways to implement mutual exclusion (a mutex): http://en.wikipedia.org/wiki/Mutual_exclusion

ktulur
If I sync those threads, then they are not reading the file at the same time; it becomes sequential reading, right?
anru
Right, if that file isn't going to be written by another thread/socket. If in your case you just want to read from different parts, why not parse the whole file once into the 4 variables you need?
ktulur
A: 

He wants to read from a file in different threads. I guess that should be ok if the file is opened as read-only by each thread.

I hope you don't want to do this for performance though, since you will have to scan large parts of the file for newline characters in each thread.

Jonatan
Why would I need to search for newline characters?
anru
You have to know at which offset in the file line 1000, 2000, 3000, and so on begins.
Jonatan
He is accessing bytes, not lines, so there is no need to care about that.
Francis
+8  A: 

If you don't write to the file, there is no need to take care of sync / race conditions.

Just open the file with shared reading as different handles and everything will work (i.e., you must open the file in each thread's context instead of sharing the same file handle).

#include <stdio.h>
#include <windows.h>

DWORD WINAPI mythread(LPVOID param)
{
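    int i = (int)(INT_PTR)param;   /* section index: 0..3 */
    BYTE buf[1000];
    DWORD numread = 0;

    /* Each thread opens its own handle, so each has its own file pointer.
       CreateFileA is used because the path is a narrow string literal. */
    HANDLE h = CreateFileA("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
        NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    SetFilePointer(h, i * 1000, NULL, FILE_BEGIN);
    if (ReadFile(h, buf, sizeof(buf), &numread, NULL) && numread >= 3)
        printf("buf[%d]: %02X %02X %02X\n", i+1, buf[0], buf[1], buf[2]);
    CloseHandle(h);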

    return 0;
}

int main()
{
    int i;
    HANDLE h[4];

    for (i = 0; i < 4; i++)
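        h[i] = CreateThread(NULL, 0, mythread, (LPVOID)(INT_PTR)i, 0, NULL);

    // for (i = 0; i < 4; i++) WaitForSingleObject(h[i], INFINITE);
    WaitForMultipleObjects(4, h, TRUE, INFINITE);

    /* Release the thread handles once all threads have finished. */
    for (i = 0; i < 4; i++)
        CloseHandle(h[i]);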

    return 0;
}
Francis
The loop with WaitForSingleObject() should be replaced with a single WaitForMultipleObjects() call. Other than that it's +1.
mghie
A: 

You shouldn't need to do anything particularly clever if all they're doing is reading. Obviously you can read it as many times in parallel as you like, as long as you don't exclusively lock it. Writing is clearly another matter of course...

I do have to wonder why you'd want to though - it will likely perform badly since your HDD will waste a lot of time seeking back and forth rather than reading it all in one (relatively) uninterrupted sweep. For small files (like your 4000-byte example) where that might not be such a problem, it doesn't seem worth the trouble.

Peter
Depending on the type of drive, you could get better performance. A (good) solid state drive, for example, will give good multi-threaded performance.
1800 INFORMATION
+3  A: 

I don't see any real advantage to doing this.
You may have multiple threads reading from the device but your bottleneck will not be CPU but rather disk IO speed.

If you are not careful you may even slow the process down (but you will need to measure it to know for certain).

Martin York
If he has a good RAID or an SSD, this might not be so bad, but good point. +1
Ben Collins
A: 

It is possible, though I'm not sure it will be worth the effort. Have you considered reading the entire file into memory within a single thread and then allowing multiple threads to access that data?

jon hanson
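A rough sketch of that approach: one thread loads the file into memory, then several threads each work on their own slice of the buffer. The helper names (`load_file`, `sum_in_parallel`, `slice_job`) and the byte-summing work are invented for illustration, and pthreads is assumed as the threading library. Because the buffer is only read and each thread writes only its own `sum` field, no mutex is needed.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_WORKERS 4

/* Hypothetical description of one thread's slice of the buffer. */
typedef struct {
    const unsigned char *begin;  /* first byte of the slice */
    const unsigned char *end;    /* one past the last byte */
    unsigned long sum;           /* result, written only by the owning thread */
} slice_job;

static void *sum_slice(void *arg)
{
    slice_job *job = (slice_job *)arg;
    const unsigned char *p;

    job->sum = 0;
    for (p = job->begin; p < job->end; p++)
        job->sum += *p;
    return NULL;
}

/* Loads the whole file into a malloc'd buffer (the caller frees it).
   Returns NULL on failure; on success *size holds the file length. */
unsigned char *load_file(const char *path, size_t *size)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *buf = NULL;
    long len;

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0 && (len = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        buf = malloc(len > 0 ? (size_t)len : 1);
        if (buf != NULL && fread(buf, 1, (size_t)len, fp) != (size_t)len) {
            free(buf);
            buf = NULL;
        }
        if (buf != NULL)
            *size = (size_t)len;
    }
    fclose(fp);
    return buf;
}

/* Sums all bytes of buf using NUM_WORKERS threads. Returns 0 on success. */
int sum_in_parallel(const unsigned char *buf, size_t size, unsigned long *total)
{
    pthread_t tid[NUM_WORKERS];
    slice_job job[NUM_WORKERS];
    size_t chunk = size / NUM_WORKERS;
    int i, started = 0;

    for (i = 0; i < NUM_WORKERS; i++) {
        job[i].begin = buf + chunk * i;
        /* the last slice also covers any remainder */
        job[i].end = (i == NUM_WORKERS - 1) ? buf + size : job[i].begin + chunk;
        job[i].sum = 0;
        if (pthread_create(&tid[i], NULL, sum_slice, &job[i]) != 0)
            break;
        started++;
    }
    *total = 0;
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        *total += job[i].sum;
    }
    return started == NUM_WORKERS ? 0 : -1;
}
```

The disk is read once, sequentially, and only the in-memory work is split across threads, which sidesteps the seeking concern raised in other answers.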
A: 

Reading: No need to lock the file. Just open the file as read-only or shared read.

Writing: Use a mutex to ensure the file is only written to by one thread at a time.

Xetius
A: 

As others have noted already, there is no inherent problem in having multiple threads read from the same file, as long as they have their own file descriptors/handles. However, I'm a little curious about your motives. Why do you want to read a file in parallel? If you're only reading a file into memory, your bottleneck is likely the disk itself, in which case multiple threads won't help you at all (they'll just clutter your code).

And as always when optimizing, you should not attempt it until you (1) have an easy-to-understand, working solution, and (2) have measured your code to know where you should optimize.

JesperE
+3  A: 

There's not even a big problem writing to the same file, in all honesty.

By far the easiest way is to just memory-map the file. The OS will then give you a void* where the file is mapped into memory. Cast that to a char*, and make sure that each thread uses non-overlapping subarrays.

void foo(char* begin, char* end) { /* .... */ }
void* base_address = myOS_memory_map("example.binary");
myOS_start_thread(&foo, (char*)base_address, (char*)base_address + 1000);
myOS_start_thread(&foo, (char*)base_address+1000, (char*)base_address + 2000);
myOS_start_thread(&foo, (char*)base_address+2000, (char*)base_address + 3000);
MSalters
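To make the `myOS_memory_map` placeholder above a bit more concrete, here is a hedged sketch of a read-only mapping helper. The function names `map_file_readonly` and `unmap_file` are invented for this example. The Windows branch uses `CreateFileA`, `CreateFileMappingA` and `MapViewOfFile`; the other branch uses POSIX `mmap` so the same sketch can be tried elsewhere. It assumes a non-empty file smaller than 4 GB.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Maps the whole file read-only. Returns NULL on failure (including an
   empty file); on success *size holds the file length in bytes. */
const unsigned char *map_file_readonly(const char *path, size_t *size)
{
#ifdef _WIN32
    HANDLE file, mapping;
    DWORD low;
    void *view;

    file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return NULL;
    low = GetFileSize(file, NULL);  /* sketch: assumes the file is under 4 GB */
    if (low == INVALID_FILE_SIZE) {
        CloseHandle(file);
        return NULL;
    }
    mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    CloseHandle(file);              /* the mapping object keeps the file open */
    if (mapping == NULL)
        return NULL;
    view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(mapping);           /* an existing view keeps the mapping alive */
    if (view == NULL)
        return NULL;
    *size = (size_t)low;
    return (const unsigned char *)view;
#else
    struct stat st;
    void *view;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    view = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (view == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return (const unsigned char *)view;
#endif
}

/* Releases a mapping obtained from map_file_readonly. */
void unmap_file(const unsigned char *base, size_t size)
{
#ifdef _WIN32
    (void)size;
    UnmapViewOfFile(base);
#else
    munmap((void *)base, size);
#endif
}
```

With the returned pointer as `base_address`, each thread can be handed a non-overlapping range such as `base + 1000` to `base + 2000`, as in the pseudocode above. The mapping should be released only after all threads have finished with it.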
A: 

Windows supports overlapped I/O, which allows a single thread to asynchronously queue multiple I/O requests for better performance. This could conceivably be used by multiple threads simultaneously as long as the file you are accessing supports seeking (i.e. this is not a pipe).

Passing FILE_FLAG_OVERLAPPED to CreateFile() allows simultaneous reads and writes on the same file handle; otherwise, Windows serializes them. Specify the file offset using the Offset and OffsetHigh members of the OVERLAPPED structure.

For more information see Synchronization and Overlapped Input and Output.

bk1e