views:

377

answers:

3

hi,

referring to my last question (http://stackoverflow.com/questions/876605/multiple-child-process), i am now trying to make an external sorting implementation using multiple child process.

...
fp = fopen(pathname, "r"); // open inputfile in r mode
fgets(trash, 10, fp); // ignore first line

for (i=0; i<numberOfProcess; ++i) {
 #ifdef DBG
     fprintf(stderr, "\nDBG: Calling fork()\n"); 
 #endif

    if ((pids[i] = fork()) < 0) {
        perror("fork error");
        exit(EXIT_FAILURE);

    } else if (pids[i] == 0) { // Child Code

        if (numbersToSort % numberOfProcess == 0) { // 16 % 4 = 0
         partialDataSize = numbersToSort / numberOfProcess;      

         for (j=0; j<partialDataSize; j++) { 
          fscanf(fp, "%d", &arrayPartialData[j]);
          qsort(arrayPartialData, partialDataSize, sizeof(int), (void *)comp_num);

          //printf("%d\n", arrayPartialData[j]);
          // TODO: qsort data until partialDataSize
         }

        } 
        printf("pid: %d child process %d outputs: ", getpid(), pids[i]);
     printArray(arrayPartialData, partialDataSize);
     //break;
     exit(0);
    }  
}   

/* Wait for children to exit. */

while (numberOfProcess > 0) {
 pid = wait(&status);
 --numberOfProcess;
}

fclose(fp);

but of course this code outputs the same sequence of sorted integers from inputfile because of fscanf.. for example if the beginning of input file includes 5 1 4, then it outputs:

(1st child) 1 4 5
(2nd child) 1 4 5

(with two child process).. because fscanf starts to read integers from the beginning of input stream.

my problem now is how can i continue to read the numbers starting from the point where the previous child process left? for example, if input file includes 5 1 4 8 5 10, then it can output:

(1st child) 1 4 5

(2nd child) 5 8 10

thanks in advance;)

A: 

If you're using fscanf, the only thing you can do is have each process read and discard numbers until it gets to those that it should work on. In your case discard i*partialdatasize numbers.

So e.g. 5 7 3 1 4 8 5 10 2 you might have 5 7 3

1 4 8

5 10 2

which would sort to give

3 5 7

1 4 8

2 5 10.

Then you have to work out how to merge the sorted results.

pjc50
+1  A: 

I'd use the lower level open() and read() rather than the streams equivalent as otherwise you'll have to worry about synchronizing the stdio buffers with the underlying file descriptor. Note you'll still have issues reading complete numbers, so you'll probably need some sync between the processes.

As an alternative I would suggest a single process to read the file and write a subset of the lines to subprocesses that do the sorting (using pipe()), which they then write to another process doing the merge.

pixelbeat
A: 

If you can store your integers as binary. You can have the first thread read it's block

fread(&arrayPartialData[j], sizeof(int), partialDataSize, fp);

Than the 2nd thread can skip the block which has already been read (Because you know the size of each block). Then you can begin reading from there, without needing to discard any data.

fseek(partialDataSize * threadNumber);

I also recommend you use threads, as forking is very expensive. threads tutorial

Ben Reeves