I am submitting about 234 jobs (my example below uses only 50 for demonstration purposes) to my 20-node cluster using ParallelPython. I expected it to queue and execute all of them, but it seems to "lose" jobs and I don't understand where things are going wrong. When the script finishes, I don't see all 50 files (info_0, info_1, ..., info_49); instead only some of them appear, seemingly at random. Any suggestions?

import pp

def readChecklist():
    # read the checklist, one stripped entry per line, and close the file afterwards
    with open('/home/username/twisted/pp-1.6.0/checklist', 'r') as f:
        checklist = [line.strip() for line in f]
    return checklist

def processFile(num):
    bl = readChecklist()

    # pick a filename to write to
    outfile = "info_" + str(num)
    FILE = open(outfile, "a")

    for i in range(num):
        FILE.write(str(i)+"\n")
        FILE.flush()
    FILE.close()
    return num

ppservers=("*",)

job_server = pp.Server(ppservers=ppservers)

inputs = range(50)
jobs = [(input, job_server.submit(processFile,(input,), (readChecklist,), ("os","math","time","sys","subprocess",))) for input in inputs]
for input, job in jobs:
    print "Job: ", input, " is", job() 

job_server.print_stats()

Output:

Job:  0  is True
Job:  1  is True
Job:  2  is True
Job:  3  is True
Job:  4  is True
Job:  5  is True
Job:  6  is True
Job:  7  is True
Job:  8  is True
Job:  9  is True
Job:  10  is True
Job:  11  is True
Job:  12  is True
Job:  13  is True
Job:  14  is True
Job:  15  is True
Job:  16  is True
Job:  17  is True
Job:  18  is True
Job:  19  is True
Job:  20  is True
Job:  21  is True
Job:  22  is True
Job:  23  is True
Job:  24  is True
Job:  25  is True
Job:  26  is True
Job:  27  is True
Job:  28  is True
Job:  29  is True
Job:  30  is True
Job:  31  is True
Job:  32  is True
Job:  33  is True
Job:  34  is True
Job:  35  is True
Job:  36  is True
Job:  37  is True
Job:  38  is True
Job:  39  is True
Job:  40  is True
Job:  41  is True
Job:  42  is True
Job:  43  is True
Job:  44  is True
Job:  45  is True
Job:  46  is True
Job:  47  is True
Job:  48  is True
Job:  49  is True
Time elapsed:  0.592607975006 s
Job execution statistics:
 job count | % of all jobs | job time sum | time per job | job server
         3 |          6.00 |       0.3226 |     0.107546 | x.x.x.x:abcd
         3 |          6.00 |       0.2849 |     0.094970 | x.x.x.x:abcd
         2 |          4.00 |       0.2420 |     0.121004 | x.x.x.x:abcd
         3 |          6.00 |       0.3328 |     0.110927 | x.x.x.x:abcd
         2 |          4.00 |       0.2314 |     0.115687 | x.x.x.x:abcd
         2 |          4.00 |       0.2634 |     0.131683 | x.x.x.x:abcd
         3 |          6.00 |       0.2827 |     0.094223 | x.x.x.x:abcd
         2 |          4.00 |       0.2496 |     0.124812 | x.x.x.x:abcd
         1 |          2.00 |       0.1701 |     0.170140 | x.x.x.x:abcd
         3 |          6.00 |       0.3053 |     0.101758 | x.x.x.x:abcd
         1 |          2.00 |       0.1334 |     0.133415 | x.x.x.x:abcd
         3 |          6.00 |       0.2777 |     0.092561 | x.x.x.x:abcd
         1 |          2.00 |       0.1152 |     0.115169 | x.x.x.x:abcd
         1 |          2.00 |       0.1273 |     0.127294 | x.x.x.x:abcd
         3 |          6.00 |       0.3345 |     0.111503 | x.x.x.x:abcd
         1 |          2.00 |       0.1128 |     0.112782 | x.x.x.x:abcd
         2 |          4.00 |       0.2636 |     0.131819 | x.x.x.x:abcd
         8 |         16.00 |       0.4413 |     0.055163 | local
         1 |          2.00 |       0.1905 |     0.190510 | x.x.x.x:abcd
         3 |          6.00 |       0.2774 |     0.092473 | x.x.x.x:abcd
         2 |          4.00 |       0.2197 |     0.109835 | x.x.x.x:abcd
Time elapsed since server creation 0.592818021774

List of files created: (One per job)
0
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9
A: 

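OK, my mistake! In case anyone else runs into this: make sure your file paths are absolute, whether you are reading from a file or writing to one. A relative path is resolved against the working directory of whichever process runs the job, so on a remote worker the output ends up wherever that worker was started rather than where you expect. Five hours of debugging :( but I learned my lesson :)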
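For anyone who wants to see the fix spelled out, here is a minimal sketch of the idea, not the exact code I ended up with. The names `output_path`, `process_file` and `shared_dir` are made up for illustration, and the temporary directory only stands in for a directory that every node can see (for example a shared mount), which is an assumption about your cluster.

```python
import os
import tempfile


def output_path(num, out_dir):
    """Return an absolute path for the output file of job `num`."""
    # abspath resolves a relative out_dir against the current working
    # directory, so call this with a directory that is already absolute
    # (or resolve it once on the submitting machine).
    return os.path.join(os.path.abspath(out_dir), "info_" + str(num))


def process_file(num, out_dir):
    """Write the integers 0..num-1 to the job's file and return its path."""
    path = output_path(num, out_dir)
    with open(path, "a") as handle:
        for i in range(num):
            handle.write(str(i) + "\n")
    return path


if __name__ == "__main__":
    # Stand-in for a directory shared by all nodes (hypothetical).
    shared_dir = tempfile.mkdtemp()
    paths = [process_file(n, shared_dir) for n in range(5)]
    print(sorted(os.listdir(shared_dir)))
```

With ParallelPython, the same idea would mean passing the absolute directory as a job argument, e.g. `job_server.submit(process_file, (n, shared_dir), (output_path,), ("os",))`, so that every worker writes to the same location regardless of where it was started.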

Legend
I __hate__ it when that happens.
aaronasterling
@AaronMcSmooth: True! No matter how many complicated things I learn, it's always the basic stuff that comes back and bites me... :) But at least I'm glad that this nightmare is over (at least I hope so)
Legend