views: 112
answers: 3
I wrote a program that calls a function with the following prototype:

def Process(n):

    # The function uses data that is stored as binary files on the hard drive and
    # -- based on the value of 'n' -- scans it using functions from numpy & cython.
    # It creates new binary files and saves the results of the scan in them.
    #
    # I optimized the running time of the function as much as I could using numpy &
    # cython, and at present one run takes about 4 hrs to complete on a typical
    # WinXP desktop (a three-year-old machine, 2GB of memory, etc.).
    pass  # actual implementation omitted

My goal is to run this function exactly 10,000 times (for 10,000 different values of 'n') in the fastest and most economical way. Following these runs, I will have 10,000 different binary files with the results of all the individual scans. Note that every function run is independent (meaning there is no dependency whatsoever between the individual runs).

So the question is this: having only one PC at home, it is obvious that it would take me around 4.5 years (10,000 runs x 4 hrs per run = 40,000 hrs ~= 4.5 years) to complete all the runs at home. Yet I would like to have all the runs completed within a week or two.

I know the solution would involve accessing many computing resources at once. What is the best (fastest / most affordable, as my budget is limited) way to do so? Must I buy a strong server (how much would that cost?), or can I run this online? In that case, would my proprietary code get exposed by doing so?

In case it helps, every instance of 'Process()' only needs about 500MB of memory. Thanks.

+1  A: 

Does Process access the data in the binary files directly, or do you cache it in memory? Reducing the number of I/O operations should help.

Also, isn't it possible to break Process into separate functions running in parallel? What do the data dependencies inside the function look like?
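If the per-'n' runs really are independent, even a single multi-core machine buys you a factor of a few. A minimal sketch (hypothetical; it assumes Process is importable from a module and that a few 500MB instances fit in RAM together):

from multiprocessing import Pool

from mymodule import Process  # hypothetical module holding your Process(n)

if __name__ == '__main__':
    # At ~500MB per instance, about 3 workers is what a 2GB machine can hold.
    pool = Pool(processes=3)
    pool.map(Process, range(10000))   # each worker handles one 'n' at a time
    pool.close()
    pool.join()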

Finally, you could give a cloud computing service like Amazon EC2 a try (don't forget to read this for tools), but it won't be cheap (EC2 starts at $0.085 per hour) - an alternative would be going to a university with a computer cluster (they are pretty common nowadays, but it will be easier if you know someone there).
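For reference, a hypothetical sketch of launching a batch of EC2 workers with boto3 (the AMI ID, key name and instance type below are placeholders; each worker would pull a range of 'n' values, run Process(n) and upload its output files to S3):

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Launch 10 worker instances built from an image that already has numpy,
# cython and your code installed.
ec2.run_instances(
    ImageId='ami-xxxxxxxx',
    InstanceType='c5.xlarge',
    MinCount=10,
    MaxCount=10,
    KeyName='my-key',
)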

bnery
bnery: the binary files are not cached in memory, as they are too big and cannot fit in memory. I read the files using numpy's memmap, which is very, very fast.
Shalom Rav
+3  A: 

Check out PiCloud: http://www.picloud.com/

import cloud
jid = cloud.call(Process, n)    # runs Process(n) on PiCloud's servers
result = cloud.result(jid)      # blocks until the job finishes

Maybe it's an easy solution.
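For your 10,000 values of 'n', the map interface (going from memory of PiCloud's API, so treat this as a sketch) would queue everything in one go:

import cloud

# Submit all runs at once; PiCloud schedules them across its workers.
jids = cloud.map(Process, range(10000))

# Wait for completion. Note the output files would have to be written to
# cloud storage rather than the local disk of each worker.
cloud.result(jids)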

niscy
very interesting. thank you for bringing this to my attention.
Shalom Rav
+1 for the link to PiCloud...quite interesting :)
elo80ka
+1  A: 

Well, from your description, it sounds like things are IO bound... In which case parallelism (at least on one IO device) isn't going to help much.

Edit: I just realized that you were referring more to full cloud computing, rather than running multiple processes on one machine... My advice below still holds, though.... PyTables is quite nice for out-of-core calculations!

You mentioned that you're using numpy's memmap to access the data. Therefore, your execution time is likely to depend heavily on how your data is structured on the disk.

Memmapping can actually be quite slow in any situation where the physical hardware has to spend most of its time seeking (e.g. reading a slice along a plane of constant Z in a C-ordered 3D array). One way of mitigating this is to change the way your data is ordered to reduce the number of seeks required to access the parts you are most likely to need.
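As an illustration (a sketch with made-up file names and shapes), compare the access pattern for a constant-Z plane in the two layouts:

import numpy as np

nx, ny, nz = 512, 512, 512

# C-ordered (x, y, z): a constant-Z plane touches nx*ny widely separated
# locations on disk, so the drive spends most of its time seeking.
vol = np.memmap('data_xyz.bin', dtype=np.float32, mode='r', shape=(nx, ny, nz))
plane_slow = vol[:, :, 100]

# The same data stored with Z as the first axis: the plane is now one
# contiguous block and reads at close to raw disk throughput.
vol_z = np.memmap('data_zyx.bin', dtype=np.float32, mode='r', shape=(nz, ny, nx))
plane_fast = vol_z[100, :, :]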

Another option that may help is compressing the data. If your process is extremely IO bound, you can actually get significant speedups by compressing the data on disk (and sometimes even in memory) and decompressing it on-the-fly before doing your calculation.
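For example, here's a sketch (using the current PyTables API; the file and node names are made up) of writing data as a Blosc-compressed, chunked array that is decompressed on the fly as you read slices of it:

import numpy as np
import tables as tb

data = np.random.random((1000, 1000)).astype(np.float32)  # stand-in for real scan data

with tb.open_file('scan.h5', mode='w') as h5:
    # Blosc is fast enough that decompression usually costs less than the
    # disk reads it saves when the process is IO bound.
    filters = tb.Filters(complevel=5, complib='blosc')
    carray = h5.create_carray(h5.root, 'scan', tb.Float32Atom(),
                              shape=data.shape, filters=filters)
    carray[:] = data

with tb.open_file('scan.h5', mode='r') as h5:
    chunk = h5.root.scan[:100, :100]   # only the chunks covering this slice are decompressed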

The good news is that there's a very flexible, numpy-oriented library that's already been put together to help you with both of these. Have a look at pytables.

I would be very surprised if tables.Expr doesn't significantly (~1 order of magnitude) outperform your out-of-core calculation using a memmapped array. See here for a nice (though canned) example. From that example:

[Figure: PyTables vs. NumPy memmap benchmark]
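Independent of that benchmark, the basic tables.Expr pattern looks roughly like this (a sketch; the file and array names are hypothetical):

import tables as tb

h5 = tb.open_file('scan.h5', mode='r')
a = h5.root.a          # hypothetical on-disk arrays
b = h5.root.b

# The expression is evaluated blockwise via numexpr, so the operands are
# never loaded into memory whole; use expr.set_output(...) if you want the
# result written to an on-disk array as well.
expr = tb.Expr('3.0 * a**2 + b')
result = expr.eval()

h5.close()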

Joe Kington