ansaurus

Question

Answer 1

+2 A:

With the little information you've given: Parallel::ForkManager sounds like an appropriate tool. But you're likely to get better help if you give more detail about your problem.

Parallelizing is always a difficult problem. How much you can hope to gain depends a lot on the nature of the task. For example, are you looking for a specific line in the file? Or a specific fixed-size record? Or all the chunks that match a particular bit pattern? Do you process the file from beginning to end, or can you skip some parts, or do you do a lot of shuffling back and forth? etc.

Also is the 8GB file an absolute constraint, or might you be able to reorganize the data to make the information easier to find?

With the speeds you're giving, if you're just going through the file once, I/O is not the bottleneck, but it's close. It could be the bottleneck if other processes are accessing the disk at the same time. It may be worth fine-tuning your disk access patterns; this would be somewhat OS- and filesystem-dependent.

Gilles 2010-07-17 10:04:43

Answer 2

+5 A:

A basic line-by-line parsing pass through a 1GB file -- for example, running a regex or something -- takes just a couple of minutes on my 5-year-old Windows box. Even if the parsing work is more extensive, 4 hours sounds like an awfully long time for 8GB of data.

Are you sure that your code does not have a glaring inefficiency? Are you storing a lot of information during the parsing and bumping up against your RAM limits? CPAN has tools that will allow you to profile your code, notably Devel::NYTProf.

Before going through the hassle of parallelizing your code, make sure that you understand where the bottleneck is. If you explain what you are doing or, even better, provide code that illustrates the problem in a compact way, you might get better answers.

FM 2010-07-17 13:19:24

ansaurus

tags:

views:

answers:

Parallel computing in Perl

related questions