I'm writing a Perl script to run a pipeline of sorts. I start by reading a JSON file with a bunch of parameters in it. I then do some work, mainly building some data structures needed later and calling external programs that generate output files I keep references to.

I usually use a subroutine for each of these steps. Each such subroutine usually writes its data to a unique place that no other subroutine writes to (e.g. a specific key in a hash) and reads data that other subroutines may have generated.

These steps take a good couple of minutes if done sequentially, but most of them can run in parallel with some simple dependency logic that I know how to handle (using threads and a queue). So I wonder how to implement this in a way that allows sharing data between the threads. What framework would you suggest? Perhaps an object (of which I would have only one instance) that keeps all the shared data in $self? Perhaps a simple script (no objects) with some "global" shared variables? ...

I would obviously prefer a simple, neat solution.

+1  A: 

You can certainly do that in Perl. I suggest you look at perldoc threads and perldoc threads::shared, as those manual pages best describe the methods and pitfalls you'll encounter when using threads in Perl.
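For illustration, here's a minimal sketch of that API; the backtick commands and hash keys are placeholders, not anything from your actual pipeline:

use strict;
use warnings;
use threads;
use threads::shared;

my %results :shared;                    # one shared hash for all step outputs

my $t1 = threads->create(sub {
    my $out = `external_program_a`;     # placeholder command
    lock(%results);
    $results{step_a} = $out;            # each step writes only its own key
});
my $t2 = threads->create(sub {
    my $out = `external_program_b`;     # placeholder command
    lock(%results);
    $results{step_b} = $out;
});

$_->join for $t1, $t2;                  # wait for both steps to finish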

What I would really suggest, provided you can use it, is instead a queue management system such as Gearman, which has various interfaces including a Perl module. It allows you to create as many "workers" as you want (the subs actually doing the work) and one simple "client" that schedules the appropriate tasks and then collates the results, without resorting to tricks such as hash keys specific to each task.

This approach would also scale better, and you could have clients and workers (even managers) on different machines, should you so choose.

Other queue systems, such as TheSchwartz, are less suitable here, as they lack the feedback/result channel that Gearman provides. For all practical purposes, using Gearman this way works much like the threaded system you described, just without the hassles and headaches that any system based on threads may eventually suffer from: locking variables, using semaphores, joining threads.
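To give a rough idea, here is a sketch of the two sides; the function name run_step, the payload, and the gearmand address are made up for illustration:

# worker.pl -- registers a function and waits for jobs
use Gearman::Worker;

my $worker = Gearman::Worker->new;
$worker->job_servers('127.0.0.1:4730');          # assumed local gearmand
$worker->register_function(run_step => sub {
    my $job   = shift;
    my $input = $job->arg;                       # whatever the client sent
    # ... run the external program for this step using $input ...
    return "result for: $input";                 # handed back to the client
});
$worker->work while 1;

# client.pl -- schedules a task and collects its result
use Gearman::Client;

my $client = Gearman::Client->new;
$client->job_servers('127.0.0.1:4730');
my $result_ref = $client->do_task(run_step => 'step-parameters');
print "got: $$result_ref\n";

do_task blocks until the worker answers; to run several steps in parallel you would use new_task_set/add_task/wait and collate results in on_complete callbacks.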

mfontani
Thank you. What I'm missing is: how do you suggest sharing the information between the threads?
David B
David, check out http://search.cpan.org/~bradfitz/Gearman/lib/Gearman/Worker.pm
Octoberdan
David, depending on the approach: with threads, use threads::shared's shared variables. With Gearman, rather than sharing variables you may want to pass the critical data to the worker, as in "do operation X with this, this and this other piece of data". If you need different subs to handle the same piece of data, have them return the munged data.
mfontani
+2  A: 

Read threads::shared. By default, as you perhaps know, Perl variables are not shared between threads. But if you place the shared attribute on them, they are:

my %repository: shared;

Then if you want to synchronize access to it, the easiest way is:

{   lock( %repository );
    $repository{JSON_dump} = $json_dump;
}
# %repository will be unlocked at the end of scope.

However, you could use Thread::Queue, which is supposed to be fuss-free, and do this as well:

$repo_queue->enqueue( JSON_dump => $json_dump );

Then your consumer thread could just:

my ( $key, $value ) = $repo_queue->dequeue( 2 );
$repository{ $key } = $value;
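Put together, a minimal sketch of the whole pattern (the step names and backtick commands are placeholders):

use strict;
use warnings;
use threads;
use Thread::Queue;

my $repo_queue = Thread::Queue->new;

# each worker thread runs one pipeline step and queues its (key, value) pair
my @workers = map {
    my $step = $_;
    threads->create(sub {
        my $output = `run_$step`;                # placeholder external command
        $repo_queue->enqueue( $step => $output );
    });
} qw(align sort index);

# the main thread collates; each worker enqueues exactly one pair
my %repository;
for ( 1 .. @workers ) {
    my ( $key, $value ) = $repo_queue->dequeue( 2 );
    $repository{$key} = $value;
}
$_->join for @workers;

Note that this way only the main thread ever touches %repository, so it needs no shared attribute and no locking; the queue is the only shared structure.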
Axeman
+1 thanks Axeman. Is it necessary to lock the entire repository when only a part of it (e.g. `repository->{key}`) is changed?
David B
@David B, yes, unfortunately, it is. Refer to http://search.cpan.org/perldoc?threads::shared#lock_VARIABLE
Axeman