What is the most efficient way to look up values in a Berkeley DB (BDB) for several files in parallel? If I had a Perl script that did this for one file at a time, would forking, or running the process in the background with the ampersand in Linux, work?

How might Hadoop be used to solve this problem?

Would threading be another solution?

A: 

Hadoop is totally irrelevant to this case. Hadoop is a system for parallelizing large computational tasks on computer clusters, not for parallelizing short-lived lookups on a single node.

If I understand correctly, you want Perl to look up a value in several BDB files in parallel. This is best done by passing each BDB call a callback that is executed when the request finishes. The threading is then done at the C layer, which is much more efficient than doing it manually in Perl.

Building blocks:

BDB: http://search.cpan.org/~mlehmann/BDB-1.84/BDB.pm

Coro::BDB: http://search.cpan.org/~mlehmann/Coro-5.17/Coro/BDB.pm

AnyEvent: http://search.cpan.org/~mlehmann/AnyEvent-5.2/lib/AnyEvent.pm
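
For comparison, the process-per-file approach raised in the question (backgrounding with `&`) can be sketched in shell. This is only a rough sketch under stated assumptions: `lookup_one` is a hypothetical stand-in for the one-file Perl script, the file names and the key are made up, and the lookups are assumed to be read-only so the processes need no coordination.

```shell
#!/bin/sh
# Hypothetical stand-in for the single-file Perl lookup script.
# In practice this line would be something like: perl lookup.pl "$1" "$2"
lookup_one() {
    # $1 = database file, $2 = key to look up (both illustrative)
    printf '%s: looked up %s\n' "$1" "$2"
}

outdir=$(mktemp -d)
key="somekey"

# Start one background job per file; '&' returns immediately,
# so all lookups run concurrently as separate processes.
for f in a.db b.db c.db; do
    lookup_one "$f" "$key" > "$outdir/$f.out" &
done

# Block until every background job has finished.
wait

# Collect results in a fixed order, independent of finish order.
results=$(cat "$outdir/a.db.out" "$outdir/b.db.out" "$outdir/c.db.out")
printf '%s\n' "$results"
rm -rf "$outdir"
```

Each job writes to its own output file, so the results do not interleave. The cost is one full process (and one Perl interpreter start-up) per file, which is the overhead the callback-based modules above avoid.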

SquareCog