views:

266

answers:

5

I have data that looks like this:

    my @homopol = (
                   ["T","C","CC","G"],  # part1
                   ["T","TT","C","G","A"], #part2
                   ["C","CCC","G"], #part3 ...upto part K=~50
                  );


    my @prob = ([1.00,0.63,0.002,1.00,0.83],
                [0.72,0.03,1.00, 0.85,1.00],
                [1.00,0.97,0.02]);


   # Note also that the dimension of @homopol is always exactly the same with @prob.
   # Although number of elements can differ from 'part' to 'part'.

What I want to do is to

  1. Generate all combinations of elements in part1 through out partK
  2. Find the product of the corresponding elements in @prob.

Hence at the end we hope to get this output:

T-T-C  1 x 0.72 x 1 = 0.720
T-T-CCC     1 x 0.72 x 0.97 = 0.698
T-T-G  1 x 0.72 x 0.02 = 0.014
...
G-G-G  1 x 0.85 x 0.02 = 0.017
G-A-C  1 x 1 x 1 = 1.000
G-A-CCC     1 x 1 x 0.97 = 0.970
G-A-G  1 x 1 x 0.02 = 0.020

The problem is that the following code of mine does that by hardcoding the loops. Since the number of parts of @homopol is can be varied and large (e.g. ~K=50), we need a flexible and compact way to get the same result. Is there any? I was thinking to use Algorithm::Loops, but not sure how to achieve that.

use strict;
use Data::Dumper;
use Carp;


my @homopol = (["T","C","CC","G"],
               ["T","TT","C","G","A"],
               ["C","CCC","G"]);


my @prob = ([1.00,0.63,0.002,1.00,0.83],
            [0.72,0.03,1.00, 0.85,1.00],
            [1.00,0.97,0.02]);



my $i_of_part1 = -1;
foreach my $base_part1 ( @{ $homopol[0] } ) {
    $i_of_part1++;
    my $probpart1 = $prob[0]->[$i_of_part1];

    my $i_of_part2 =-1;
    foreach my $base_part2 ( @{ $homopol[1] } ) {
        $i_of_part2++;
        my $probpart2 = $prob[1]->[$i_of_part2];

        my $i_of_part3 = -1;
        foreach my $base_part3 ( @{ $homopol[2] } ) {
            $i_of_part3++;
            my $probpart3 = $prob[2]->[$i_of_part3];

            my $nstr = $base_part1."".$base_part2."".$base_part3;
            my $prob_prod = sprintf("%.3f",$probpart1 * $probpart2 *$probpart3);

            print "$base_part1-$base_part2-$base_part3 \t";
            print "$probpart1 x $probpart2 x $probpart3 = $prob_prod\n";

        }
    }
}
A: 

Why don't you use recursion? Pass the depth as a parameter and let the function call itself with depth+1 inside the loop.

nikie
Why *would* you use recursion when you don't need to? Perl doesn't have tail recursion, so the main reason it works out in other languages often kills you in Perl.
brian d foy
Recursing to a depth of 50 is perfectly acceptable in any language. Don't complicate your code needlessly to avoid recursion.
wrang-wrang
Complicate your code? It's more complicated to use recursion. :) And, you don't know it's only going to be 50 deep. Programs tend to stretch their limits as people use them for new situations. When it's so easy to avoid risk, why take the risk? :)
brian d foy
A: 

you could do it by creating an array of indicies the same length as the @homopol array (N say), to keep track of which combination you are looking at. In fact this array is just like a number in base N, with the elements being the digits. Iterate in the same way as you would write down consectutive numbers in base N, e.g (0 0 0 ... 0), (0 0 0 ... 1), ...,(0 0 0 ... N-1), (0 0 0 ... 1 0), ....

Chris Card
A: 

Approach 1: Calculation from indices

Compute the product of lengths in homopol (length1 * length2 * ... * lengthN). Then, iterate i from zero to the product. Now, the indices you want are i % length1, (i / length1)%length2, (i / length1 / length2) % length3, ...

Approach 2: Recursion

I got beaten to it, see nikie's answer. :-)

Igor ostrovsky
+1  A: 

A solution using Algorithm::Loops without changing the input data would look something like:

use Algorithm::Loops;

# Turns ([a, b, c], [d, e], ...) into ([0, 1, 2], [0, 1], ...)
my @lists_of_indices = map { [ 0 .. @$_ ] } @homopol;

NestedLoops( [ @lists_of_indices ], sub {
  my @indices = @_;
  my $prob_prod = 1; # Multiplicative identity
  my @base_string;
  my @prob_string;
  for my $n (0 .. $#indices) {
    push @base_string, $hompol[$n][ $indices[$n] ];
    push @prob_string, sprintf("%.3f", $prob[$n][ $indices[$n] ]);
    $prob_prod *= $prob[$n][ $indices[$n] ];
  }
  print join "-", @base_string; print "\t";
  print join "x", @prob_string; print " = ";
  printf "%.3f\n", $prob_prod;
});

But I think that you could actually make the code clearer by changing the structure to one more like

[ 
  { T => 1.00, C => 0.63, CC => 0.002, G => 0.83 },
  { T => 0.72, TT => 0.03, ... },
  ...
]

because without the parallel data structures you can simply iterate over the available base sequences, instead of iterating over indices and then looking up those indices in two different places.

hobbs
@hobbs: Your approach also create the unnecessary pairwise and single combination, (e.g. T-T,T-TT, T--) . Is there any way we can modify that?
neversaint
`NestedLoops` accepts an optional filter sub that lets you control what combinations your code will get invoked on. But by default it should do the same thing as the original code in the question, so I'm not sure what it should be doing.
hobbs
+4  A: 

I would recommend Set::CrossProduct, which will create an iterator to yield the cross product of all of your sets. Because it uses an iterator, it does not need to generate every combination in advance; rather, it yields each one on demand.

use strict;
use warnings;
use Set::CrossProduct;

my @homopol = (
    [qw(T C CC G)],
    [qw(T TT C G A)],
    [qw(C CCC G)], 
);

my @prob = (
    [1.00,0.63,0.002,1.00],
    [0.72,0.03,1.00, 0.85,1.00],
    [1.00,0.97,0.02],
);

# Prepare by storing the data in a list of lists of pairs.
my @combined;
for my $i (0 .. $#homopol){
    push @combined, [];
    push @{$combined[-1]}, [$homopol[$i][$_], $prob[$i][$_]]
        for 0 .. @{$homopol[$i]} - 1;
};

my $iterator = Set::CrossProduct->new([ @combined ]);
while( my $tuple = $iterator->get ){
    my @h = map { $_->[0] } @$tuple;
    my @p = map { $_->[1] } @$tuple;
    my $product = 1;
    $product *= $_ for @p;
    print join('-', @h), ' ', join(' x ', @p), ' = ', $product, "\n";
}
FM
Using combinations() means you create all the tuples. You can use `while( my $tuple = $iterator->next )` to avoid putting all those in memory.
brian d foy
@brian Thanks, I fixed it.
FM
neversaint
@foolishbrat Sorry, fixed again. Should have run the code before editing. The method we need is `get` rather than `next`.
FM
Oh, that's my fault. Sorry. next() peeks ahead but doesn't get the next tuple.
brian d foy