So my data sample is in the following format.
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 19856 19974
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 21455 21638
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 21727 21897
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 21980 22063
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 24670 24811
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 34741 34902
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 3649 3836
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 59253 59409
jgi|Xentr4|100173|gw1.779.90.1 scaffold_779 101746 101969
jgi|Xentr4|100173|gw1.779.90.1 scaffold_779 106436 107233
and what I am attempting to do is for each unique name in the first column, retrieve the min value for column 3, and the max value for column 4. So the final input will look the same, a tab-delimited file, except that it will have the 1st 2 columns for each unique name, then the 3rd and 4th columns be the min and max values mentioned above. I'm fairly novice at programming and attempted to do this using hashes but failed miserably. Am trying now with arrays/regular expressions as seen below.
open (IN, "POS2") || die "nope\n";
my $prev_qn = super;
my $prev_sn = ultra;
my $prev_start = non;
my $prev_end = nono;
while (<IN>) {
chomp;
push (@list, "$_");
}
close (IN);
foreach $v (@list) {
$info = $v;
($query_name, $scaf_num, $start, $end) = split(/\t/, $info);
unless ($info =~ m/^$prev_qn/) {
push @ready, $info;
$prev_qn = $query_name;
$prev_sn = $scaf_num;
$prev_start = $start;
$prev_end = $end;
}
else {
if ($start < $prev_start) {
splice(@ready,2,1,$start);
}
if ($end > $prev_end) {
splice(@ready,3,1,$end);
}
$prev_qn = $query_name;
$prev_sn = $scaf_num;
$prev_start = $start;
$prev_end = $end;
}
foreach $z (@ready) {
print "$z\n";
}
}
the output this returns is below.
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
21638
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
21638
21897
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
21638
22063
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
21638
24811
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
21638
34902
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
3649
34902
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
3649
59409
jgi|Xentr4|100164|gw1.1441.2.1 scaffold_1441 18150 18354
19974
3649
101969
So it seems clear that the file is doing the comparison fine, but it is not replacing the elements in the array as expected, simply appending them beneath and replacing those. Additionally it never prints past the first unique name. Any suggestions?