Hi, I'm supposed to compare lines in a file:

KB0005  1019 T IFVATVPVI 0.691 PKC YES
KB0005  1036 T YFLQTSQQL 0.785 PKC YES
KB0005  1037 S FLQTSQQLK 0.585 DNAPK YES
KB0005  1045 S KQLESEGRS 0.669 PKC YES
KB0005  1045 S KQLESEGRS 0.880 unsp YES
KB204320    1019 T IFVATVPVI 0.699 PKC YES
KB204320    1036 T YFLQTSQQL 0.789 PKC YES
KB204320    1037 S FLQTSQQLK 0.589 DNAPK YES
KB204320    1045 S KQLESEGRS 0.880 unsp YES

and print the lines that differ or don't repeat. I managed to do this by first putting the lines into two arrays (the lines differ in their names, KB0005 and KB204320) and then writing a Perl script. Code:

my (%count, @diff);

# count how often each line occurs across both arrays
foreach my $item (@a1, @a2) { $count{$item}++; }

# a line counted twice is present in both records; keep everything else
foreach my $item (keys %count) {
    if ($count{$item} == 2) {
        next;
    } else {
        push @diff, $item;
    }
}

my @sorted = sort @diff;
foreach my $el (@sorted) {
    print "$el\n";
}

OUTPUT:

1019 T IFVATVPVI 0.691 PKC
1019 T IFVATVPVI 0.699 PKC
1036 T YFLQTSQQL 0.785 PKC
1036 T YFLQTSQQL 0.789 PKC
1037 S FLQTSQQLK 0.585 DNAPK
1037 S FLQTSQQLK 0.589 DNAPK
1045 S KQLESEGRS 0.669 PKC
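The question doesn't show how @a1 and @a2 are filled. A minimal sketch, assuming every line starts with its record name and that the arrays should hold the rest of each line; `group_by_name` and the file name `file.txt` are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by their first column (the record
# name, e.g. KB0005) and store each line with that column removed, so
# lines from different records can be compared directly.
sub group_by_name {
    my @lines = @_;
    my %by_name;
    for my $row (@lines) {
        my ($name, @rest) = split ' ', $row;   # splits on runs of whitespace, drops the newline
        next unless defined $name;             # skip blank lines
        push @{ $by_name{$name} }, join(' ', @rest);
    }
    return \%by_name;
}

# Usage (file name assumed):
# open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
# my $groups = group_by_name(<$fh>);
# close $fh;
# my @a1 = @{ $groups->{KB0005} };
# my @a2 = @{ $groups->{KB204320} };
```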

This works well; I just want to print which record (KB0005 or the other one) each line comes from.
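One way to get the origin is to remember, next to each count, which array the line was seen in. A sketch under that assumption; `unique_with_source` is a hypothetical helper, and the labels are whatever names you pass in:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that do not occur exactly twice
# across both arrays, each paired with the label of the array it came from.
sub unique_with_source {
    my ($label1, $lines1, $label2, $lines2) = @_;
    my (%count, %source);
    for my $pair ([$label1, $lines1], [$label2, $lines2]) {
        my ($label, $lines) = @$pair;
        for my $item (@$lines) {
            $count{$item}++;
            $source{$item} = $label;   # remember where the line was seen
        }
    }
    # same rule as the script above: anything not seen exactly twice is a difference
    my @diff = grep { $count{$_} != 2 } keys %count;
    return map { [ $_, $source{$_} ] } sort @diff;
}

# Usage:
# for my $r (unique_with_source('KB0005', \@a1, 'KB204320', \@a2)) {
#     print "$r->[0]  ($r->[1])\n";
# }
```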

Is anybody willing to help? Thanks.

A:

Since you only want the unique lines, you could change the start to the following:

my %hash = ();
my $line = 0;
my @diff;

# count every item and remember its position in (@a1, @a2)
foreach my $item (@a1, @a2)
{
   $line++;
   $hash{$item}{count}++;
   $hash{$item}{line} = $line;
}

# keep only the items that were seen exactly once
foreach my $item (keys %hash) {
    if ($hash{$item}{count} > 1) {
        next;
    } else {
        push @diff, $item;
    }
}

my @sorted = sort @diff;
my $lineNo = 0;

foreach my $el (@sorted) {
    $lineNo = $hash{$el}{line};
    print "$el, $lineNo\n";
}

Or something very like that: the idea is to build a more detailed hash structure.

This code is not tested, but the theory should be OK.
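Because the loop walks through @a1 before @a2, the stored number itself tells you the source: numbers 1 to scalar(@a1) come from the first array and the rest from the second. This holds only for lines seen once (the others keep the position of their last occurrence), which is exactly what @diff contains. A small hypothetical helper, assuming @a1 holds KB0005 and @a2 holds KB204320:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the running index stored in $hash{$el}{line}
# back into a record name. Indices 1..$n1 come from the first array,
# everything after that from the second.
sub source_of {
    my ($line_no, $n1, $name1, $name2) = @_;
    return $line_no <= $n1 ? $name1 : $name2;
}

# Usage inside the final loop above:
# print "$el, ", source_of($hash{$el}{line}, scalar(@a1), 'KB0005', 'KB204320'), "\n";
```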

I don't understand the first part about reading into two arrays if the contents come from a single file. You can avoid that by building the hash as you read the file:

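my %hash;
my $line = 0;

open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
while (my $row = <$fh>)
{
    # split on any run of whitespace; the first column is the KB name
    my ($name, @rest) = split ' ', $row;
    next unless defined $name;            # skip blank lines
    my $item = join ' ', @rest;           # the line without its name
    $line++;
    $hash{$item}{count}++;
    $hash{$item}{line}   = $line;
    $hash{$item}{source} = $name;         # which record the line came from
}
close $fh;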

But I could be misunderstanding this part.
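If the hash is built while reading, the record name can be stored directly instead of a line number. A sketch, assuming the first column is the name and the rest of the line is what should be compared; `unique_lines_from_fh` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: build the hash in a single pass over a filehandle,
# keyed on each line minus its first (name) column, then return the lines
# seen only once together with the name of the record they came from.
sub unique_lines_from_fh {
    my ($fh) = @_;
    my %hash;
    while (my $row = <$fh>) {
        my ($name, @rest) = split ' ', $row;   # also drops the trailing newline
        next unless defined $name;             # skip blank lines
        my $item = join ' ', @rest;
        $hash{$item}{count}++;
        $hash{$item}{source} = $name;
    }
    my @unique = grep { $hash{$_}{count} == 1 } keys %hash;
    return map { [ $_, $hash{$_}{source} ] } sort @unique;
}

# Usage (file name as in the answer above):
# open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
# print "$_->[0]  ($_->[1])\n" for unique_lines_from_fh($fh);
# close $fh;
```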

Hope this helps

Xetius
A: 

Now I'm confused. My task is to extract the lines from this file that appear only once (in either KB0005 or KB204320), or the lines whose value in column 5 differs. So in the output I want to have, for example:

KB0005 has a different value at position 1019 for PKC compared to KB204320 [0.691-0.699]
KB0005 has a different value at position 1037 for DNAPK compared to KB204320 [0.585-0.589]
...
or: KB has an additional record at position 1045 for PKC
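A sketch of that report, assuming column 2 is the position, column 5 the score and column 6 the kinase, and that each record has at most one score per position/kinase pair (true for the sample above); `compare_records` is a hypothetical helper. Pairs whose scores match in both records, such as 1045 unsp, are not reported:

```perl
use strict;
use warnings;

# Hypothetical helper: compare two records and describe every
# position/kinase pair whose score (column 5) differs between them or
# that exists in only one of them.
sub compare_records {
    my ($name1, $name2, @lines) = @_;
    my %score;    # $score{$pos}{$kinase}{$name} = score
    for my $row (@lines) {
        my ($name, $pos, undef, undef, $value, $kinase) = split ' ', $row;
        next unless defined $kinase;                      # skip short or blank lines
        next unless $name eq $name1 || $name eq $name2;   # ignore other records
        $score{$pos}{$kinase}{$name} = $value;
    }
    my @report;
    for my $pos (sort { $a <=> $b } keys %score) {
        for my $kinase (sort keys %{ $score{$pos} }) {
            my $s1 = $score{$pos}{$kinase}{$name1};
            my $s2 = $score{$pos}{$kinase}{$name2};
            if (defined $s1 && defined $s2) {
                # present in both: report only if the scores differ
                push @report, "$name1 has a different value at position $pos"
                    . " for $kinase compared to $name2 [$s1-$s2]" if $s1 != $s2;
            }
            else {
                # present in only one record
                my $who = defined $s1 ? $name1 : $name2;
                push @report, "$who has an additional record at position $pos for $kinase";
            }
        }
    }
    return @report;
}

# Usage:
# print "$_\n" for compare_records('KB0005', 'KB204320', <$fh>);
```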

A: 

Thanks FM, it is working great :) Can you tell me how I can combine this script with another file? I mean, I have a list of pairs just like KB0005 and KB204320, and I want to perform this action for every one of them. Can you help me with this one too? :) I'd be grateful.
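A sketch of looping over the pairs, assuming all records sit in one data file and the pairs are available as two names per entry (for example from a hypothetical `pairs.txt` with lines like `KB0005 KB204320`); `diff_all_pairs` is hypothetical and reuses the count-based rule from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the question's "lines that don't repeat"
# rule to every pair of record names. $pairs is a reference to a list
# of [name1, name2] pairs; @lines are the data file's lines.
sub diff_all_pairs {
    my ($pairs, @lines) = @_;

    # group the data lines by record name, with the name column removed
    my %by_name;
    for my $row (@lines) {
        my ($name, @rest) = split ' ', $row;
        next unless defined $name;    # skip blank lines
        push @{ $by_name{$name} }, join(' ', @rest);
    }

    my %result;    # "name1 name2" => [ [line, source name], ... ]
    for my $pair (@$pairs) {
        my ($n1, $n2) = @$pair;
        my (%count, %source);
        for my $name ($n1, $n2) {
            for my $item (@{ $by_name{$name} || [] }) {
                $count{$item}++;
                $source{$item} = $name;
            }
        }
        # same rule as the question: a line not seen exactly twice is a difference
        my @diff = grep { $count{$_} != 2 } keys %count;
        $result{"$n1 $n2"} = [ map { [ $_, $source{$_} ] } sort @diff ];
    }
    return \%result;
}

# Usage (both file names are hypothetical):
# open my $pf, '<', 'pairs.txt' or die "Cannot open pairs.txt: $!";
# my @pairs = map { [ split ' ' ] } grep { /\S/ } <$pf>;
# open my $df, '<', 'file.txt' or die "Cannot open file.txt: $!";
# my $result = diff_all_pairs(\@pairs, <$df>);
# for my $key (sort keys %$result) {
#     print "$key\n";
#     print "  $_->[0]  ($_->[1])\n" for @{ $result->{$key} };
# }
```

Grouping the data once and then looping over the pairs means the data file is read only a single time, however many pairs there are.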

A: 

You could use String::Diff after you've filtered your items.

Geo