I have a file that contains lines that look like this:

>AF001546_1 [88 - 462] 1 MGQQ
>AF001543_1 [88 - 261] ACGT

Note that each line can contain 6 or 5 fields. What I want to capture is the first field, the two numbers inside the brackets (numbers only), and the last field (the ACGT or MGQQ string).

So the expected output is this:

>AF001546_1 88 462 MGQQ
>AF001543_1 88 261 ACGT

The Perl one-liner I tried is this, but it failed:

perl -lne 'print "$1 $2 $3 $4" if /(\w+)_\d+\D+(\d+)\D+(\d+)\](\D+)/' 

What is the right way to do it?

+1  A: 
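while (<>) {
    chomp;
    s/\[|\]//g;                  # strip the square brackets
    if (/^>/) {
        my @s = split /\s+/;     # fields: id, start, "-", end, [number,] sequence
        print "$s[0] $s[1] $s[3] $s[-1]\n";   # $s[-1] is always the last field
    }
}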

$ perl -F"\s+" -lane '$F[1]=~s/\[//; $F[3]=~s/\]//; print "$F[0] $F[1] $F[3] $F[-1]"' file
>AF001546_1 88 462 MGQQ
>AF001543_1 88 261 ACGT
ghostdog74
+1  A: 

Try this:

perl -lne 'print "$1 $2 $3 $4" if /(\w+)_\d+\D+(\d+)\D+(\d+)](\D+)/m'

You need to use the /m modifier.

coder
No. The /m modifier only changes ^ and $, which aren't even in your regex. Furthermore, the -n switch means it's processing one line at a time anyway.
p00ya
Yep, I agree. I want to insist on /m.
coder
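
To illustrate the point made in the comments, here is a minimal sketch of what /m actually changes. The sample string and variable names are made up for the demonstration; the only claim is the documented behaviour that /m lets ^ and $ match at embedded newlines.

```perl
use strict;
use warnings;

# Two lines held in one string, so that /m has something to act on.
# (With -n, each input line arrives separately, so this never happens.)
my $text = "first line\nsecond line";

# Without /m, ^ anchors only at the start of the whole string.
my @without = $text =~ /^(\w+)/g;

# With /m, ^ also anchors right after each embedded newline.
my @with = $text =~ /^(\w+)/mg;

print scalar(@without), " match without /m: @without\n";
print scalar(@with), " matches with /m: @with\n";
```

Since the regex in the question contains neither ^ nor $, adding /m to it cannot change what it matches.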
+1  A: 

Depending on how flexible the whitespace is, this is fairly readable:

print "$1 $2 $3 $4" if /([^_]+_\d+) \[(\d+) - (\d+)\] (?:\d+ )?(.*)/
p00ya
+1  A: 
perl -lne 'print "$1 $2 $3 $4" if /(>\w+)\D+(\d+)\D+(\d+)\D+\d*\s+(\w+)/'
Amarghosh
+2  A: 

You can also use the following code:

use strict;
use warnings;

my $str=">AF001546_1 [88 - 462] 1 MGQQ";

if ($str =~ /(>\w+)\s\[(\d+) - (\d+)\]\s(?:\d+\s)?(.*)/)
{
     print "$1 $2 $3 $4\n";
}
muruga
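
Pulling the ideas from the answers together, the following is one possible sketch rather than a definitive solution. It assumes every record starts with ">" and that the optional fifth field, when present, is purely numeric. The helper name extract_fields is hypothetical, and the /x flag is used only to make the pattern readable.

```perl
use strict;
use warnings;

# extract_fields is a hypothetical helper name, not taken from any answer.
# In list context it returns (id, start, end, sequence),
# or an empty list if the line does not match.
sub extract_fields {
    my ($line) = @_;
    return $line =~ /
        ^(>\S+)                          # first field, including the leading ">"
        \s+\[\s*(\d+)\s*-\s*(\d+)\s*\]   # the two numbers inside the brackets
        (?:\s+\d+)?                      # optional numeric field on 6-field lines
        \s+(\S+)\s*$                     # last field: the sequence string
    /x;
}

for my $line ('>AF001546_1 [88 - 462] 1 MGQQ', '>AF001543_1 [88 - 261] ACGT') {
    my @fields = extract_fields($line);
    print "@fields\n" if @fields;
}
```

On the two sample lines from the question this prints ">AF001546_1 88 462 MGQQ" and ">AF001543_1 88 261 ACGT". Anchoring with ^ and $ makes a malformed line fail outright instead of matching partially.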