I have program output that looks like this (tab-delimited):

    $ ./mycode somefile 
    0000000000000000000000000000000000      238671
    0000000000000000000000000000000001      0
    0000000000000000000000000000000002      0
    0000000000000000000000000000000003      0
    0000000000000000000000000000000010      0
    0000000000000000000000000000000011      1548.81
    0000000000000000000000000000000012      0
    0000000000000000000000000000000013      937.306

On the FIRST column only, I want to replace 0 with A, 1 with C, 2 with G, and 3 with T. Is there a way to transliterate the output as it is piped directly from "mycode"? The result should look like this:

    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA        238671
    ...
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACT        937.306
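To see why the mapping has to be limited to the first column, here is a quick check of what plain `tr` does to a whole line. It is a minimal sketch that uses a shortened, hypothetical sample line (sequence, TAB, value) rather than the full 34-character sequences above.

```shell
#!/usr/bin/env bash
# Shortened sample line (hypothetical): sequence, TAB, value.
line=$(printf '0000000011\t1548.81')

# Naive approach: tr rewrites every 0-3 on the line, including the value.
naive=$(printf '%s\n' "$line" | tr 0123 ACGT)
printf '%s\n' "$naive"
```

The value `1548.81` comes out as `C548.8C`, which is why the first column has to be separated from the rest of the line before transliterating.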
A:

It should be possible to do this with sed. Put the following in a file (you can also do it on the command line with -e; just don't forget the semicolons, or use a separate -e for each command). (EDIT: Keep in mind that since your data is tab-delimited, the first s/// must match a tab, not a space. The bracket expression [[:space:]] below matches either, so you don't have to worry about your editor turning a literal tab into spaces.)

    #!/usr/bin/sed -f

    h
    s/[[:space:]].*$//
    y/0123/ACGT/
    G
    s/\n[0-3]*//

and run it with

    ./mycode somefile | sed -f sedfile

or make it executable with chmod 755 sedfile and run

    ./mycode somefile | ./sedfile

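The steps performed are:

  1. h: copy the pattern space (the current line) to the hold space, replacing whatever was held from the previous line
  2. s///: delete everything from the field separator to the end of the line, leaving only the first column
  3. y///: transliterate 0123 to ACGT
  4. G: append a newline followed by the hold space contents (the original line)
  5. s///: remove that newline and the digits following it (up to the field separator), so the transliterated column is followed directly by the separator and the second column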
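The five steps above can be sketched as a shell function that passes each step to sed as a separate `-e` expression. This is a minimal sketch: the function name `to_acgt` and the shortened sample lines are hypothetical, and it assumes a sed that supports POSIX bracket expressions such as `[[:space:]]` (GNU and BSD sed both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: transliterate only the first column of
# tab-delimited input read from stdin.
#   h             copy the current line to the hold space
#   s/.../        delete from the first whitespace (the TAB) to end of line
#   y/0123/ACGT/  transliterate what is left (the first column)
#   G             append a newline plus the held original line
#   s/\n[0-3]*//  drop that newline and the original digits,
#                 leaving the TAB and the second column
to_acgt() {
  sed -e 'h' \
      -e 's/[[:space:]].*$//' \
      -e 'y/0123/ACGT/' \
      -e 'G' \
      -e 's/\n[0-3]*//'
}

out=$(printf '0000000011\t1548.81\n0000000013\t937.306\n' | to_acgt)
printf '%s\n' "$out"
```

Note that the `3` inside `937.306` survives: `[0-3]*` stops at the TAB, so only the original first column is removed in step 5.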

This worked for me on your sample data, at least.

EDIT:
Ah, you wanted a one-liner...

With GNU sed:

    sed -e 'h;s/[[:space:]].*$//;y/0123/ACGT/;G;s/\n[0-3]*//'

or, for old-school seds that don't accept semicolons, use a separate -e for each command:

    sed -e h -e 's/[[:space:]].*$//' -e 'y/0123/ACGT/' -e G -e 's/\n[0-3]*//'
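If you would rather match the tab itself than any whitespace, one option (a sketch, not part of the original answer) is to let the shell produce a real tab with `printf`, so the script never contains a typed tab that an editor could convert to spaces. The variable name `tab` and the sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# printf creates a real TAB; command substitution strips only newlines.
tab=$(printf '\t')

# Same five commands as the old-school form; only the first s/// differs.
# Double quotes let ${tab} expand, and \$ keeps the end-of-line anchor.
out=$(printf '0000000002\t0\n0000000011\t1548.81\n' |
  sed -e h -e "s/${tab}.*\$//" -e 'y/0123/ACGT/' -e G -e 's/\n[0-3]*//')
printf '%s\n' "$out"
```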
roe
A:

Using Perl:

    C:\> ./mycode file | perl -lpe "($x,$y)=split; $x=~tr/0123/ACGT/; $_=qq{$x\t$y}"
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA      238671
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC      0
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG      0
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT      0
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACA      0
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACC      1548.81
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACG      0
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACT      937.306

In Bash, use single quotes instead, so that the shell doesn't expand $x, $y, and $_ itself:

    $ ./mycode file | perl -lpe '($x,$y)=split; $x=~tr/0123/ACGT/; $_="$x\t$y"'
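The same split, transliterate, rejoin idea carries over to awk if Perl isn't available. This is a sketch of an alternative, not part of the original answer: awk has no `tr`, so it uses four `gsub` calls on the first field, with `FS` and `OFS` set to a tab so the rebuilt line stays tab-delimited. The sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# awk counterpart of the Perl split/tr/rejoin one-liner.
out=$(printf '0000000003\t0\n0000000013\t937.306\n' |
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # Only $1 is modified; the replacements are letters, so the
         # order of the gsub calls does not matter.
         gsub(/0/, "A", $1); gsub(/1/, "C", $1)
         gsub(/2/, "G", $1); gsub(/3/, "T", $1)
         print
       }')
printf '%s\n' "$out"
```

Assigning to `$1` makes awk rebuild the record with `OFS`, which is what puts the tab back between the columns.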

As @ysth notes in the comments, perl also provides the command-line options -a and -F:

    -a                autosplit mode with -n or -p (splits $_ into @F)
    ...
    -F/pattern/       split() pattern for -a switch (//'s are optional)

Using those:

    perl -lawnF'\t' -e '$,="\t"; $F[0] =~ y/0123/ACGT/; print @F'
Sinan Ünür
or with -F: perl -lawnF'/\t/' -e'$,="\t"; $F[0]=~y/0123/ACGT/; print @F'
ysth
@ysth I always forget about `-F`.
Sinan Ünür