views: 113
answers: 5

I have a file containing a list like this:

100
101
102
103

I want to replace every 0 with A, 1 with C, 2 with G, and 3 with T, so the expected output is:

CAA
CAC
CAG
CAT
+11  A: 

You've practically worked out the answer yourself. Simply:

tr 0123 ACGT <input_file >output_file

or:

echo 2033010 | tr 0123 ACGT
David Gelhar
Good call, +1, but I'm not sure the OP wants to be doing advanced DNA analysis in bash :-)
paxdiablo
@paxdiablo the question title calls for a solution using "unix tr" among others.
hobbs
@hobbs, I don't have a problem with the answer (I upvoted it) - I just don't usually think of UNIX text manipulation tools when genetic analysis comes to mind. I wonder how long the genome sequencing task would have taken if we'd done it with cmd.exe :-)
paxdiablo
There's no analysis here. It's just changing the encoding from digits to letters.
brian d foy
+7  A: 

Here:

perl -p -e 'tr/0123/ACGT/'

Verification:

$ perl -p -e 'tr/0123/ACGT/' <~/input
CAA
CAC
CAG
CAT
Kinopiko
+5  A: 
$ echo 3210 | tr 0123 ACGT
TGCA

With no options, tr takes two sets of characters and maps each character in the first set to the character at the same position in the second set. So, as written above, 0 maps to A, 1 maps to C, 2 maps to G, and 3 maps to T.
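
To make the positional mapping a bit more concrete, here is a small sketch. The function name `digits_to_bases` is made up for illustration and is not part of the answer above. It also shows that characters outside the first set (the `x` below, and the newlines) are passed through unchanged.

```shell
# digits_to_bases is a hypothetical helper name used only for this sketch.
# It reads standard input and maps 0->A, 1->C, 2->G, 3->T by position.
digits_to_bases() {
    tr 0123 ACGT
}

# The second input line contains an 'x', which is not in the first set,
# so tr copies it to the output as-is.
printf '100\n1x3\n' | digits_to_bases
# prints:
# CAA
# CxT
```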

Mark Rushakoff
+7  A: 

Just for completeness:

sed 'y/0123/ACGT/' file
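
One possible advantage of sed over plain tr is that the `y` command can take an address. As a sketch, assuming the file also held lines that must not be touched (the question's sample contains only digit lines, so this is purely hypothetical), the transliteration could be restricted to lines made up solely of the digits 0 to 3. The function name `translate_digit_lines` is invented for this example.

```shell
# translate_digit_lines is a hypothetical name used only for this sketch.
translate_digit_lines() {
    # The address /^[0-3][0-3]*$/ selects lines consisting only of 0-3;
    # y/// then transliterates just those lines and leaves the rest alone.
    sed '/^[0-3][0-3]*$/y/0123/ACGT/' "$@"
}

# The header line contains other characters, so its "2" is not translated.
printf '>sample header 2\n102\n' | translate_digit_lines
# prints:
# >sample header 2
# CAG
```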
Dennis Williamson
+2  A: 
$ awk -v FS="" 'BEGIN{_["0"]="A";_["1"]="C";_["2"]="G";_["3"]="T"}{for(i=1;i<=NF;i++){printf "%s", _[$i]}print ""}' file
CAA
CAC
CAG
CAT
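
Splitting a line into characters with an empty field separator works in gawk and mawk, but POSIX leaves that behaviour unspecified. A more portable sketch walks the line with `substr()` instead. The function name `decode_digits` is made up for this example, and copying unrecognised characters through unchanged is an assumption, not something the question asks for.

```shell
# decode_digits is a hypothetical name; the lookup string "ACGT" mirrors
# the mapping in the question (0->A, 1->C, 2->G, 3->T).
decode_digits() {
    awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            d = substr($0, i, 1)
            # Digits 0-3 index into "ACGT"; anything else is copied unchanged.
            c = (d ~ /^[0-3]$/) ? substr("ACGT", d + 1, 1) : d
            out = out c
        }
        print out
    }' "$@"
}

printf '100\n103\n' | decode_digits
# prints:
# CAA
# CAT
```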
ghostdog74
Using `split()` makes the initialization neater: `awk -F '' 'BEGIN {split("ACGT",a,"")} {for(i=1;i<=NF;i++){printf "%s", a[$i+1]}print ""}'`
Dennis Williamson