ansaurus

Question

How can I change 0123 into ACTG with tr or Perl?

Answer 1

+11 A:

You've practically worked out the answer yourself. Simply:

tr 0123 ACGT <input_file >output_file

or:

echo 2033010 | tr 0123 ACGT

David Gelhar 2010-04-21 02:24:08

Good call, +1, but I'm not sure the OP wants to be doing advanced DNA analysis in bash :-)

paxdiablo 2010-04-21 02:26:36

@paxdiablo the question title calls for a solution using "unix tr" among others.

hobbs 2010-04-21 06:58:31

@hobbs, I don't have a problem with the answer (I upvoted it) - I just don't usually think of UNIX text manipulation tools when genetic analysis comes to mind. I wonder how long the genome sequencing task would have taken if we'd done it with cmd.exe :-)

paxdiablo 2010-04-21 08:57:50

There's no analysis here. It's just changing the encoding from digits to letters.

brian d foy 2010-04-21 12:48:29

Answer 2

+7 A:

Here:

perl -p -e 'tr/0123/ACGT/'

Verification:

$ perl -p -e 'tr/0123/ACGT/' <~/input
CAA
CAC
CAG
CAT
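As a possible extension, the same one-liner can rewrite a file in place using Perl's `-i` switch. This is only a sketch: the function name `translate_in_place` is hypothetical, and the `.bak` suffix is an arbitrary choice that makes Perl keep a copy of the original file.

```shell
# Hypothetical helper: translate digits 0-3 to A, C, G, T inside the file
# given as $1, leaving the untouched original next to it as "$1.bak".
translate_in_place() {
  perl -i.bak -pe 'tr/0123/ACGT/' "$1"
}
```

Calling `translate_in_place input.txt` would then overwrite `input.txt` with the translated text and leave the original in `input.txt.bak`.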

Kinopiko 2010-04-21 02:24:32

Answer 3

+5 A:

$ echo 3210 | tr 0123 ACGT
TGCA

When not using any options, tr takes two sets of characters, and makes a 1:1 mapping from the first set to the second set. So, as written above, 0 maps to A, 1 maps to C, 2 maps to G, and 3 maps to T.
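A minimal sketch of that mapping, assuming a POSIX shell. The function names `digits_to_bases` and `bases_to_digits` are hypothetical and chosen only for illustration. Characters outside the first set (such as the trailing newline) pass through unchanged, and swapping the two sets reverses the encoding.

```shell
# Hypothetical helper: map the digits 0, 1, 2, 3 to A, C, G, T.
digits_to_bases() {
  tr 0123 ACGT
}

# Hypothetical helper: the reverse mapping, obtained by swapping the two sets.
bases_to_digits() {
  tr ACGT 0123
}

printf '3210\n' | digits_to_bases                       # prints TGCA
printf '2033010\n' | digits_to_bases | bases_to_digits  # prints 2033010
```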

Mark Rushakoff 2010-04-21 02:24:40

Answer 4

+7 A:

Just for completeness:

sed 'y/0123/ACGT/' file

Dennis Williamson 2010-04-21 02:42:01

Answer 5

+2 A:

$ awk -v FS="" 'BEGIN{_["1"]="C";_["2"]="G";_["3"]="T";_["0"]="A"}{for(i=1;i<=NF;i++){printf "%s", _[$i]}print ""}' file
CAA
CAC
CAG
CAT

ghostdog74 2010-04-21 02:45:54

Using `split()` makes the initialization neater: `awk -F '' 'BEGIN {split("ACGT",a,"")} {for(i=1;i<=NF;i++){printf "%s", a[$i+1]}print ""}'`

Dennis Williamson 2010-04-21 03:57:13
