ansaurus

Question

Answer 1

+1 A:

awk 'BEGIN{P=1}{if(P==1||P==2){gsub(/^[@]/,">");print}; if(P==4)P=0; P++}' data

>SRR018006.2016 GA2:6:1:20:650 length=36
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
>SRR018006.19405469 GA2:6:100:1793:611 length=36
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

below

awk '{gsub(/^[@]/,">"); print}' data

where data is your data file. I've received:

>SRR018006.2016 GA2:6:1:20:650 length=36
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
+SRR018006.2016 GA2:6:1:20:650 length=36
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!+!
>SRR018006.19405469 GA2:6:100:1793:611 length=36
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+SRR018006.19405469 GA2:6:100:1793:611 length=36
7);;).;);;/;*.2>/@@7;@77<..;)58)5/>/

bua 2009-10-09 07:38:33

this is correct !

bua 2009-10-09 08:25:00

Answer 2

+1 A:

Here's the solution to the "skip every other line" part of the problem that I just learned from SO:

while read line
do
    # print two lines
    echo "$line"
    read line_to_print
    echo "$line_to_print"

    # and skip two lines
    read line_to_skip
    read line_to_skip
done

If all that needs to be done is change one @ to >, then I reckon

while read line
do
    echo "$line" | sed 's/@/>/'
    read line
    echo "$line"

    read line_to_skip
    read line_to_skip
done

will do the job.

mobrule 2009-10-09 07:39:06

there should be an input redirection for the input file. to replace characters in bash, ${line/@/>} should suffice. no need sed.

ghostdog74 2009-10-09 12:00:48

Answer 3

+1 A:

Something like:

awk 'BEGIN{a=0}{if(a==1){print;a=0}}/^@/{print;a=1}' myFastqFile | sed 's/^@/>/'

should work.

mouviciel 2009-10-09 07:39:40

since you are using awk already, there is no need to waste an extra process calling sed. do the substitution inside awk.

ghostdog74 2009-10-09 11:58:10

You are right. I have upvoted your answer.

mouviciel 2009-10-09 12:44:28

Answer 4

+1 A:

I think, with gnu grep this could be done with this:

grep -A 1 "^@" t.txt | grep -v "^--" | sed -e "s/^@/\>/"

dz 2009-10-09 07:40:23

if the file happens to be very big, there's no point chaining greps and sed together like that.

ghostdog74 2009-10-09 12:02:20

Answer 5

+2 A:

See fastq2fasta.pl in http://www.ringtail.tsl.ac.uk/david-studholme/scripts/

Pierre 2009-10-09 07:45:23

Answer 6

+3 A:

just awk , no need other tools

# awk '/^@SR/{gsub(/^@/,">",$1);print;getline;print}' file
>SRR018006.2016 GA2:6:1:20:650 length=36
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
>SRR018006.19405469 GA2:6:100:1793:611 length=36
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

ghostdog74 2009-10-09 11:57:09

Answer 7

+3 A:

sed ain't dead. If we're golfing:

sed '/^@/!d;s//>/;N'

Or, emulating http://www.ringtail.tsl.ac.uk/david-studholme/scripts/fastq2fasta.pl posted by Pierre, which only prints the first word (the id) from the first line and does (some) error handling:

#!/usr/bin/sed -f
# Read a total of four lines
$b error
N;$b error
N;$b error
N
# Parse the lines
/^@\(\([^ ]*\).*\)\(\n[ACGTN]*\)\n+\1\n.*$/{
  # Output id and sequence for FASTA format.
  s//>\2\3/
  b
}
:error
i\
Error parsing input:
q

There seem to be plenty of existing tools for converting these formats; you should probably use these instead of anything posted here (including the above).

Mark Edgar 2009-10-09 14:45:31

Answer 8

+1 A:

As detailed in Cock, et al (2009) NAR, many of these solutions are incorrect since "the ‘@’ marker character (ASCII 64) may occur anywhere in the quality string. This means that any parser must not treat a line starting with ‘@’ as indicating the start of the next record, without additionally checking the length of the quality string thus far matches the length of the sequence."

See http://ukpmc.ac.uk/articlerender.cgi?accid=PMC2847217 for details.

Casey B. 2010-05-09 20:06:48

ansaurus

tags:

views:

answers:

Converting FASTQ to FASTA with SED/AWK

related questions