I have data that looks like this:

> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
foofoofoobar
foofoofoo
> sq4
foofoofoobar
foofoo

I want to join the data lines of each record into one line, using each "> sqN" header line as the cut-off, i.e. yielding:

foofoofoobarfoofoofoo
quxquxquxbarquxquxquxbarquxx
foofoofoobarfoofoofoo
foofoofoobarfoofoo

I tried this sed command, but it fails:

sed '/^S/d;N;s/\n/\t/'

What's the correct way to do it?

+1  A: 

You're testing for a capital "S" at the beginning of the line. You should be testing for the greater-than character:

sed '/^>/d;N;s/\n/\t/'

or

sed '/^> sq/d;N;s/\n/\t/'

Edit: I missed the fact that there are variable numbers of lines between the headers. This is what I have so far:

sed  -n '/^>/{x; p; d}; /^>/!H; x; s/\n/\t/; h; $p'

Unfortunately, this leaves in the header:

> sq1    foofoofoobar    foofoofoo
> sq2    quxquxquxbar    quxquxquxbar    quxx
> sq3    foofoofoobar    foofoofoo
> sq4    foofoofoobar    foofoo

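If you run this from a Bash prompt and the exclamation point triggers history expansion, do set +H first. Inside single quotes, as used here, Bash does not perform history expansion, so this mainly matters if you switch the script to double quotes.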

Edit2: My revised version that gets rid of the headers:

sed  -n '1{x;d};/^>/{x; p; d}; H; x; s/\n/\t/; s/^>.*\t//; h; $p'
Dennis Williamson
@DW: your snippet doesn't seem to work. I got "foofoofoobartfoofoofoo \nquxquxquxbartquxquxquxbar \nquxxt> sq3\nfoofoofoobartfoofoofoo \nfoofoofoobartfoofoo"
neversaint
+1  A: 

A bash solution for the original question (i.e. without "headers"):

#!/bin/bash
text=()
i=0

exec <"$1"

while IFS= read -r line
do
    text[$i]=$line
    let "i += 1"
done


j=0
len=0
while [ $j -lt ${#text[@]} ]
do
    string=${text[$j]}
    # A line shorter than the previous one is taken as the end of a record.
    if [ $len -le ${#string} ] ; then
     printf '%s' "$string"
    else
     printf '%s\n' "$string"
    fi
    len=${#string}
    let "j += 1"
done
printf '\n'
Michael Foukarakis
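The script above assumes the headers are already gone and guesses record boundaries from line lengths. As a rough sketch for input that still contains the "> sqN" lines, a plain shell loop can collect data lines in a buffer and flush it at each header. The function name join_records is made up for illustration, and the code assumes every header starts with ">" in the first column.

```shell
# join_records: hypothetical helper name, not from the answer above.
# Reads stdin, treats every line starting with '>' as a record separator,
# and prints the data lines of each record concatenated on one line.
join_records() {
    buf=""
    # '|| [ -n "$line" ]' keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '>'*)
                # Header: flush the record collected so far, if any.
                if [ -n "$buf" ]; then
                    printf '%s\n' "$buf"
                fi
                buf=""
                ;;
            *)
                # Data line: append without a separator.
                buf=$buf$line
                ;;
        esac
    done
    # Flush the last record, which has no header after it.
    if [ -n "$buf" ]; then
        printf '%s\n' "$buf"
    fi
}

# Example run on the first two records from the question.
printf '%s\n' '> sq1' foofoofoobar foofoofoo \
              '> sq2' quxquxquxbar quxquxquxbar quxx | join_records
```

On that sample the loop prints foofoofoobarfoofoofoo and quxquxquxbarquxquxquxbarquxx on separate lines. Unlike the length comparison, this does not depend on the last line of a record being shorter than the one before it.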
+1  A: 

I can't find a simple way to do it in sed. Anyway, with gawk/mawk you just have to set the RS variable to a regular expression that matches the header and strip the newline characters:

awk -v RS='> sq[0-9]' 'NR>1{gsub(/\n/,"");print}' file
marco
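Two caveats on the awk command above: treating a multi-character RS as a regular expression is a gawk/mawk extension (POSIX leaves that case unspecified), and [0-9] matches a single digit, so a header such as "> sq10" would leave a stray "0" at the start of the record. A possible sketch that sticks to POSIX awk features and keys only on the leading ">" is shown below. The wrapper name join_awk is hypothetical.

```shell
# join_awk: hypothetical wrapper name, used only for this illustration.
# Reads the named files, or stdin when no file is given.
join_awk() {
    awk '
        /^>/ {                        # header line: flush the current record
            if (buf != "") print buf
            buf = ""
            next
        }
        { buf = buf $0 }              # data line: append without a separator
        END { if (buf != "") print buf }
    ' "$@"
}

# Example run on one record with a two-digit header number.
printf '%s\n' '> sq10' foofoofoobar foofoo | join_awk
```

The example prints foofoofoobarfoofoo. Because the header test is just /^>/, the header text after ">" can be anything.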
+2  A: 
#!/bin/sed -f

# If this is a header line, empty it...
s/^>.*//
# ... and then jump to the 'end' label.
t end
# Otherwise, append this data line to the hold space.
H
# If this is not the last line, continue to the next line.
$!d
# We get here on the last line of the file or after a header line.
: end
# Call up the data lines we last saw (putting the empty line in the hold).
x
# If we haven't seen any data lines recently, continue to the next line.
/^$/d
# Otherwise, strip the newlines and print.
s/\n//g

# The one-line version:
# sed -e 's/^>.*//;te' -e 'H;$!d;:e' -e 'x;/^$/d;s/\n//g'
Mark Edgar
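To try the one-line version above without saving the script, it can be wrapped in a small shell function and fed part of the question's data through a here-document. The name join_sed is hypothetical. This is only a usage sketch of the script as posted, not a different method.

```shell
# join_sed: hypothetical wrapper around the one-line version shown above.
# Reads the named files, or stdin when no file is given.
join_sed() {
    sed -e 's/^>.*//;te' -e 'H;$!d;:e' -e 'x;/^$/d;s/\n//g' "$@"
}

# Feed the first two records from the question on stdin.
join_sed <<'EOF'
> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
EOF
```

This prints foofoofoobarfoofoofoo followed by quxquxquxbarquxquxquxbarquxx. The first header produces no output because the hold space is still empty when it is swapped in, and the last record is printed because the final line falls through to the label instead of being deleted.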