tags:

views:

293

answers:

4

Here's the situation, I have a text file that is pipe-delimited and one of fields contains pipe characters. I already have a sed script that will change it to be tab-delimited, but the problem is it's terribly slow. It will replace the first occurrence of a pipe 8 times, then replace the last occurrence of a pipe 4 times. I'm hoping there's a quicker way to do what I need.

Any thoughts would be appreciated. Here's my current sed script:

sed 's/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/\(.*\)|/\t/;s/\(.*\)|/\t/;s/\(.*\)|/\t/;s/\(.*\)|/\t/' $1 > $1.tab

Thanks,

-Dan

A: 
 sed 's/\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|/\1\t\2\t\3\t\4\t\5\t\6\t\7\t\8\t/;s/|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)$/\t\1\t\2\t\3\t\4/'

HTH

Zsolt Botykai
Perfect! Thanks!
Dan Watling
what a mess....
ghostdog74
+1  A: 

This is somewhat scalable, but it's still an eye-glazer. You can change the "8" and the "4" to select which ranges of pipes you want to replace or change the pipes or tabs to some other characters.

As a one-liner:

sed 's/|/\n/8; h; s/.*\n//; x; s/\n.*/\t/; s/|/\t/g; G; s/\n//; s/\(\(|[^|]*\)\{4\}\)$/\n\1/; h; s/.*\n//; s/|/\t/g; x; s/\n.*//; G; s/\n//'

Here it is broken out. I've over-commented it so it's easy to follow.

sed '
s/|/\n/8     # split
h            # dup
s/.*\n//
# this is now the field which will retain the pipes 
# plus the fields at the end of the record
x            # swap
s/\n.*/\t/   # replace
s/|/\t/g
# this is now all the tab-delimited fields at the beginning of the record
G            # append
s/\n//
# this is now the full record with the first part completed
# the rest of the steps are similar to the steps above
s/\(\(|[^|]*\)\{4\}\)$/\n\1/    # split
h            # dup
s/.*\n//
s/|/\t/g     #replace
# this is now the last four fields that have been tab delimited
x            # swap
s/\n.*//
# this is the first eight fields plus the field with the retained pipes
G            # append
s/\n//
# now print the full record with everything done
'
Dennis Williamson
A: 

Hi Dan,

Dennis is right you should use the quantifier to specify how many occurences of the pattern you want the action to be performed on.

Have a look at the below link under the "Basic substitutions" as it's more readable on the website than it's here: http://www.readylines.com/sed-one-liners-examples

Hope that helps.

some dude
+1  A: 

I worked with Dan when he needed this, but realized (like ghostdog74) that AWK was a better tool, but here's my possibly inefficient answer.

awk -F"|" 'BEGIN{OFS="\t"}{for (i=10; i < NF-3; i++) $9=$9 "|" $i; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$(NF-3),$(NF-2),$(NF-1),$(NF)}' $file > $file.tab

What do you folks think?

j_smitley