I have a text file ("INPUT.txt") of the format:

A<LF>
B<LF>
C<LF>
D<LF>
X<LF>
Y<LF>
Z<LF>
<EOF>

which I need to reformat to:

A:B:C:D:X:Y:Z<LF>
<EOF>

I know you can do this with 'sed'. There are a billion Google hits for doing this with 'sed'. But I'm trying to emphasize readability, simplicity, and using the correct tool for the correct job. 'sed' is a line editor that consumes and hides newlines. Probably not the right tool for this job!

I think the correct tool for this job would be 'tr'. I can replace all the newlines with colons with the command:

cat INPUT.txt | tr '\n' ':'

There's 99% of my work done. I have a problem, now, though. By replacing all the newlines with colons, I not only get an extraneous colon at the end of the sequence, but I also lose the newline at the end of the input. It looks like this:

A:B:C:D:X:Y:Z:<EOF>

Now, I need to remove the colon from the end of the input. However, if I attempt to pass this processed input through 'sed' to remove the final colon (which would now, I think, be a proper use of 'sed'), I find myself with a second problem. The input is no longer terminated by a newline at all! 'sed' fails outright, for all commands, because it never finds the end of the first line of input!

It seems like appending a newline to the end of some input is a very, very common task, and considering I myself was just sorely tempted to write a program to do it in C (which would take about eight lines of code), I can't imagine there's not already a very simple way to do this with the tools already available on a standard Linux system.

+1  A: 

Sometimes, all you have to do is start to ask the question, and the answer just comes to you. I figured it out while writing my question, but decided to post it anyway. The solution is 'echo', which also appends a newline to whatever it prints. So the solution is:

echo `cat INPUT.txt | tr '\n' ':'` | sed 's/:$//'
Maarx
A command in the pipeline might still be considered more elegant than putting the entire thing in backticks. I welcome other suggestions.
Maarx
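
One way to keep everything in the pipeline, as the comment above asks, is to group 'tr' and a bare 'echo' in braces, so the newline is appended before 'sed' ever sees the data. This is only a minimal sketch, assuming a POSIX shell; the function name join_lines is made up for illustration:

```shell
# Hypothetical helper: join the lines of file "$1" with colons.
join_lines() {
    # tr turns every newline into a colon; the bare echo then appends
    # one newline, so sed receives a properly terminated line and only
    # has to strip the trailing colon.
    { tr '\n' ':' < "$1"; echo; } | sed 's/:$//'
}
```

It would be called as: join_lines INPUT.txt
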
+4  A: 

This should do the job (cat and echo are unnecessary):

tr '\n' ':' < INPUT.txt | sed 's/:$/\n/'

Using only sed:

sed -n ':a; $ ! {N;ba}; s/\n/:/g;p' INPUT.txt
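
The sed one-liner is dense, so here is the same script spread over several lines with comments. It's a sketch rather than a tested-everywhere recipe: it behaves as described with GNU sed, and the wrapper name join_with_sed is hypothetical.

```shell
# Hypothetical helper: same sed script as the one-liner, one command per line.
join_with_sed() {
    sed -n '
        # Label "a" marks the top of the accumulate loop.
        :a
        # On every line except the last, append the next input line
        # to the pattern space and branch back to the label.
        $!{
            N
            ba
        }
        # The pattern space now holds the whole file; replace the
        # embedded newlines with colons and print once.
        s/\n/:/g
        p
    ' "$1"
}
```

Because sed prints the pattern space followed by a newline, the result ends in a single newline without any extra step.
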

Bash without any externals:

string=($(<INPUT.txt))
string=${string[@]/%/:}
string=${string//: /:}
string=${string%*:}

Using a loop in sh:

colon=''
while read -r line
do
    string=$string$colon$line
    colon=':'
done < INPUT.txt
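
The loop above builds the result in $string but never prints it. A sketch of the same loop wrapped in a function that does, assuming a POSIX shell (the function name join_lines_loop is hypothetical). The extra test after 'read' is an addition, so that a last line with no terminating newline is still picked up:

```shell
# Hypothetical helper: join the lines of file "$1" with colons using only the shell.
join_lines_loop() {
    string=''
    colon=''
    # read returns non-zero on an unterminated last line but still fills
    # $line, so the -n test keeps that final line from being dropped.
    while IFS= read -r line || [ -n "$line" ]; do
        string=$string$colon$line
        colon=':'      # empty before the first line, ':' from then on
    done < "$1"
    printf '%s\n' "$string"
}
```
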

Using AWK:

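awk '{a=a colon $0; colon=":"} END {print a}' INPUT.txt

Or (passing the data as arguments to a fixed format string, so a '%' in the input isn't interpreted by printf):

awk '{printf "%s%s", colon, $0; colon=":"} END {printf "\n"}' INPUT.txt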

Edit:

Here's another way in pure Bash:

string=($(<INPUT.txt))
saveIFS=$IFS
IFS=':'
newstring="${string[*]}"
IFS=$saveIFS
Dennis Williamson
I was at first baffled as to why you'd post as a solution the exact thing I said didn't work, so I tried it on another machine. I realized then that the Sun server that I needed the solution on was not using GNU 'sed'. The version of 'sed' on the server fails when the input has no terminating newline, hence, as outlined, why I used 'echo'. (The server is a mission-critical device at work that has never failed, and thus has never been restarted, let alone updated, in years. Welcome to my life.) The shell loop solution is awesome, though.
Maarx
/bin/sed on Sun...ick. How about /usr/xpg4/bin/sed?
William Pursell
+1  A: 

Here's yet another solution (assumes a character set where ':' is octal 72, e.g. ASCII):

perl -l72 -pe '$\="\n" if eof' INPUT.txt
William Pursell
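
For comparison with the answers above, the POSIX 'paste' utility can also join lines directly. Nobody in this thread suggested it, so treat it as a side note and a sketch; the wrapper name join_lines_paste is hypothetical:

```shell
# Hypothetical helper: join the lines of file "$1" with colons via paste.
join_lines_paste() {
    # -s pastes all lines of the file into one line;
    # -d ':' uses a colon instead of the default tab between them.
    paste -s -d ':' "$1"
}
```

The output ends in a newline, so no separate step is needed to strip a trailing colon or re-append the line terminator.
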