Let's see how to do this: limit the text width to a certain value, say 10. Any line longer than 10 characters should be broken into multiple lines.

Example: A text file:

01234567
01234567890123456789abcd
0123

should be changed to:

01234567
0123456789
0123456789
abcd
0123

How can we do this with sed or awk, as concisely as possible?

+2  A: 
$ sed -e 's/\(..........\)/\1\n/g' foo.txt

or, if that doesn't work (e.g., you don't have a sufficiently new GNU sed), insert a literal newline and make sure it's escaped with a backslash:

$ sed -e 's/\(..........\)/\1\
/g' foo.txt

You can pretty much transliterate that into awk, too:

$ awk '{ gsub(/........../, "&\n" ) ; print}' foo.txt
Jonathan Dursi
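
As a quick check, the literal-newline sed form can be run against the sample from the question. This is only a sketch: wrap10 is a hypothetical helper name introduced here, not something from the answer. It also shows one edge case: a line whose length is an exact multiple of 10 picks up a trailing empty line.

```shell
# wrap10 is a hypothetical helper name; it reads stdin and inserts a
# newline after every run of 10 characters.
wrap10() {
  sed -e 's/\(..........\)/\1\
/g'
}

# The sample input from the question.
printf '%s\n' 01234567 01234567890123456789abcd 0123 | wrap10

# A line of exactly 10 characters comes out as 2 lines (the second is empty).
printf '%s\n' 0123456789 | wrap10 | wc -l
```

The first command prints the five lines the question asks for. The second reports 2, because the substitution appends a newline even when nothing follows the tenth character.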
+2  A: 
Jonathan Leffler
Nice -- learn something new every day! Didn't know about the \{m,n\} business.
Jonathan Dursi
@Jonathan - yup; the 10,10 was a later change from 1,10 which didn't work as well, and I forgot to eliminate the repeat.
Jonathan Leffler
It rocks, thank you.
lukmac
A: 

In awk with a variable width:

awk -v WIDTH=5 '{ gsub(".{"WIDTH"}", "&\n"); printf "%s", $0 }; !/\n$/ { print "" }'

The final statement prevents the printing of extra newlines when the line is an exact multiple of the maximum line width.

schot
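
Some awk implementations do not support the {n} interval syntax in regular expressions, so here is a sketch of the same idea using a substr loop instead. This is a swapped-in technique, not the answer's gsub approach, and wrap_awk is a hypothetical helper name. Because each chunk is printed on its own, exact multiples of the width need no special handling.

```shell
# wrap_awk WIDTH: hypothetical helper; reads stdin and prints each line
# in chunks of at most WIDTH characters.
wrap_awk() {
  awk -v WIDTH="$1" '{
    if (length($0) == 0) { print ""; next }   # keep empty lines as they are
    for (i = 1; i <= length($0); i += WIDTH)
      print substr($0, i, WIDTH)
  }'
}

# A 10-character line (exact multiple of 5) and a 14-character line.
printf '%s\n' 0123456789 01234567890123 | wrap_awk 5
```

This prints 01234, 56789, 01234, 56789 and 0123 on separate lines, with no empty line after the exact multiple.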
+2  A: 

Use the proper tool for the job...

fold -w 10
Dennis Williamson
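
For comparison, a minimal run of fold on the question's sample, reading from stdin:

```shell
# The sample input from the question, wrapped at 10 columns.
printf '%s\n' 01234567 01234567890123456789abcd 0123 | fold -w 10

# A line of exactly 10 characters stays a single line.
printf '%s\n' 0123456789 | fold -w 10 | wc -l
```

The first command produces the five lines from the question. The second reports 1, since fold only breaks a line when another character would exceed the width.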
But where's the *challenge* in that... `fold` is part of POSIX and should be pretty widely available.
schot
still great to know this tool
lukmac
I've tried this on text with Spanish characters (e.g. á¿¡) and the characters were not output correctly. The sed command handled them fine. Give it a go yourselves: I just tried a file containing the line "á¿¡012345" (without quotes) and fold -w 5 > blah. Maybe a fold issue?
rturrado
@rturrado: Unfortunately, `fold` is not Unicode-aware. Bugs have been filed and patches have been created, but it's still broken.
Dennis Williamson
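
A rough illustration of why a byte-counting tool can mangle that line, assuming the file is UTF-8 encoded (the thread does not say which encoding was used). The sample is written with octal escapes so the snippet does not depend on the script's own encoding.

```shell
# The line "á¿¡012345" as UTF-8 bytes (assumed encoding): each of the
# three accented/punctuation characters takes 2 bytes.
sample='\303\241\302\277\302\241012345'

# wc -c counts bytes: 3 two-byte characters plus 6 ASCII digits.
bytes=$(printf "$sample" | wc -c)
echo "bytes: $((bytes))"

# The first 5 bytes hold á, ¿, and only the first byte of ¡.
printf "$sample" | head -c 5 | od -An -tx1
```

The line is 9 characters but 12 bytes, so a cut after 5 bytes lands in the middle of the third character, and the two halves end up on different output lines.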