ansaurus

Question

Sed: Replace all whitespace with a line break/paragraph mark to make a word list.

Answer 1

A:

You're not alone; sed is Greek to most people.

Dave 2009-12-05 18:37:53

Answer 2

A:

You can use

sed -e 's/\s\+/\n/g' old > new

The escape sequence \s matches any whitespace character (space, tab, newline, and so on), so \s\+ matches a run of one or more whitespace characters, which the command replaces with a single newline. The /g on the end means the substitution is performed for every match on a line rather than just the first.

The command above uses the file old as input and writes the modified output to new.
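As a quick illustration of the command above, here is a minimal sketch that reads from standard input instead of a file. It assumes GNU sed, since \s, \+ and \n in the replacement are GNU extensions; the function name words_gnu and the sample text are made up for the example.

```shell
# Hypothetical helper: one word per line, GNU sed only (\s, \+, \n are GNU extensions).
words_gnu() {
    sed -e 's/\s\+/\n/g'
}

# Sample input with a space and a tab between words.
printf 'one two\tthree\n' | words_gnu
```

Note that a line starting with whitespace yields an empty first line, because the leading run is also replaced by a newline.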

Greg Bacon 2009-12-05 18:40:59

On some Unix systems \s is not supported. There we can use s/ / instead of s/\s/.

Vijay Sarathi 2009-12-06 06:03:45

Answer 3

+1 A:

This should do the work:

sed -e 's/[ \t]+/\n/g'

[ \t] means a space OR a tab. If you want any kind of whitespace, you could also use \s.

[ \t]+ means as many spaces OR tabs as you want (but at least one)

s/x/y/ means replace the pattern x by y (here \n is a new line)

The g at the end means the substitution is repeated for every match on each line.

Tristram Gräbener 2009-12-05 18:42:19

At least under Linux that needs to be "sed -r -e ..."

Richard Pennington 2009-12-05 18:44:20

... but the explanation is correct.

Richard Pennington 2009-12-05 18:45:09
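To make the point about -r concrete, here is a small sketch, assuming GNU sed. Without -r the pattern is a basic regular expression, where + is an ordinary character, so [ \t]+ only matches a blank followed by a literal plus sign. The function names are hypothetical.

```shell
# Basic regex: "+" is literal, so a plain run of blanks is not matched.
bre_plus() {
    sed -e 's/[ \t]+/\n/g'
}

# Extended regex (-r): "+" means "one or more", as the answer intended.
ere_plus() {
    sed -r -e 's/[ \t]+/\n/g'
}

printf 'a  b\n' | bre_plus   # line passes through unchanged
printf 'a  b\n' | ere_plus   # blanks become a single newline
```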

Answer 4

+1 A:

The portable way to do this is:

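sed -e 's/[[:blank:]][[:blank:]]*/\
/g'

That's an actual newline between the backslash and the slash-g. Many sed implementations don't know about \n, so you need a literal newline. The backslash before the newline prevents sed from getting upset about the newline. (In sed scripts, commands are normally terminated by newlines.) The class [[:blank:]] matches a space or a tab; it is used here because POSIX bracket expressions treat a backslash literally, so [ \t] only means "space or tab" in seds that extend the syntax, such as GNU sed.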

With GNU sed you can use \n in the substitution, and \s in the regex:

sed -e 's/\s\s*/\n/g'

GNU sed also supports "extended" regular expressions (that's egrep style, not perl-style) if you give it the -r flag, so then you can use +:

sed -r -e 's/\s+/\n/g'

If this is for Linux only, you can probably go with the GNU command, but if you want this to work on systems with a non-GNU sed (e.g., BSD or Mac OS X), you might want to go with the more portable option.
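Here is a minimal sketch of the portable form in use. As an assumption, it writes the blank class as [[:blank:]], a POSIX character class for space and tab, because backslash escapes inside brackets are not interpreted by every sed. The function name wordlist and the sample text are hypothetical.

```shell
# Hypothetical helper: split stdin into one word per line using only POSIX sed features.
# The replacement is a backslash followed by a real newline, so no \n escape is needed.
wordlist() {
    sed -e 's/[[:blank:]][[:blank:]]*/\
/g'
}

printf 'the quick\tbrown   fox\n' | wordlist
```

The line containing /g' has to start in the first column, since any indentation there would become part of the replacement text.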

Laurence Gonsalves 2009-12-05 19:13:08

Answer 5

A:

gawk '{$1=$1}1' OFS="\n" file

2009-12-06 05:03:57
