Hello, I need help with sed. I need to collapse duplicate underscores and remove underscores from the beginning and end of a string.

For example:

echo '[Lorem] ~ ipsum *dolor* sit metus !!!' | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._()-]/_/g'

Produces: _Lorem____ipsum__dolor__sit_metus____

But I need to further format this string to: Lorem_ipsum_dolor_sit_metus

In other words, remove any underscores from the beginning and end of the string, and reduce runs of consecutive underscores to just one, preferably using another pipe.

Do you have any idea how to do that?

Thank you.

+1  A: 

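Just add ;s/__*/_/g;s/^_//;s/_$// right after the g in your sed command. The first substitution collapses each run of underscores into one, the second removes a leading underscore, and the third removes a trailing one.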
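
Since the question asks for a separate pipe, the same three substitutions can also run as their own sed stage. This is only a sketch: the function name squeeze_underscores is a made-up helper for illustration, and the first sed call is copied unchanged from the question.

```shell
# squeeze_underscores is a hypothetical helper name, not a sed feature.
squeeze_underscores() {
  # 1) collapse each run of "_" into a single "_"
  # 2) drop a leading "_"
  # 3) drop a trailing "_"
  sed 's/__*/_/g;s/^_//;s/_$//'
}

echo '[Lorem] ~ ipsum *dolor* sit metus !!!' \
  | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._()-]/_/g' \
  | squeeze_underscores
# prints: Lorem_ipsum_dolor_sit_metus
```

The leading and trailing underscores are handled by two independent substitutions, so a string that has only one of them is still cleaned up.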

mouviciel
A: 

All you need to do is add "\+" after your bracket expression so that each run of unwanted characters becomes a single underscore (in basic regular expressions, "\+" is a GNU sed extension). Then you can delete the beginning and ending ones. Also, as ladenedge suggested, you can use a character class to shorten your list.

sed 's/[^[:alnum:].()-]\+/_/g;s/^_\(.*\)_$/\1/'
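
One thing to watch with the command above: s/^_\(.*\)_$/\1/ only matches when the line has both a leading and a trailing underscore, so a string with just one of them is left as is. A possible variant, offered as a sketch rather than a drop-in replacement, strips each end separately and uses the POSIX interval \{1,\} in place of \+. The name clean_name is hypothetical, and [[:alnum:]] depends on the locale, so it may keep accented letters that the explicit A-Z list in the question would replace.

```shell
# clean_name is a hypothetical helper name used only for this example.
clean_name() {
  # Replace each run of disallowed characters with one "_",
  # then remove a leading "_" and a trailing "_" independently.
  sed 's/[^[:alnum:].()-]\{1,\}/_/g;s/^_//;s/_$//'
}

echo '[Lorem] ~ ipsum *dolor* sit metus !!!' | clean_name
# prints: Lorem_ipsum_dolor_sit_metus

echo 'hello world!' | clean_name
# prints: hello_world
```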
Dennis Williamson