I would like to take a string that may have multiple spaces in it and do the following:

1) Replace whitespace with an underscore

2) Remove any characters that are not A-Z or 0-9

3) Make the result all lowercase

I would then like to use the result as a variable. Any ideas?

+3  A: 

I think tr can do what you want.

variable=$(echo "${input}" | tr 'A-Z' 'a-z' | tr '[:blank:]' '_' | tr -cd '[:alnum:]_')

Explanation:

tr 'A-Z' 'a-z' - Translate uppercase letters to lowercase.

tr '[:blank:]' '_' - Translate blanks (spaces and tabs) to underscores.

tr -cd '[:alnum:]_' - Delete every character that is neither alphanumeric nor an underscore. The character classes are quoted so the shell cannot expand them as filename patterns.

NOTE: If you want to remove existing underscores before converting spaces to underscores, tr -d _ could be added near the beginning of the pipe chain.
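
As a rough sketch, the pipeline can be wrapped in a function and tried on a sample string. The function name `to_identifier` and the sample input are made up for illustration, and `printf` is used in place of `echo` only to avoid surprises with inputs such as `-n`.

```shell
# Hypothetical helper: lowercase, blanks to underscores, drop everything else.
to_identifier() {
    printf '%s' "$1" |
        tr 'A-Z' 'a-z' |
        tr '[:blank:]' '_' |
        tr -cd '[:alnum:]_'
}

input='Hello  World! 42'            # sample value, two spaces after "Hello"
variable=$(to_identifier "$input")
printf '%s\n' "$variable"           # hello__world_42
```

Each blank becomes its own underscore, so a run of two spaces yields two underscores.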

Brandon Horsley
pra
+2  A: 

Here's another approach using sed:

oldvar="HELLO MY BABY @$#@$ HI"

newvar=`echo $oldvar | sed -e "s/[A-Z]/\l&/g" -e "s/[^a-z0-9]/_/g"`

results in:

hello_my_baby__0___hi
Amardeep
heh the 0 snuck in from $#
@duplicity: Yep. Good (accidental) test case to see what happens. :-)
Amardeep
I thought about suggesting sed but was not sure how to match "whitespace" instead of just a space. Does sed support POSIX classes?
Brandon Horsley
@Brandon: I'm not sure whether it supports the POSIX classes, but the final regexp above makes sure everything else gets converted into an underscore. In fact, now that I think about it, the first expression, which matched the space character, simply isn't needed (removed).
Amardeep
Yes, `sed` supports POSIX classes. You could use `[[:blank:]]`, for example.
Dennis Williamson
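
A minimal sketch of the POSIX-class form mentioned in this comment, using a made-up sample string that contains both a space and a tab:

```shell
# Hypothetical sample input: "a", space, "b", tab, "c".
sample=$(printf 'a b\tc')

# [[:blank:]] matches spaces and tabs; each one becomes an underscore.
result=$(printf '%s\n' "$sample" | sed 's/[[:blank:]]/_/g')
printf '%s\n' "$result"   # a_b_c
```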
Because $oldvar is not quoted, pathname expansion will be performed (try including " * " in the value). Additionally, the echo utility may do some interpretation of its own ("-n" and strings containing backslashes are vulnerable); to avoid this, use printf %s "$oldvar".
jilles
@jilles: Very nice refinement, but it seemed to lose the original space characters when I tried that.
Amardeep
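
One possible explanation for the lost spaces (an assumption, since the exact command that was run is not shown) is that `$oldvar` was left unquoted in the `printf` call, so each word became a separate argument. The sketch below compares the two forms. It single-quotes the sample value so that `$#` stays literal, and it swaps in `tr` for the lowercasing step because `\l` in `sed` is a GNU extension.

```shell
oldvar='HELLO MY BABY @$#@$ HI'   # single quotes: $# is not expanded

# Quoted: printf receives one argument, so the spaces survive and
# are then turned into underscores by sed.
newvar=$(printf '%s' "$oldvar" | tr 'A-Z' 'a-z' | sed -e 's/[^a-z0-9]/_/g')
printf '%s\n' "$newvar"           # hello_my_baby_______hi

# Unquoted: the value is split into words, and printf reuses the %s
# format for each word, gluing them together without spaces.
joined=$(printf '%s' $oldvar | tr 'A-Z' 'a-z' | sed -e 's/[^a-z0-9]/_/g')
printf '%s\n' "$joined"           # hellomybaby_____hi
```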
A: 
newvar=`echo "$oldvar" | awk '{gsub(/[ \t]+/,"_") ; gsub(/[^A-Za-z0-9_]+/,"") ; print tolower($0)}' `
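
A short sketch of an awk variant that keeps lowercase letters and the underscores produced by the first substitution, run on a made-up sample value. The local name `s` and the input string are illustrative only.

```shell
oldvar='Hello   World! 42'        # sample value, three spaces after "Hello"

newvar=$(printf '%s\n' "$oldvar" |
    awk '{
        gsub(/[ \t]+/, "_")           # each run of spaces/tabs -> one underscore
        s = tolower($0)               # lowercase before filtering
        gsub(/[^a-z0-9_]+/, "", s)    # drop everything else, keep underscores
        print s
    }')
printf '%s\n' "$newvar"           # hello_world_42
```

Unlike the tr pipeline, the `+` in the first pattern collapses a run of whitespace into a single underscore.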
ghostdog74