
Hello All,

I'm trying to do something common enough: parse user input in a shell script. If the user provided a valid integer, the script does one thing; if not, it does something else. The trouble is, I haven't found an easy (and reasonably elegant) way of doing this, and I don't want to have to pick it apart char by char.

I know this must be easy, but I don't know how. I could do it in a dozen languages, but not in Bash!

In my research I found this:

http://stackoverflow.com/questions/136146/regular-expression-to-test-whether-a-string-consists-of-a-valid-real-number-in-ba

There's an answer there that talks about regexes, but as far as I know, regex matching is a library facility in C (among others). Still, it looked like a great answer, so I tried it with grep, but grep didn't know what to do with it. I tried -P, which on my box means to treat it as a Perl regexp: nada. -E didn't work either, and neither did -F.

Just to be clear, I'm trying something like this, looking for any output - from there, I'll hack up the script to take advantage of whatever I get. (IOW, I was expecting that a non-conforming input returns nothing while a valid line gets repeated.)

snafu=$(echo "$2" | grep -E "/^[-+]?(?:\.[0-9]+|(?:0|[1-9][0-9]*)(?:\.[0-9]*)?)$/")
if [ -z "$snafu" ] ;
then
   echo "Not an integer - nothing back from the grep"
else
   echo "Integer."
fi
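For reference, a sketch of a grep call that does work, assuming GNU or POSIX grep. The pattern is passed bare: grep treats the `/.../` delimiters in the attempt above as literal slashes, and `(?:...)` groups are PCRE-only syntax that -E does not understand. That pattern was also written for real numbers, so this sketch narrows it to integers. The helper name is_int_grep is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when $1 is an optionally signed integer.
# -E: extended regex, -x: the whole line must match, -q: no output, exit status only.
# The pattern is given bare: no /.../ delimiters and no PCRE-only (?:...) groups.
is_int_grep() {
  printf '%s\n' "$1" | grep -Eqx '[-+]?[0-9]+'
}

# Caveat: a value containing a newline is tested line by line, so
# "12<newline>ab" still succeeds because its first line matches.
for s in 42 -7 +3 4.5 abc; do
  if is_int_grep "$s"; then echo "$s: integer"; else echo "$s: not an integer"; fi
done
```

Using the exit status (-q) instead of capturing output avoids the `[ -z "$snafu" ]` step entirely.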

Would someone please illustrate how this is most easily done?

Frankly, this is a shortcoming of test, in my opinion. It should have a flag like this hypothetical -I:

if [ -I "string" ] ;
then
   echo "String is a valid integer."
else
   echo "String is not a valid integer."
fi

Thanks.
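test has no such flag, but a small function can stand in for the wished-for -I. This sketch sticks to POSIX sh features (a case pattern and ${var#pattern}), so it should also work in shells without [[; the name is_int is made up for illustration.

```sh
#!/bin/sh
# Hypothetical stand-in for the wished-for "test -I": succeeds only when $1 is an
# optional single sign followed by one or more decimal digits.
is_int() {
  case ${1#[-+]} in            # drop one leading "-" or "+", if present
    ''|*[!0-9]*) return 1 ;;   # nothing left, or some non-digit remains
    *) return 0 ;;
  esac
}

if is_int "-42"; then
  echo "String is a valid integer."
else
  echo "String is not a valid integer."
fi
```

Because only one sign character is stripped, inputs like "-" or "--5" still fail.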

+5  A: 
[[ $var =~ ^-?[0-9]+$ ]]
Ignacio Vazquez-Abrams
Thanks Ignacio, I'll try it in a second. Would you mind explaining it so I can learn a little? I gather it reads, "At the start of the string (^), a minus sign (-) is optional (?), followed by any number of characters between zero and 9, inclusive" ... and what then might the +$ mean? Thanks.
Richard T
The `+` means "1 or more of the preceding", and the `$` anchors the match at the end of the string. So the regex matches an optional `-` followed by one or more decimal digits, and nothing else.
Ignacio Vazquez-Abrams
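Wrapped in a function, Ignacio's test might look like the sketch below (is_int_regex is a made-up name). Keeping the pattern in a variable is a common idiom: since Bash 3.2, any quoted part of the right-hand side of =~ is matched as a literal string, and an unquoted variable sidesteps that.

```bash
#!/usr/bin/env bash
# Hypothetical helper built on Ignacio's pattern: an optional "-" then one or more digits.
is_int_regex() {
  local re='^-?[0-9]+$'   # expanded unquoted below, so it is treated as a regex
  [[ $1 =~ $re ]]
}

for s in 42 -7 007 +3 4a -; do
  if is_int_regex "$s"; then echo "$s: integer"; else echo "$s: not an integer"; fi
done
```

As written it rejects a leading + and accepts leading zeros such as 007; using [-+]? in place of -? would admit +3.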
+2  A: 

For portability to Bash versions older than 3.0 (when the =~ test was introduced) and to other POSIX shells, use expr.

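if expr "X$string" : 'X-\{0,1\}[0-9]\{1,\}$' >/dev/null
then
  echo "String is a valid integer."
else
  echo "String is not a valid integer."
fi

expr STRING : REGEX searches for REGEX anchored at the start of STRING, prints the first \(...\) group (or the length of the match, if there is no group), and exits with failure when that result is empty or 0. The pattern is a POSIX basic regular expression, hence the extra backslashes: -\{0,1\} means "maybe -", [0-9]\{1,\} means "one or more digits", and $ means "end of string". The leading X keeps expr from mistaking a value such as "length" or "(" for an operator. (GNU expr also accepts the shorter -\? and \+, but those are GNU extensions that other expr implementations may reject.)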
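To make the "first group, or length of match" behavior concrete, a few sample calls to plain POSIX expr; the variable names are arbitrary.

```sh
#!/bin/sh
# Without a \(...\) group, expr prints the length of the anchored match.
len=$(expr "abc123" : '[a-z]*')               # "3"; exit status 0
# With a group, it prints what the group captured instead.
digits=$(expr "abc123" : '[a-z]*\([0-9]*\)')  # "123"; exit status 0
# The match is anchored at the start, so a leading digit leaves nothing to match:
# expr prints 0 and exits with status 1, which is what an "if expr ..." test relies on.
none=$(expr "123abc" : '[a-z]*') || echo "no match (exit status 1)"
echo "len=$len digits=$digits none=$none"
```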

Bash also supports extended globs, though I don't recall from which version onwards.

shopt -s extglob
case "$string" in
    @(-|)[0-9]*([0-9]))
        echo "String is a valid integer." ;;
    *)
        echo "String is not a valid integer." ;;
esac

# equivalently, [[ $string = @(-|)[0-9]*([0-9]) ]]

@(-|) means "- or nothing", [0-9] means "digit", and *([0-9]) means "zero or more digits".
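Packaged as a function (is_int_glob is a hypothetical name), the extglob pattern could look like the sketch below. One pitfall: extglob has to be switched on before Bash parses a function or compound command that uses such a pattern, so the shopt line sits at the top of the script rather than inside the function.

```bash
#!/usr/bin/env bash
shopt -s extglob   # must be on before the function below is parsed

# Hypothetical helper using the extglob pattern from the answer:
# @(-|) = "-" or nothing, [0-9] = one digit, *([0-9]) = zero or more digits.
is_int_glob() {
  case $1 in
    @(-|)[0-9]*([0-9])) return 0 ;;
    *) return 1 ;;
  esac
}

for s in 42 -7 4a - 1.5; do
  if is_int_glob "$s"; then echo "$s: integer"; else echo "$s: not an integer"; fi
done
```

?(-)+([0-9]) should express the same pattern more compactly, since ?(...) matches zero or one occurrence and +(...) matches one or more.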

ephemient
Thank you ephemient, much obliged. I had never seen the =~ syntax before, and I still have no idea what it's supposed to mean. Approximately equal?! ...I've never been excited to program in Bash, but it _is_ necessary sometimes!
Richard T
In `awk`, `~` was the "regex match" operator. In Perl (as copied from C), `~` was already used for "bit complement", so they used `=~`. This later notation got copied to several other languages. (Perl 5.10 and Perl 6 like `~~` more, but that has no impact here.) I suppose you could look at it as some sort of approximate equality...
ephemient
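Since awk's ~ is where this operator's lineage starts, awk itself can also do the check, even in shells that have no =~. A sketch assuming a POSIX awk; is_int_awk is a hypothetical name, and the pattern mirrors Ignacio's.

```sh
#!/bin/sh
# Hypothetical helper: awk's ~ operator does the regex match.
# The value is read from ARGV[1] inside BEGIN, so awk never tries to open it as a file,
# and exit !(...) turns "matched" into exit status 0.
is_int_awk() {
  awk 'BEGIN { exit !(ARGV[1] ~ /^-?[0-9]+$/) }' "$1"
}

for s in 42 -7 4a ''; do
  if is_int_awk "$s"; then echo "'$s': integer"; else echo "'$s': not an integer"; fi
done
```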
Excellent post AND edit! I really appreciate explaining what it means. I wish I could mark both yours and Ignacio's posts as THE correct answer. -frown- You guys are both great. But as you have double the reputation he does, I'm giving it to Ignacio - hope you understand! -smile-
Richard T
+1  A: 

You can strip non-digits and do a comparison. Here's a demo script:

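for num in "44" "-44" "44-" "4-4" "a4" "4a" ".4" "4.4" "-4.4" "09"
do
    match=${num//[^[:digit:]]}              # strip non-digits
    [[ $match = 0 ]] || match=${match#0}    # strip one leading zero (so "09" fails), but keep a lone "0"
    printf '%s\t%s\t' "$num" "$match"
    if [[ -z $match ]]; then                # no digits at all, e.g. "" or "-"
        echo "Not integer"
        continue
    fi
    case $num in
        "$match"|-"$match")    echo "Integer";;
                          *)    echo "Not integer";;
    esac
done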

This is what the test output looks like:

44      44      Integer
-44     44      Integer
44-     44      Not integer
4-4     44      Not integer
a4      4       Not integer
4a      4       Not integer
.4      4       Not integer
4.4     44      Not integer
-4.4    44      Not integer
09      9       Not integer
Dennis Williamson
Hi Dennis, thank you for introducing me to the syntax to the right of match= above. I haven't ever noticed that type of syntax before. I recognize some of it from tr (a utility I haven't quite mastered, but fumble my way through sometimes). Where can I read up on such syntax? (i.e., what's this type of thing called?) Thanks.
Richard T
You can look in the Bash man page in the section called "Parameter Expansion" for information about `${var//string}` and `${var#string}`, and in the section called "Pattern Matching" for `[^[:digit:]]` (which is also covered in `man 7 regex`).
Dennis Williamson
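A few sample expansions make those two forms concrete (the variable names are arbitrary). They also show that ${match#0*} removes only one zero: # deletes the shortest prefix matching the glob 0*, which is just "0", while ## deletes the longest, which is the whole string. Stripping every leading zero needs the extglob form +(0).

```bash
#!/usr/bin/env bash
shopt -s extglob             # only needed for the +(0) pattern below

s='a4-4b'
digits=${s//[^[:digit:]]}    # delete every non-digit (// = all occurrences): "44"

n='0090'
short=${n#0*}                # shortest prefix matching 0*, i.e. one "0": "090"
long=${n##0*}                # longest prefix matching 0*, i.e. all of it: ""
nozeros=${n##+(0)}           # longest run of leading zeros: "90"

printf '%s|%s|%s|%s\n' "$digits" "$short" "$long" "$nozeros"
```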
+1  A: 

Here's yet another take on it (only using the test builtin command and its return code):

is_int() { test "$1" -eq "$1" 2>/dev/null; }   # test fails (status 2) if $1 is not an integer

input="-123"

if is_int "$input"
then
   echo "Input: ${input}"
   echo "Integer: $((input))"    # $((...)) replaces the obsolete $[...]; a leading 0 would be read as octal
else
   echo "Not an integer: ${input}"
fi
hans
Yes, this is also a valid approach.
Richard T
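The -eq trick accepts whatever Bash's test builtin parses as an integer, which is a little looser than the regex answers: surrounding blanks and a leading + pass, while a value out of range for the shell's integers fails. A sketch to check this in your own Bash (is_int_test is a hypothetical name; other shells' test may behave differently):

```bash
#!/usr/bin/env bash
# Hypothetical helper: [ x -eq x ] only succeeds if test can parse x as an integer.
# stderr is discarded to hide the "integer expression expected" message.
is_int_test() {
  [ "$1" -eq "$1" ] 2>/dev/null
}

for s in 42 +5 ' 7 ' 4a '' 99999999999999999999; do
  if is_int_test "$s"; then echo "'$s': integer"; else echo "'$s': not an integer"; fi
done
```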
A: 

I like the solution using the -eq test, because it's basically a one-liner. My own solution was to use parameter expansion to throw away all the numerals and see if there was anything left. (I'm still using 3.0, haven't used [[ or expr before, but glad to meet them.)

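if [ -n "$INPUT_STRING" ] && [ -z "${INPUT_STRING//[0-9]/}" ]; then
  echo "yes, natural number"              # non-empty, and nothing is left once every digit is deleted
else
  echo "no, empty or has non-numeral chars"
fi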

nortally
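Whether the pattern is [0-9]* or a single [0-9] matters here: in a glob, * means "any string", so ${var//[0-9]*} deletes everything from the first digit onward, and an input like "4a" would leave nothing behind. A quick comparison, plus a hypothetical is_natural helper that deletes digits one at a time:

```bash
#!/usr/bin/env bash
s='4a'
from_first_digit=${s//[0-9]*}   # [0-9]* = a digit followed by anything: removes "4a" entirely -> ""
each_digit=${s//[0-9]}          # [0-9] = one digit, every occurrence removed -> "a"

printf '[%s] [%s]\n' "$from_first_digit" "$each_digit"

# Hypothetical helper: non-empty, and nothing is left once every digit is deleted.
is_natural() {
  [ -n "$1" ] && [ -z "${1//[0-9]}" ]
}
```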