Hello,

I'm trying to write a bash script that checks whether an email address is valid.

I have this regular expression:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Source: http://www.regular-expressions.info/email.html

And this is my bash script:

regex=[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

i="[email protected]"
if [[ $i=~$regex ]] ; then
    echo "OK"
else
    echo "not OK"
fi

The script fails and gives me this output: 10: Syntax error: EOF in backquote substitution

Any clue?

+4  A: 

Quotes, backticks, and several other characters are special to the shell and must be escaped when they appear unquoted, as they do in the assignment to regex. You can escape each special character with a backslash, or put single quotes around the whole regex if you leave out the single quote used inside it.

I would recommend using a simpler regular expression like .*@.* because all the complexity is futile. An address like [email protected] looks perfectly fine and will be accepted by any regular expression, yet it doesn't exist.

sth
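
The simpler check recommended in the answer above could be sketched in bash roughly as follows. This is only an illustration: looks_like_email is a hypothetical helper name, and the pattern is tightened slightly from .*@.* to ^.+@.+$ (an assumption, not part of the answer) so that at least one character is required on each side of the "@".

```shell
#!/usr/bin/env bash
# Loose sanity check: some text, an "@", then some more text.
# looks_like_email is a hypothetical name used only for this sketch.
looks_like_email() {
    # Keep the pattern in a variable so the shell does not reinterpret it.
    local regex='^.+@.+$'
    [[ $1 =~ $regex ]]
}
```

It would be called as: looks_like_email "$i" && echo "OK" || echo "not OK". A check this loose only catches obvious typos; it says nothing about whether the mailbox exists.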
+1  A: 

You have several problems here:

  • The regular expression needs to be quoted and special characters escaped.
  • The regular expression ought to be anchored (^ and $).
  • ?: is not supported and needs to be removed.
  • You need spaces around the =~ operator.

Final product:

regex="^[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+(\.[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?\$"

i="[email protected]"
if [[ $i =~ $regex ]] ; then
    echo "OK"
else
    echo "not OK"
fi
Peter Eisentraut
I would like to add, however, that doing this in bash is quite, um, suboptimal. But I wanted to highlight how to fix the approach that you had chosen.
Peter Eisentraut
+3  A: 

You don't have to create such a complicated regex to check for a valid email address. You can split on "@" and then check that there are exactly two items: one in front of the @ and one after it.

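i="test@terraes"
# Split on "@" into an array; IFS is changed only for this one read
IFS="@" read -r -a parts <<< "$i"
if [ "${#parts[@]}" -ne 2 ] || [ -z "${parts[0]}" ] || [ -z "${parts[1]}" ]; then
    echo "invalid email"
else
    domain="${parts[1]}"
    # dig +short prints nothing when the lookup returns no records
    [ -n "$(dig +short "$domain")" ] && echo "domain ok"
fi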

If you want to check further that the domain is valid, you can use a tool like dig to query it. This is better than a regex because @new.jersey is matched by the regex but is not actually a proper domain.

ghostdog74
I actually think this is a much saner approach. Think of all the websites that disallow [email protected] even though it is a perfectly valid email address. This check should get rid of most fakes and still be OK. You could make it a bit stronger by checking for a '.' in the second element and making sure it splits that element into two subelements. Think of international domains, for example.
Jean
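
The extra dot check suggested in the comment above might look something like this. It is a minimal sketch, not a full validator: has_dotted_domain is a hypothetical helper name, and the rule it applies (at least one character on each side of a "." in the part after the "@") is just one reading of the suggestion.

```shell
#!/usr/bin/env bash
# has_dotted_domain is a hypothetical name used only for this sketch.
has_dotted_domain() {
    local addr=$1
    # The address must contain an "@" at all.
    [[ $addr == *@* ]] || return 1
    # Everything after the last "@" is treated as the domain.
    local domain=${addr##*@}
    # Glob match: one or more characters, a dot, one or more characters.
    [[ $domain == ?*.?* ]]
}
```

With this check, test@terraes is rejected because the domain part has no dot, while an address such as user@example.com passes.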
A: 

The immediate problem with your script is that you need to fix the quoting:

regex='[a-z0-9!#$%&'"'"'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'"'"'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'

However, this regular expression does not accept all syntactically valid email addresses. Even if it did, not all syntactically valid email addresses are deliverable.

If deliverable addresses are what you care about, then don't bother with a regular expression or other means of checking syntax: send a challenge to the address that the user supplies. Be careful not to use untrusted input as part of a command invocation! With sendmail, run sendmail -oi -t and write a message to the standard input of the sendmail process, e.g.,

To: [email protected]
From: [email protected]
Subject: email address confirmation

To confirm your address, please visit the following link:

http://www.your.organization.invalid/verify/1a456fadef213443
Greg Bacon
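
As a rough illustration of the challenge approach described above, the message can be built by one function and piped to sendmail -oi -t by another, so the user-supplied address only ever travels on standard input. The helper names build_confirmation and send_confirmation are hypothetical, and the rejection of line breaks is an added assumption meant to keep untrusted input from injecting extra headers.

```shell
#!/usr/bin/env bash
# build_confirmation and send_confirmation are hypothetical names for this sketch.

# Print the confirmation message for the given recipient, sender, and link.
build_confirmation() {
    local to=$1 from=$2 link=$3
    # Refuse values containing line breaks so they cannot add extra headers.
    case "$to$from$link" in
        *$'\n'*|*$'\r'*) return 1 ;;
    esac
    printf 'To: %s\nFrom: %s\nSubject: email address confirmation\n\n' "$to" "$from"
    printf 'To confirm your address, please visit the following link:\n\n%s\n' "$link"
}

# Send the message; the address is never placed on the command line.
send_confirmation() {
    local msg
    msg=$(build_confirmation "$@") || return 1
    # -t reads the recipients from the headers; -oi stops a lone "." from ending input.
    printf '%s\n' "$msg" | sendmail -oi -t
}
```

A call would pass the three values as arguments, for example send_confirmation "$user_address" "$sender" "$verify_url", where those variable names are placeholders. Generating and storing the verification token is left out of the sketch.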