Hi,

We need to write an email validation program in C. We are planning to use the GNU C (regex.h) regular expression functions.

The regular expression we prepared is

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

But the code below fails while compiling the regex.

#include <stdio.h>
#include <regex.h>

int main(void)
{

    const char *reg_exp = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";

    int status = 1;

    char email[71];

    regex_t preg;

    int rc;

    printf("The regex = %s\n", reg_exp);

    rc = regcomp(&preg, reg_exp, REG_EXTENDED|REG_NOSUB);
    if (rc != 0)
    {
            if (rc == REG_BADPAT || rc == REG_ECOLLATE)
                    fprintf(stderr, "Bad Regex/Collate\n");
            if (rc == REG_ECTYPE)
                    fprintf(stderr, "Invalid Char\n");
            if (rc == REG_EESCAPE)
                    fprintf(stderr, "Trailing \\\n");
            if (rc == REG_ESUBREG || rc == REG_EBRACK)
                    fprintf(stderr, "Invalid number/[] error\n");
            if (rc == REG_EPAREN || rc == REG_EBRACE)
                    fprintf(stderr, "Paren/Bracket error\n");
            if (rc == REG_BADBR || rc == REG_ERANGE)
                    fprintf(stderr, "{} content invalid/Invalid endpoint\n");
            if (rc == REG_ESPACE)
                    fprintf(stderr, "Memory error\n");
            if (rc == REG_BADRPT)
                    fprintf(stderr, "Invalid regex\n");

            fprintf(stderr, "%s: Failed to compile the regular expression:%d\n", __func__, rc);
            return 1;
    }
    while (status)
    {
            fgets(email, sizeof(email), stdin);
            status = email[0]-48;

            rc = regexec(&preg, email, (size_t)0, NULL, 0);
            if (rc == 0)
            {
                    fprintf(stderr, "%s: The regular expression is a match\n", __func__);
            }
            else
            {
                    fprintf(stderr, "%s: The regular expression is not a match: %d\n", __func__, rc);
            }
    }

    regfree(&preg);

    return 0;
}

The regex compilation fails with the error below.

The regex = [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
Invalid regex
main: Failed to compile the regular expression:13

What is the cause of this error? Does the regex need to be modified?

Thanks, Mathew Liju

+1  A: 

Your problem is the four instances of the sequence (?. That's meaningless - the ( starts a new sub-regex and you can't have ? at the start of a regex.

caf
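
To see the point above in isolation, here is a minimal sketch that compiles a pattern as a POSIX extended regex and prints the library's own message via regerror() when compilation fails. The helper name try_compile is hypothetical, not something from the question, and the exact error code is an assumption that holds for glibc (the question reports 13, which is REG_BADRPT there); other C libraries may report a different code.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regex.
 * Returns the regcomp() return code, which is 0 on success. */
int try_compile(const char *pattern)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        char msg[128];

        /* regerror() turns the numeric code into a readable message. */
        regerror(rc, &preg, msg, sizeof msg);
        fprintf(stderr, "regcomp(\"%s\") failed: %s\n", pattern, msg);
        return rc;
    }

    /* Only a successfully compiled regex needs to be freed. */
    regfree(&preg);
    return 0;
}
```

Called as try_compile("(?:a)+"), this should return a nonzero code on glibc, because the ? directly after ( has nothing to repeat. try_compile("(a)+") should return 0.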
(?: usually refers to a non-capturing subgroup, at least in the regex languages I've used.
Turnor
@liju: Just drop the `?:`, you'll get the same effect (due to using the `REG_NOSUB` flag)
Hasturkun
Not in POSIX regexes. @Liju Mathew: If `?:` is supposed to represent a non-addressable sub-regex, then just leave them out - you don't have any back-references so it won't make any difference.
caf
When I removed `?:`, the program fails for the email address "[email protected]". This is supposed to be rejected, but it is accepted.
Liju Mathew
Well, that would be because your regular expression doesn't do what you think it does. Additionally you don't seem to have transcribed it correctly into the C code; where you have a `\.` in your original regexp, you have just `.` in the C string (where I would expect to see `\\.`).
caf
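
Putting the comments above together, a corrected version might look like the sketch below: the `?:` is dropped and each `\.` from the original regex is written as `\\.` in the C string. The `^` and `$` anchors are an assumption on my part (that the whole input should be an address, not just a substring of it); the thread does not say so. The function name is_valid_email is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `s` matches the pattern, 0 if it does
 * not, and -1 if the pattern fails to compile. */
int is_valid_email(const char *s)
{
    /* Same regex as in the question, with "?:" removed, literal dots
     * escaped as "\\.", and ^...$ anchors added (an assumption). */
    static const char *pattern =
        "^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
        "@([a-z0-9]([a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?$";
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, s, (size_t)0, NULL, 0);
    regfree(&preg);

    return rc == 0 ? 1 : 0;
}
```

With the anchors in place, a line read by fgets() needs its trailing newline removed first, for example with `email[strcspn(email, "\n")] = '\0';` (strcspn is declared in string.h). The pattern only lists lowercase letters, so uppercase input is rejected unless REG_ICASE is added to the flags.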
+1  A: 

In case you're interested,

I recently saw the post "Perfect email regex finally found" on Hacker News;
it is about the page "Comparing E-mail Address Validating Regular Expressions".

The regexes:

// James Watts and Francisco Jose Martin Moreno are the first to develop one which  
// passes all of the tests.
/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i

// Arluison Guillaume has also improved Warren Gaebel's regex.
// This one will work in JavaScript:
/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i 
Nick D