ansaurus

Question

Code Golf: Email Address Validation without Regular Expressions

Answer 1

+2 A:

Whatever version of C++ MSVC2008 supports.

Here's my humble submission. Now I know why they told me never to do the things I did in here:

#define N return 0
#define I(x) &&*x!='.'&&*x!='_'
bool p(char*a) {
 if(!isalnum(a[0])I(a))N;
 char*p=a,*b=0,*c=0;
 for(int d=0,e=0;*p;p++){
  if(*p=='@'){d++;b=p;}
  else if(*p=='.'){if(d){e++;c=p;}}
  else if(!isalnum(*p)I(p))N;
  if (d>1||e>1)N;
 }
 if(b>c||b+1>=c||c+2>=p)N;
 return 1;
}

jeffamaphone 2009-09-07 18:03:57

Assumes a is properly NULL-terminated. <shrug/>

jeffamaphone 2009-09-07 18:07:36

It's nice to provide a character count in your answers, as well as the language used.

strager 2009-09-07 18:14:52

Answer 2

+19 A:

C89 (166 characters)

#define B(c)isalnum(c)|c==46|c==95
#define C(x)if(!v|*i++-x)return!1;
#define D(x)for(v=0;x(*i);++i)++v;
v;e(char*i){D(B)C(64)D(isalpha)C(46)D(isalpha)return!*i&v>1;}

Not re-entrant, but can be run multiple times. Test bed:

#include<stdio.h>
#include<assert.h>
main(){
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(!e("b@[email protected]"));
    assert(!e("test@%.org"));
    assert(!e("[email protected]"));
    assert(!e("@w.org"));
    assert(!e("test@org"));
    assert(!e("s%[email protected]"));
    assert(!e("foo@a%.com"));
    puts("success!");
}

strager 2009-09-07 18:11:59

Very nice.

jeffamaphone 2009-09-07 18:15:56

@jeffamaphone, Thank you. =]

strager 2009-09-07 18:18:34

+1 agree. Very nice solution.

Alex 2009-09-07 18:20:32

+1, love the nested macros and declaring a global variable without a type!

j_random_hacker 2009-09-07 21:00:51

I just have to ask, did you come up with this 100% by yourself, or did you have any clues from somewhere else? Not suggesting that you weren't capable of coming up with it yourself :) I'm just really amazed by the shortness of your solution.

Alex 2009-09-07 22:08:45

@Alex, Completely self-made. If you expand the macros, it's pretty straightforward. `A` returns `1` if `c` is a letter. `B` returns `1` if `c` is a letter, a digit, or `_` or `.`. `D` iterates to the next character not matching `x` (`A` or `B`), while counting characters. `C` returns from the function with `0` if no characters were iterated over (`!v`) or if the current character is not `x` (`@` or `.`). The final return is `1` if the full string has been parsed and the count is not `1`.

strager 2009-09-07 23:01:25
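
For readers following the explanation above, here is a rough, ungolfed Python sketch of the same scan-then-check idea. It is an illustration only, not the answer's code: the names `scan`, `expect`, and `valid_scan` and the sample address `user@example.com` are made up here, and ASCII is assumed as elsewhere in the thread. Note that the posted C calls `isalpha` directly where the comment mentions `A`, and its final test is `v>1`.

```python
import string

# Characters allowed before the '@' (macro B in the C answer): letters, digits, '.', '_'
LOCAL_CHARS = string.ascii_letters + string.digits + "._"
# Characters allowed in the two domain parts (isalpha in the C answer)
LETTERS = string.ascii_letters

def scan(s, i, allowed):
    """Like macro D: skip a run of allowed characters, return (new index, run length)."""
    count = 0
    while i < len(s) and s[i] in allowed:
        i += 1
        count += 1
    return i, count

def expect(s, i, count, ch):
    """Like macro C: the previous run must be non-empty and s[i] must be ch."""
    return count > 0 and i < len(s) and s[i] == ch

def valid_scan(s):
    i, n = scan(s, 0, LOCAL_CHARS)      # local part
    if not expect(s, i, n, "@"):
        return False
    i, n = scan(s, i + 1, LETTERS)      # domain name
    if not expect(s, i, n, "."):
        return False
    i, n = scan(s, i + 1, LETTERS)      # top-level domain
    # The whole string must be consumed and the last run needs at least 2 letters
    return i == len(s) and n > 1

print(valid_scan("user@example.com"))  # True
print(valid_scan("foo@abc."))          # False
```

The second call is the `foo@abc.` case mentioned in the next comment; it is rejected here because the final run is empty.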

I just realized there's a bug (`foo@abc.` passes). I'll fix it soon.

strager 2009-09-07 23:02:11

Managed to save a character by fixing the bug. Cool! Also found another optimization, saving yet another character.

strager 2009-09-07 23:03:15

Really this should be C89+ASCII, I'm pretty sure it'd fail on a C89 implementation that used EBCDIC ;)

caf 2009-09-07 23:41:02

@caf - Most C code golfs assume ASCII. I know the "C" locale is generally defined to be ASCII. I almost believe it's in the standard, but I don't know where it would be if it was. Gots to get me a copy of that sometime soon.

Chris Lutz 2009-09-08 00:48:04

P Daddy 2009-09-08 03:40:15

+1 for a really cool solution, by the way. I love the functional nature of this.

P Daddy 2009-09-08 03:43:01

@caf, I assume ASCII in all my code-golf answers. =] On `isalpha`: [this page](http://www.schweikhardt.net/identifiers.html) (seems to be a good reference; bookmarking now...) shows `<ctype.h>` is required for `isalpha`.

strager 2009-09-09 01:17:48

`<ctype.h>` *declares* `isalpha`. That doesn't mean it's *required*. Without the prototype, the compiler will assume it returns an int, which it does, and will not check the number or types of arguments, but that's okay.

P Daddy 2009-09-09 17:05:14

@P Daddy, Oh, you're right... I thought for a second it took char*. I'll update my answer in a bit. Thanks!

strager 2009-09-09 19:42:31

Answer 3

+5 A:

Python (181 characters including newlines)

def v(E):
 import string as t;a=t.ascii_letters;e=a+"1234567890_.";t=e,e,"@",e,".",a,a,a,a,a,"",a
 for c in E:
  if c in t[0]:t=t[2:]
  elif not c in t[1]:return 0>1
 return""==t[0]

Basically just a state machine using obfuscatingly short variable names.

Sean Nyman 2009-09-07 18:21:20

You can drop ~10 characters by making t into a flat list, and incrementing by two. t[s][1] becomes t[s+1]. Also, the last return is indented one space too far.

ACoolie 2009-09-07 19:09:14

@ACoolie: Thanks! It actually appears to put my^H^H*our* solution in the lead so far.

Sean Nyman 2009-09-07 19:16:06

Nevermind, it's only in the lead if I cheat on the count. Oh well.

Sean Nyman 2009-09-07 19:23:03

I golfed it a little further, by reordering the list, changing it to a tuple, eliminating spaces, eliminating the list index, etc.

recursive 2009-09-07 20:02:02

It's possible to save two more spaces by changing the indentation inside the loop to tabs.

recursive 2009-09-07 20:03:06

Answer 4

+2 A:

Not the greatest solution no doubt, and pretty darn verbose, but it is valid.

Fixed (All test cases pass now)

static bool ValidateEmail(string email)
{
    var numbers = "1234567890";
    var uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var lowercase = uppercase.ToLower();
    var arUppercase = uppercase.ToCharArray();
    var arLowercase = lowercase.ToCharArray();
    var arNumbers = numbers.ToCharArray();
    var atPieces = email.Split(new string[] { "@"}, StringSplitOptions.RemoveEmptyEntries);
    if (atPieces.Length != 2)
        return false;
    foreach (var c in atPieces[0])
    {
        if (!(arNumbers.Contains(c) || arLowercase.Contains(c) || arUppercase.Contains(c) || c == '.' || c == '_'))
            return false;
    }
    if(!atPieces[1].Contains("."))
        return false;
    var dotPieces = atPieces[1].Split('.');
    if (dotPieces.Length != 2)
        return false;
    foreach (var c in dotPieces[0])
    {
        if (!(arLowercase.Contains(c) || arUppercase.Contains(c)))
            return false;
    }
    var found = 0;
    foreach (var c in dotPieces[1])
    {
        if ((arLowercase.Contains(c) || arUppercase.Contains(c)))
            found++;
        else
            return false;
    }
    return found >= 2;
}

Nathan Taylor 2009-09-07 18:33:51

Maybe try to also post a compressed solution (single character variable names, least amount of white space etc.) so you can compete on the character count. Keep the longer one as well though, it's nice to see how you did it! +1

Alex 2009-09-07 18:41:51

Just noticed it fails 2 of the test cases! I'll be back with an update in a sec. :)

Nathan Taylor 2009-09-07 18:47:36

What language is this? Also, you understand that the purpose of code golf is the smallest possible program? :)

recursive 2009-09-08 00:35:51

That would be C#. I didn't realize it was shortest solution, but I just did it out of a desire to see if I could. I added "code-golf" to my preferred tags after seeing this post. :)

Nathan Taylor 2009-09-08 07:10:36

Answer 5

+2 A:

C89 character set agnostic (262 characters)

#include <stdio.h>

/* the 'const ' qualifiers should be removed when */
/* counting characters: I don't like warnings :) */
/* also the 'int ' should not be counted. */

/* it needs only 2 spaces (after the returns), should be only 2 lines */
/* that's a total of 262 characters (1 newline, 2 spaces) */

/* code golf starts here */

#include<string.h>
int v(const char*e){
const char*s="0123456789._abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(e=strpbrk(e,s))
  if(e=strchr(e+1,'@'))
    if(!strchr(e+1,'@'))
      if(e=strpbrk(e+1,s+12))
        if(e=strchr(e+1,'.'))
          if(!strchr(e+1,'.'))
            if(strlen(e+1)>1)
              return 1;
return 0;
}

/* code golf ends here */

int main(void) {
  const char *t;
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "b@[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "test@%.org"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "@w.org"; printf("%s ==> %d\n", t, v(t));
  t = "test@org"; printf("%s ==> %d\n", t, v(t));
  t = "s%[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "foo@a%.com"; printf("%s ==> %d\n", t, v(t));

  return 0;
}

Version 2

Still C89 character set agnostic, bugs hopefully corrected (303 chars; 284 without the #include)

#include<string.h>
#define Y strchr
#define X{while(Y
v(char*e){char*s="0123456789_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(*e!='@')X(s,*e))e++;if(*e++=='@'&&!Y(e,'@')&&Y(e+1,'.'))X(s+12,*e))e++;if(*e++=='.'
&&!Y(e,'.')&&strlen(e)>1){while(*e&&Y(s+12,*e++));if(!*e)return 1;}}}return 0;}

That #define X is absolutely disgusting!

Test as for my first (buggy) version.

pmg 2009-09-07 19:11:58

strager 2009-09-07 20:02:03

Seems we came up with the same idea of using suffixes of a single string as arguments to str...() functions... And actually I noticed a bug in my code after seeing yours!

j_random_hacker 2009-09-07 21:11:34

Answer 6

+12 A:

J

:[[/%^(:[[+-/^,&i|:[$[' ']^j+0__:k<3:]]

P Daddy 2009-09-07 20:46:06

REALLY ;) still, +1 for a good comeback.

Alex 2009-09-07 22:00:43

That's about the fifth J program I've seen that started with `:[[` and ended with `:]]` - what gives?

Chris Lutz 2009-09-08 00:43:50

It's extra sad at the beginning, but by the end it gets really happy.

P Daddy 2009-09-08 03:11:35

Answer 7

+6 A:

C89, 175 characters.

#define G &&*((a+=t+1)-1)==
#define H (t=strspn(a,A
t;e(char*a){char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;return H))G 64&&H+12))G 46&&H+12))>1 G 0;}

I am using the standard library function strspn(), so I feel this answer isn't as "clean" as strager's answer which does without any library functions. (I also stole his idea of declaring a global variable without a type!)

One of the tricks here is that by putting . and _ at the start of the string A, it's possible to include or exclude them easily in a strspn() test: when you want to allow them, use strspn(something, A); when you don't, use strspn(something, A+12). Another is assuming that sizeof (short) == 2 * sizeof (char), and building up the array of valid characters 2 at a time from the "seed" pair Aa. The rest was just looking for a way to force subexpressions to look similar enough that they could be pulled out into #defined macros.

To make this code more "portable" (heh :-P) you can change the array-building code from

char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;

to

char*A="_.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

for a cost of 5 additional characters.
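
The suffix trick described above can be sketched in ungolfed Python. This is an illustration under stated assumptions, not the answer's code: `span` is a hypothetical stand-in for C's `strspn()`, the names `ALLOWED`, `LETTERS_ONLY`, and `valid_span` are made up here, and the string is laid out like the portable variant of `A` (`_` and `.`, then ten digits, then letters), so dropping the first 12 characters leaves letters only.

```python
import string

# Same layout as the portable C string: "_." + digits + letters
ALLOWED = "_." + string.digits + string.ascii_letters
# Skipping the first 12 characters ("_." plus ten digits) mirrors A+12 in C
LETTERS_ONLY = ALLOWED[12:]

def span(s, allowed):
    """Rough equivalent of strspn(): length of the leading run of s drawn from allowed."""
    n = 0
    while n < len(s) and s[n] in allowed:
        n += 1
    return n

def valid_span(s):
    t = span(s, ALLOWED)                 # local part: letters, digits, '_' and '.'
    if t < 1 or s[t:t + 1] != "@":
        return False
    s = s[t + 1:]
    t = span(s, LETTERS_ONLY)            # domain name: letters only
    if t < 1 or s[t:t + 1] != ".":
        return False
    s = s[t + 1:]
    t = span(s, LETTERS_ONLY)            # top-level domain: at least 2 letters
    return t > 1 and t == len(s)         # t == len(s) plays the role of hitting the NUL
```

One string serves both character classes, which is what lets the C version share a single macro for all three `strspn()` calls.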

j_random_hacker 2009-09-07 20:59:54

I think the `#include<string.h>` should be included. Otherwise, it's not portable. (Your `short` thing isn't portable either but at least you provide a cheap alternative.)

strager 2009-09-07 21:26:55

+1 for trying a different angle :)

Alex 2009-09-07 22:01:34

`size_t strspn();` is fewer characters than the `#include` and will do the job (and also doesn't require a newline).

caf 2009-09-07 23:46:25

@caf - On many platforms (and by "many" I mean "mine"), `size_t` is only defined in `<stddef.h>` but if you said to hell with portability you could _maybe_ get away with letting it be implicitly declared as returning `int` since it's the same size on many (once again, "my") platforms.

Chris Lutz 2009-09-08 00:52:13

@strager: Point taken, but I think that since most of us are assuming ASCII anyway, portability is already out the window. Surely if it compiles (and it does on at least MSVC++9 and Linux gcc 4.1.2), it's OK?

j_random_hacker 2009-09-08 04:52:05

Chris, a very good point.

caf 2009-09-09 00:30:23

Answer 8

+1 A:

Java: 257 chars (not including the 3 end of lines for readability ;-)).

boolean q(char[]s){int a=0,b=0,c=0,d=0,e=0,f=0,g,y=-99;for(int i:s)
d=(g="@._0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm".indexOf(i))<0?
y:g<1&&++e>0&(b<1|++a>1)?y:g==1&e>0&(c<1||f++>0)?y:++b>0&g>12?f>0?d+1:f<1&e>0&&++c>0?
d:d:d;return d>1;}

Passes all the tests (my older version was incorrect).

JRL 2009-09-07 21:27:16

Answer 9

+1 A:

VBA/VB6 - 484 chars

Explicit off
usage: VE("[email protected]")

Function V(S, C)
V = True
For I = 1 To Len(S)
 If InStr(C, Mid(S, I, 1)) = 0 Then
  V = False: Exit For
 End If
Next
End Function

Function VE(E)
VE = False
C1 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
C2 = "0123456789._"
P = Split(E, "@")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1 & C2) Then GoTo X
E = P(1): P = Split(E, ".")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1) Or Len(P(1)) < 2 Or Not V(P(1), C1) Then GoTo X
VE = True
X:
End Function

DJ 2009-09-07 22:15:29

Answer 10

+5 A:

C (166 characters)

#define F(t,u)for(r=s;t=(*s-64?*s-46?isalpha(*s)?3:isdigit(*s)|*s==95?4:0:2:1);++s);if(s-r-1 u)return 0;
V(char*s){char*r;F(2<,<0)F(1=)F(3=,<0)F(2=)F(3=,<1)return 1;}

The single newline is required, and I've counted it as one character.

P Daddy 2009-09-08 02:16:39

Nice! Calling a macro with fewer arguments than declared is interesting -- I find it compiles (with warnings) on MSVC++ but not on gcc 4.1.2. Any idea what is "officially" allowed in the language spec?

j_random_hacker 2009-09-08 05:04:13

@j_random_hacker: I'm not sure what the spec says, but gcc doesn't like this code at all. Putting commas in those problematic macro calls (`F(1=,)` and `F(2=,)`) fixes the "macro 'F' requires 2 arguments, but only 1 given" error, but my version (3.4.6) still blows up with "syntax error before '=' token" and "syntax error before ')' token".

P Daddy 2009-09-08 07:36:26

Answer 11

+1 A:

Erlang 266 chars:

-module(cg_email).

-export([test/0]).

%%% golf code begin %%%
-define(E,when X>=$a,X=<$z;X>=$A,X=<$Z).
-define(I(Y,Z),Y([X|L])?E->Z(L);Y(_)->false).
-define(L(Y,Z),Y([X|L])?E;X>=$0,X=<$9;X=:=$.;X=:=$_->Z(L);Y(_)->false).
?L(e,m).
m([$@|L])->a(L);?L(m,m).
?I(a,i).
i([$.|L])->l(L);?I(i,i).
?I(l,c).
?I(c,g).
g([])->true;?I(g,g).
%%% golf code end %%%

test() ->
  true  = e("[email protected]"),
  false = e("b@[email protected]"),
  false = e("test@%.org"),
  false = e("[email protected]"),
  true  = e("[email protected]"),
  false = e("test@org"),
  false = e("s%[email protected]"),
  true  = e("[email protected]"),
  false = e("foo@a%.com"),
  ok.

Hynek -Pichi- Vychodil 2009-09-08 13:11:09

Answer 12

+1 A:

Ruby, 225 chars. This is my first Ruby program, so it's probably not very Ruby-like :-)

def v z;r=!a=b=c=d=e=f=0;z.chars{|x|case x when'@';r||=b<1||!e;e=!1 when'.'
e ?b+=1:(a+=1;f=e);r||=a>1||(c<1&&!e)when'0'..'9';b+=1;r|=!e when'A'..'Z','a'..'z'
e ?b+=1:f ?c+=1:d+=1;else r=1 if x!='_'||!e|!b+=1;end};!r&&d>1 end

JRL 2009-09-08 22:32:29

Answer 13

+4 A:

Python, 149 chars (after putting the whole for loop into one semicolon-separated line, which I haven't done here for "readability" purposes):

def v(s,t=0,o=1):
 for c in s:
   k=c=="@"
   p=c=="."
   A=c.isalnum()|p|(c=="_")
   L=c.isalpha()
   o&=[A,k|A,L,L|p,L,L,L][t]
   t+=[1,k,1,p,1,1,0][t]
 return(t>5)&o

Test cases, borrowed from strager's answer:

assert v("[email protected]")
assert v("[email protected]")
assert v("[email protected]")
assert not v("b@[email protected]")
assert not v("test@%.org")
assert not v("[email protected]")
assert not v("@w.org")
assert not v("test@org")
assert not v("s%[email protected]")
assert not v("foo@a%.com")
print "Yeah!"

Explanation: When iterating over the string, two variables keep getting updated.

t keeps the current state:

t = 0: We're at the beginning.
t = 1: We were at the beginning and have found at least one legal character (letter, number, underscore, period)
t = 2: We have found the "@"
t = 3: We have found at least one legal character (i.e. letter) after the "@"
t = 4: We have found the period in the domain name
t = 5: We have found one legal character (letter) after the period
t = 6: We have found at least two legal characters after the period

o as in "okay" starts as 1, i.e. true, and is set to 0 as soon as a character is found that is illegal in the current state. Legal characters are:

In state 0: letter, number, underscore, period (change state to 1 in any case)
In state 1: letter, number, underscore, period, at-sign (change state to 2 if "@" is found)
In state 2: letter (change state to 3)
In state 3: letter, period (change state to 4 if period found)
In states 4 thru 6: letter (increment state when in 4 or 5)

When we have gone all the way through the string, we return whether t==6 (t>5 is one char less) and o is 1.
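
The state table above can also be written out longhand. The following is an ungolfed sketch of the same seven states, not the golfed answer itself: the names `valid_fsm`, `LOCAL`, and `LETTERS` are made up here, it returns early instead of tracking `o`, and it restricts letters and digits to ASCII, whereas `isalnum()` and `isalpha()` in the original also accept non-ASCII characters.

```python
import string

LETTERS = set(string.ascii_letters)
# Legal before the "@": letter, number, underscore, period
LOCAL = LETTERS | set(string.digits) | set("._")

def valid_fsm(s):
    state = 0
    for c in s:
        if state == 0:                    # beginning: need one legal character
            if c not in LOCAL:
                return False
            state = 1
        elif state == 1:                  # local part: stay here until the "@"
            if c == "@":
                state = 2
            elif c not in LOCAL:
                return False
        elif state == 2:                  # just after "@": need a letter
            if c not in LETTERS:
                return False
            state = 3
        elif state == 3:                  # domain name: stay here until the period
            if c == ".":
                state = 4
            elif c not in LETTERS:
                return False
        else:                             # states 4 to 6: letters only
            if c not in LETTERS:
                return False
            state = min(state + 1, 6)     # increment in 4 and 5, stay in 6
    return state == 6                     # at least two letters after the period
```

Returning `False` immediately plays the role of `o` dropping to 0; the golfed version keeps looping because that costs fewer characters.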

balpha 2009-09-10 19:54:55

Quite a bit shorter than the other Python solution here! +1.

j_random_hacker 2009-09-20 14:50:06

Answer 14

+1 A:

'Using no regex': PHP 47 Chars.

<?=filter_var($argv[1],FILTER_VALIDATE_EMAIL);

CodeJoust 2009-10-26 18:53:24

Answer 15

+1 A:

Haskell (GHC 6.8.2), 144 characters (previously 165, then 161)

Using pattern matching, elem, span and all:

a=['A'..'Z']++['a'..'z']
e=f.span(`elem`"._0123456789"++a)
f(_:_,'@':d)=g$span(`elem`a)d
f _=False
g(_:_,'.':t@(_:_:_))=all(`elem`a)t
g _=False

The above was tested with the following code:

main :: IO ()
main = print $ and [
  e "[email protected]",
  e "[email protected]",
  e "[email protected]",
  not $ e "b@[email protected]",
  not $ e "test@%.org",
  not $ e "[email protected]",
  not $ e "@w.org",
  not $ e "test@org",
  not $ e "s%[email protected]",
  not $ e "foo@a%.com"
  ]

Stephan202 2009-10-26 19:48:28
