(Edit: What is Code Golf? Code golf challenges ask you to solve a specific problem in the fewest characters of code, in whichever language you prefer. More info here on Meta StackOverflow.)

Code Golfers, here's a challenge on string operations.

Email Address Validation, but without regular expressions (or similar parsing library) of course. It's not so much about the email addresses but how short you can write the different string operations and constraints given below.

The rules are the following (yes, I know, this is not RFC compliant, but these are going to be the 5 rules for this challenge):

  • At least 1 character out of this group before the @:

    A-Z, a-z, 0-9, . (period), _ (underscore)
    
  • @ has to exist, exactly one time

    [email protected]
        ^
    
  • Period (.) has to exist exactly one time after the @

    [email protected]
              ^
    
  • At least 1 character, and only [A-Z, a-z] characters, between @ and the following . (period)

    [email protected]
         ^
    
  • At least 2 characters, and only [A-Z, a-z] characters, after the final . (period)

    [email protected]
               ^^
    

Please post the method/function only, which would take a string (proposed email address) and then return a Boolean result (true/false) depending on the email address being valid (true) or invalid (false).
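
For reference, the five rules can also be written out without any golfing. This is a minimal, ungolfed Python sketch and not a competitive entry. The name `is_valid` and the address `test@example.com` in the usage note are illustrative assumptions, and rule 1 is read as "every character before the @ must come from the group", which is what the `%` samples suggest.

```python
import string

LETTERS = string.ascii_letters
LOCAL_CHARS = LETTERS + string.digits + "._"


def is_valid(address):
    """Return True if address satisfies the five challenge rules."""
    # Rule 2: '@' has to exist exactly once.
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    # Rule 1: at least one character before '@', all from the allowed group.
    if not local or any(c not in LOCAL_CHARS for c in local):
        return False
    # Rule 3: '.' has to exist exactly once after the '@'.
    if domain.count(".") != 1:
        return False
    name, tld = domain.split(".")
    # Rule 4: at least one character, letters only, between '@' and '.'.
    if not name or any(c not in LETTERS for c in name):
        return False
    # Rule 5: at least two characters, letters only, after the '.'.
    return len(tld) >= 2 and all(c in LETTERS for c in tld)
```

Under these assumptions `is_valid("test@example.com")` is true, while `@w.org`, `test@org`, `test@%.org` and `foo@a%.com` from the samples are rejected.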

Samples:
[email protected]    (valid/true)          @w.org     (invalid/false)    
b@[email protected]  (invalid/false)       test@org   (invalid/false)    
test@%.org (invalid/false)       s%[email protected]  (invalid/false)    
[email protected] (invalid/false)       [email protected]  (valid/true)
[email protected]  (valid/true)          foo@a%.com (invalid/false)

Good luck!

+2  A: 

Whatever version of C++ MSVC2008 supports.

Here's my humble submission. Now I know why they told me never to do the things I did in here:

#define N return 0
#define I(x) &&*x!='.'&&*x!='_'
bool p(char*a) {
 if(!isalnum(a[0])I(a))N;
 char*p=a,*b=0,*c=0;
 for(int d=0,e=0;*p;p++){
  if(*p=='@'){d++;b=p;}
  else if(*p=='.'){if(d){e++;c=p;}}
  else if(!isalnum(*p)I(p))N;
  if (d>1||e>1)N;
 }
 if(b>c||b+1>=c||c+2>=p)N;
 return 1;
}
jeffamaphone
Assumes a is properly NULL-terminated. <shrug/>
jeffamaphone
It's nice to provide a character count in your answers, as well as the language used.
strager
+19  A: 

C89 (166 characters)

#define B(c)isalnum(c)|c==46|c==95
#define C(x)if(!v|*i++-x)return!1;
#define D(x)for(v=0;x(*i);++i)++v;
v;e(char*i){D(B)C(64)D(isalpha)C(46)D(isalpha)return!*i&v>1;}

Not re-entrant, but can be run multiple times. Test bed:

#include<stdio.h>
#include<assert.h>
main(){
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(e("[email protected]"));
    assert(!e("b@[email protected]"));
    assert(!e("test@%.org"));
    assert(!e("[email protected]"));
    assert(!e("@w.org"));
    assert(!e("test@org"));
    assert(!e("s%[email protected]"));
    assert(!e("foo@a%.com"));
    puts("success!");
}
strager
Very nice.
jeffamaphone
@jeffamaphone, Thank you. =]
strager
+1 agree. Very nice solution.
Alex
+1, love the nested macros and declaring a global variable without a type!
j_random_hacker
I just have to ask, did you come up with this 100% by yourself, or did you have any clues from somewhere else? Not suggesting that you weren't capable of coming up with it yourself :) I'm just really amazed by the shortness of your solution.
Alex
@Alex, Completely self-made. If you expand the macros, it's pretty straightforward. `A` returns `1` if `c` is a letter. `B` returns `1` if `c` is a letter, a digit, or `_` or `.`. `D` iterates to the next character not matching `x` (`A` or `B`), while counting characters. `C` returns from the function with `0` if no characters were iterated over (`!v`) or if the current character is not `x` (`@` or `.`). The final return is `1` if the full string has been parsed and the count is not `1`.
strager
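The macro expansion described in the comment above can be sketched in Python roughly as follows. This is an assumed reading of the C answer as it is currently posted (final test `!*i&v>1`), not the author's own expansion. The names `run`, `is_local_char`, `is_letter` and `scan` are made up for illustration.

```python
def run(text, pos, allowed):
    """Skip characters accepted by allowed(); return (new_pos, how_many)."""
    start = pos
    while pos < len(text) and allowed(text[pos]):
        pos += 1
    return pos, pos - start


def is_local_char(c):
    # Counterpart of macro B: letter, digit, '.' or '_' (ASCII only).
    return c.isascii() and (c.isalnum() or c in "._")


def is_letter(c):
    # Counterpart of isalpha in the "C" locale (ASCII only).
    return c.isascii() and c.isalpha()


def scan(address):
    pos, count = run(address, 0, is_local_char)     # D(B)
    if count == 0 or address[pos:pos + 1] != "@":   # C(64)
        return False
    pos, count = run(address, pos + 1, is_letter)   # D(isalpha)
    if count == 0 or address[pos:pos + 1] != ".":   # C(46)
        return False
    pos, count = run(address, pos + 1, is_letter)   # D(isalpha)
    # Whole string consumed and at least two letters after the period.
    return pos == len(address) and count > 1        # return!*i&v>1
```

Each `run` call corresponds to one `D(...)` loop and each `if` to one `C(...)` check, so the function is three scans separated by two expected characters.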
I just realized there's a bug (`foo@abc.` passes). I'll fix it soon.
strager
Managed to save a character by fixing the bug. Cool! Also found another optimization, saving yet another character.
strager
Really this should be C89+ASCII, I'm pretty sure it'd fail on a C89 implementation that used EBCDIC ;)
caf
@caf - Most C code golfs assume ASCII. I know the "C" locale is generally defined to be ASCII. I almost believe it's in the standard, but I don't know where it would be if it was. Gots to get me a copy of that sometime soon.
Chris Lutz
P Daddy
+1 for a really cool solution, by the way. I love the functional nature of this.
P Daddy
@caf, I assume ASCII in all my code-golf answers. =] On `isalpha`: [this page](http://www.schweikhardt.net/identifiers.html) (seems to be a good reference; bookmarking now...) shows `<ctype.h>` is required for `isalpha`.
strager
`<ctype.h>` *declares* `isalpha`. That doesn't mean it's *required*. Without the prototype, the compiler will assume it returns an int, which it does, and will not check the number or types of arguments, but that's okay.
P Daddy
@P Daddy, Oh, you're right... I thought for a second it took char*. I'll update my answer in a bit. Thanks!
strager
+5  A: 

Python (181 characters including newlines)

def v(E):
 import string as t;a=t.ascii_letters;e=a+"1234567890_.";t=e,e,"@",e,".",a,a,a,a,a,"",a
 for c in E:
  if c in t[0]:t=t[2:]
  elif not c in t[1]:return 0>1
 return""==t[0]

Basically just a state machine using obfuscatingly short variable names.

Sean Nyman
You can drop ~10 characters by making t into a flat list, and incrementing by two. t[s][1] becomes t[s+1]. Also, the last return is one space too far.
ACoolie
@ACoolie: Thanks! It actually appears to put my^H^H*our* solution in the lead so far.
Sean Nyman
Nevermind, it's only in the lead if I cheat on the count. Oh well.
Sean Nyman
I golfed it a little further, by reordering the list, changing it to a tuple, eliminating spaces, eliminating the list index, etc.
recursive
It's possible to save two more spaces by changing the indentation inside the loop to tabs.
recursive
+2  A: 

Not the greatest solution no doubt, and pretty darn verbose, but it is valid.

Fixed (All test cases pass now)

static bool ValidateEmail(string email)
{
    var numbers = "1234567890";
    var uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var lowercase = uppercase.ToLower();
    var arUppercase = uppercase.ToCharArray();
    var arLowercase = lowercase.ToCharArray();
    var arNumbers = numbers.ToCharArray();
    var atPieces = email.Split(new string[] { "@"}, StringSplitOptions.RemoveEmptyEntries);
    if (atPieces.Length != 2)
        return false;
    foreach (var c in atPieces[0])
    {
        if (!(arNumbers.Contains(c) || arLowercase.Contains(c) || arUppercase.Contains(c) || c == '.' || c == '_'))
            return false;
    }
    if(!atPieces[1].Contains("."))
        return false;
    var dotPieces = atPieces[1].Split('.');
    if (dotPieces.Length != 2)
        return false;
    foreach (var c in dotPieces[0])
    {
        if (!(arLowercase.Contains(c) || arUppercase.Contains(c)))
            return false;
    }
    var found = 0;
    foreach (var c in dotPieces[1])
    {
        if ((arLowercase.Contains(c) || arUppercase.Contains(c)))
            found++;
        else
            return false;
    }
    return found >= 2;
}
Nathan Taylor
Maybe try to also post a compressed solution (single character variable names, least amount of white space etc.) so you can compete on the character count. Keep the longer one as well though, it's nice to see how you did it! +1
Alex
Just noticed it fails 2 of the test cases! I'll be back with an update in a sec. :)
Nathan Taylor
What language is this? Also, you understand that the purpose of code golf is the smallest possible program? :)
recursive
That would be C#. I didn't realize it was shortest solution, but I just did it out of a desire to see if I could. I added "code-golf" to my preferred tags after seeing this post. :)
Nathan Taylor
+2  A: 

C89 character set agnostic (262 characters)

#include <stdio.h>

/* the 'const ' qualifiers should be removed when */
/* counting characters: I don't like warnings :) */
/* also the 'int ' should not be counted. */

/* it needs only 2 spaces (after the returns), should be only 2 lines */
/* that's a total of 262 characters (1 newline, 2 spaces) */

/* code golf starts here */

#include<string.h>
int v(const char*e){
const char*s="0123456789._abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(e=strpbrk(e,s))
  if(e=strchr(e+1,'@'))
    if(!strchr(e+1,'@'))
      if(e=strpbrk(e+1,s+12))
        if(e=strchr(e+1,'.'))
          if(!strchr(e+1,'.'))
            if(strlen(e+1)>1)
              return 1;
return 0;
}

/* code golf ends here */

int main(void) {
  const char *t;
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "b@[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "test@%.org"; printf("%s ==> %d\n", t, v(t));
  t = "[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "@w.org"; printf("%s ==> %d\n", t, v(t));
  t = "test@org"; printf("%s ==> %d\n", t, v(t));
  t = "s%[email protected]"; printf("%s ==> %d\n", t, v(t));
  t = "foo@a%.com"; printf("%s ==> %d\n", t, v(t));

  return 0;
}

Version 2

Still C89 character set agnostic, bugs hopefully corrected (303 chars; 284 without the #include)

#include<string.h>
#define Y strchr
#define X{while(Y
v(char*e){char*s="0123456789_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
if(*e!='@')X(s,*e))e++;if(*e++=='@'&&!Y(e,'@')&&Y(e+1,'.'))X(s+12,*e))e++;if(*e++=='.'
&&!Y(e,'.')&&strlen(e)>1){while(*e&&Y(s+12,*e++));if(!*e)return 1;}}}return 0;}

That #define X is absolutely disgusting!

Test as for my first (buggy) version.

pmg
strager
Seems we came up with the same idea of using suffixes of a single string as arguments to str...() functions... And actually I noticed a bug in my code after seeing yours!
j_random_hacker
+12  A: 

J

:[[/%^(:[[+-/^,&i|:[$[' ']^j+0__:k<3:]]
P Daddy
REALLY ;) still, +1 for a good comeback.
Alex
That's about the fifth J program I've seen that started with `:[[` and ended with `:]]` - what gives?
Chris Lutz
It's extra sad at the beginning, but by the end it gets really happy.
P Daddy
+6  A: 

C89, 175 characters.

#define G &&*((a+=t+1)-1)==
#define H (t=strspn(a,A
t;e(char*a){char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;return H))G 64&&H+12))G 46&&H+12))>1 G 0;}

I am using the standard library function strspn(), so I feel this answer isn't as "clean" as strager's answer which does without any library functions. (I also stole his idea of declaring a global variable without a type!)

One of the tricks here is that by putting . and _ at the start of the string A, it's possible to include or exclude them easily in a strspn() test: when you want to allow them, use strspn(something, A); when you don't, use strspn(something, A+12). Another is assuming that sizeof (short) == 2 * sizeof (char), and building up the array of valid characters 2 at a time from the "seed" pair Aa. The rest was just looking for a way to force subexpressions to look similar enough that they could be pulled out into #defined macros.
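
Both tricks can be illustrated outside C. The following Python sketch is only an approximation of the idea: `strspn` here is a small re-implementation named after the C function, `build_letters` is a hypothetical helper, and the 16-bit pair is simulated with integer arithmetic rather than a real `short`.

```python
import string

# '_', '.' and the ten digits come first (12 characters), letters follow,
# so one string serves both checks, like A and A+12 in the C code.
A = "_." + string.digits + string.ascii_uppercase + string.ascii_lowercase
LETTERS_ONLY = A[12:]


def strspn(text, accept):
    """Length of the leading run of text made only of characters in accept."""
    n = 0
    while n < len(text) and text[n] in accept:
        n += 1
    return n


def build_letters():
    """Grow the letter set two characters at a time from the seed pair 'Aa'."""
    pair = ord("A") | (ord("a") << 8)   # two bytes packed into one 16-bit value
    out = []
    for _ in range(26):
        out.append(chr(pair & 0xFF) + chr(pair >> 8))
        pair += 257                     # 257 == 0x0101: adds 1 to each byte
    return "".join(out)
```

Adding 257 bumps both bytes at once, so the pair steps from `Aa` to `Bb` and so on up to `Zz`. The resulting letters are interleaved rather than sorted, which does not matter for a membership test.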

To make this code more "portable" (heh :-P) you can change the array-building code from

char A[66]="_.0123456789Aa";short*s=A+12;for(;++s<A+64;)*s=s[-1]+257;

to

char*A="_.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

for a cost of 5 additional characters.

j_random_hacker
I think the `#include<string.h>` should be included. Otherwise, it's not portable. (Your `short` thing isn't portable either but at least you provide a cheap alternative.)
strager
+1 for trying a different angle :)
Alex
`size_t strspn();` is less characters than the `#include` and will do the job (and also doesn't require a newline).
caf
@caf - On many platforms (and by "many" I mean "mine"), `size_t` is only defined in `<stddef.h>` but if you said to hell with portability you could _maybe_ get away with letting it be implicitly declared as returning `int` since it's the same size on many (once again, "my") platforms.
Chris Lutz
@strager: Point taken, but I think that since most of us are assuming ASCII anyway, portability is already out the window. Surely if it compiles (and it does on at least MSVC++9 and Linux gcc 4.1.2), it's OK?
j_random_hacker
Chris, a very good point.
caf
+1  A: 

Java: 257 chars (not including the 3 end of lines for readability ;-)).

boolean q(char[]s){int a=0,b=0,c=0,d=0,e=0,f=0,g,y=-99;for(int i:s)
d=(g="@._0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm".indexOf(i))<0?
y:g<1&&++e>0&(b<1|++a>1)?y:g==1&e>0&(c<1||f++>0)?y:++b>0&g>12?f>0?d+1:f<1&e>0&&++c>0?
d:d:d;return d>1;}

Passes all the tests (my older version was incorrect).

JRL
+1  A: 

VBA/VB6 - 484 chars

Explicit off
usage: VE("[email protected]")

Function V(S, C)
V = True
For I = 1 To Len(S)
 If InStr(C, Mid(S, I, 1)) = 0 Then
  V = False: Exit For
 End If
Next
End Function

Function VE(E)
VE = False
C1 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
C2 = "0123456789._"
P = Split(E, "@")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1 & C2) Then GoTo X
E = P(1): P = Split(E, ".")
If UBound(P) <> 1 Then GoTo X
If Len(P(0)) < 1 Or Not V(P(0), C1) Or Len(P(1)) < 2 Or Not V(P(1), C1) Then GoTo X
VE = True
X:
End Function
DJ
+5  A: 

C (166 characters)

#define F(t,u)for(r=s;t=(*s-64?*s-46?isalpha(*s)?3:isdigit(*s)|*s==95?4:0:2:1);++s);if(s-r-1 u)return 0;
V(char*s){char*r;F(2<,<0)F(1=)F(3=,<0)F(2=)F(3=,<1)return 1;}

The single newline is required, and I've counted it as one character.

P Daddy
Nice! Calling a macro with fewer arguments than declared is interesting -- I find it compiles (with warnings) on MSVC++ but not on gcc 4.1.2. Any idea what is "officially" allowed in the language spec?
j_random_hacker
@j_random_hacker: I'm not sure what the spec says, but gcc doesn't like this code at all. Putting commas in those problematic macro calls (`F(1=,)` and `F(2=,)`) fixes the "macro 'F' requires 2 arguments, but only 1 given" error, but my version (3.4.6) still blows up with "syntax error before '=' token" and "syntax error before ')' token".
P Daddy
+1  A: 

Erlang 266 chars:

-module(cg_email).

-export([test/0]).

%%% golf code begin %%%
-define(E,when X>=$a,X=<$z;X>=$A,X=<$Z).
-define(I(Y,Z),Y([X|L])?E->Z(L);Y(_)->false).
-define(L(Y,Z),Y([X|L])?E;X>=$0,X=<$9;X=:=$.;X=:=$_->Z(L);Y(_)->false).
?L(e,m).
m([$@|L])->a(L);?L(m,m).
?I(a,i).
i([$.|L])->l(L);?I(i,i).
?I(l,c).
?I(c,g).
g([])->true;?I(g,g).
%%% golf code end %%%

test() ->
  true  = e("[email protected]"),
  false = e("b@[email protected]"),
  false = e("test@%.org"),
  false = e("[email protected]"),
  true  = e("[email protected]"),
  false = e("test@org"),
  false = e("s%[email protected]"),
  true  = e("[email protected]"),
  false = e("foo@a%.com"),
  ok.
Hynek -Pichi- Vychodil
+1  A: 

Ruby, 225 chars. This is my first Ruby program, so it's probably not very Ruby-like :-)

def v z;r=!a=b=c=d=e=f=0;z.chars{|x|case x when'@';r||=b<1||!e;e=!1 when'.'
e ?b+=1:(a+=1;f=e);r||=a>1||(c<1&&!e)when'0'..'9';b+=1;r|=!e when'A'..'Z','a'..'z'
e ?b+=1:f ?c+=1:d+=1;else r=1 if x!='_'||!e|!b+=1;end};!r&&d>1 end
JRL
+4  A: 

Python, 149 chars (after putting the whole for loop into one semicolon-separated line, which I haven't done here for "readability" purposes):

def v(s,t=0,o=1):
 for c in s:
   k=c=="@"
   p=c=="."
   A=c.isalnum()|p|(c=="_")
   L=c.isalpha()
   o&=[A,k|A,L,L|p,L,L,L][t]
   t+=[1,k,1,p,1,1,0][t]
 return(t>5)&o

Test cases, borrowed from strager's answer:

assert v("[email protected]")
assert v("[email protected]")
assert v("[email protected]")
assert not v("b@[email protected]")
assert not v("test@%.org")
assert not v("[email protected]")
assert not v("@w.org")
assert not v("test@org")
assert not v("s%[email protected]")
assert not v("foo@a%.com")
print "Yeah!"

Explanation: When iterating over the string, two variables keep getting updated.

t keeps the current state:

  • t = 0: We're at the beginning.
  • t = 1: We were at the beginning and have found at least one legal character (letter, number, underscore, period)
  • t = 2: We have found the "@"
  • t = 3: We have found at least one legal character (i.e. letter) after the "@"
  • t = 4: We have found the period in the domain name
  • t = 5: We have found one legal character (letter) after the period
  • t = 6: We have found at least two legal characters after the period

o as in "okay" starts as 1, i.e. true, and is set to 0 as soon as a character is found that is illegal in the current state. Legal characters are:

  • In state 0: letter, number, underscore, period (change state to 1 in any case)
  • In state 1: letter, number, underscore, period, at-sign (change state to 2 if "@" is found)
  • In state 2: letter (change state to 3)
  • In state 3: letter, period (change state to 4 if period found)
  • In states 4 thru 6: letter (increment state when in 4 or 5)

When we have gone all the way through the string, we return whether t==6 (t>5 is one char less) and o is 1.
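
The state table above can also be spelled out in a longer, readable form. This is a sketch of the same idea, not the golfed answer itself. The names `validate`, `is_local` and `is_letter` are illustrative, it returns early instead of accumulating `o`, and it restricts letters and digits to ASCII, which is an added assumption.

```python
def is_local(c):
    # Letter, number, underscore or period (ASCII only, by assumption).
    return c.isascii() and (c.isalnum() or c in "._")


def is_letter(c):
    return c.isascii() and c.isalpha()


def validate(address):
    state = 0
    for c in address:
        at, dot = c == "@", c == "."
        local, letter = is_local(c), is_letter(c)
        # One row per state: (is c legal here?, does c advance the state?)
        legal, advance = [
            (local, True),         # 0: first character before the "@"
            (local or at, at),     # 1: more local characters, or the "@"
            (letter, True),        # 2: first letter after the "@"
            (letter or dot, dot),  # 3: more letters, or the period
            (letter, True),        # 4: first letter after the period
            (letter, True),        # 5: second letter after the period
            (letter, False),       # 6: any further letters
        ][state]
        if not legal:
            return False
        if advance:
            state += 1
    return state == 6
```

With a made-up input such as `test@example.com` the loop ends in state 6 and the result is true, while `test@org` stops in state 3 and is rejected.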

balpha
Quite a bit shorter than the other Python solution here! +1.
j_random_hacker
+1  A: 

'Using no regex': PHP 47 Chars.

<?=filter_var($argv[1],FILTER_VALIDATE_EMAIL);
CodeJoust
+1  A: 

Haskell (GHC 6.8.2), 144 characters (previously 165, then 161)


Using pattern matching, elem, span and all:

a=['A'..'Z']++['a'..'z']
e=f.span(`elem`"._0123456789"++a)
f(_:_,'@':d)=g$span(`elem`a)d
f _=False
g(_:_,'.':t@(_:_:_))=all(`elem`a)t
g _=False

The above was tested with the following code:

main :: IO ()
main = print $ and [
  e "[email protected]",
  e "[email protected]",
  e "[email protected]",
  not $ e "b@[email protected]",
  not $ e "test@%.org",
  not $ e "[email protected]",
  not $ e "@w.org",
  not $ e "test@org",
  not $ e "s%[email protected]",
  not $ e "foo@a%.com"
  ]
Stephan202