I'm writing a regular expression that can interactively validate SMTP response codes. Once the SMTP dialog is complete, the concatenated codes should match the following regex (some parentheses added for readability):

^(220)(250){3,}(354)(250)(221)$

Or, with optional authentication:

^(220)(250)((334){2}(235))?(250){2,}(354)(250)(221)$
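As a quick sanity check, the two patterns above can be run against complete code strings. This is a minimal sketch: the sample strings and the `~` delimiters are assumptions chosen for illustration, not output captured from a real server.

```php
<?php
// The two full-dialog patterns from above, wrapped in ~ delimiters.
$plain = '~^(220)(250){3,}(354)(250)(221)$~';
$auth  = '~^(220)(250)((334){2}(235))?(250){2,}(354)(250)(221)$~';

// Hypothetical sample dialogs, written as concatenated reply codes.
$withoutAuth = '220250250250354250221';
$withAuth    = '220250334334235250250354250221';
$partial     = '220250'; // dialog still in progress

$plainOnPlain   = preg_match($plain, $withoutAuth); // 1
$authOnPlain    = preg_match($auth, $withoutAuth);  // 1, the AUTH group is optional
$authOnAuth     = preg_match($auth, $withAuth);     // 1
$plainOnPartial = preg_match($plain, $partial);     // 0
$authOnPartial  = preg_match($auth, $partial);      // 0

var_dump($plainOnPlain, $authOnPlain, $authOnAuth, $plainOnPartial, $authOnPartial);
```

Both patterns reject the partial string, which is why they cannot be used as-is while the dialog is still in progress.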

I'm trying to rewrite the above regexes so that I can check interactively whether the dialog is going as expected and, if it isn't, politely send a QUIT command and close the connection, saving bandwidth and time. However, I'm having a hard time writing an optimal regex. So far I've managed to come up with:

^(220(250(334(235(250(354(250(221)?)?)?){0,})?){0,2})?)?$

Which, besides only matching authenticated connections, has some bugs... For instance, it matches:

220250334235250354250221
220250334334235250354250221

I've also tried the following modification:

^(220(250)?)?((334(235)?){2})?(250(354(250(221)?)?)?){0,}$

This one accepts non-authenticated responses, but it fails to match 220250334 and wrongly matches 220250334334235250354250221 (at least two 250 responses are needed before the 354 response code).

Can someone help me out with this? Thanks in advance.


An example of what I'm trying to do:

$smtp = fsockopen('mail.example.com', 25);
$result = null;
$commands = array('HELO', 'AUTH LOGIN', 'user', 'pass', 'MAIL FROM', 'RCPT TO', 'RCPT TO', 'DATA', "\r\n.", 'QUIT');

foreach ($commands as $command)
{
    $result .= substr(fgets($smtp), 0, 3);

    if (preg_match('~^(220(250)?)?((334){1,2}(235)?)?(250(354(250(221)?)?)?){0,}$~S', $result) > 0)
    {
        fwrite($smtp, $command . "\r\n");
    }

    else
    {
        fwrite($smtp, "QUIT\r\n");
        fclose($smtp);
        break;
    }
}

Which should act as a replacement for the following procedural code:

$smtp = fsockopen('mail.example.com', 25);
$result = substr(fgets($smtp), 0, 3); // 220

if ($result == '220')
{
    fwrite($smtp, 'HELO' . "\r\n");
        $result = substr(fgets($smtp), 0, 3); // 250

    if ($result == '250')
    {
        fwrite($smtp, 'AUTH LOGIN' . "\r\n");
        $result = substr(fgets($smtp), 0, 3); // 334

        if ($result == '334')
        {
            fwrite($smtp, 'user' . "\r\n");
            $result = substr(fgets($smtp), 0, 3); // 334

            if ($result == '334')
            {
                fwrite($smtp, 'pass' . "\r\n");
                $result = substr(fgets($smtp), 0, 3); // 235

                if ($result == '235')
                {
                    fwrite($smtp, 'MAIL FROM' . "\r\n");
                    $result = substr(fgets($smtp), 0, 3); // 250

                    if ($result == '250')
                    {
                        foreach ($to as $mail)
                        {
                            fwrite($smtp, 'RCPT TO' . "\r\n");
                            $result = substr(fgets($smtp), 0, 3); // 250

                            if ($result != '250')
                            {
                                fwrite($smtp, 'QUIT' . "\r\n");
                                $result = substr(fgets($smtp), 0, 3); // 221
                                fclose($smtp);

                                break;
                            }
                        }

                        if ($result == '250')
                        {
                            fwrite($smtp, 'DATA' . "\r\n");
                            $result = substr(fgets($smtp), 0, 3); // 354

                            if ($result == '354')
                            {
                                fwrite($smtp, "\r\n.\r\n");
                                $result = substr(fgets($smtp), 0, 3); // 250

                                if ($result == '250')
                                {
                                    fwrite($smtp, 'QUIT' . "\r\n");
                                    $result = substr(fgets($smtp), 0, 3); // 221
                                    fclose($smtp);

                                    if ($result == '221')
                                    {
                                        echo 'SUCCESS!';
                                    }
                                }

                                else
                                {
                                    fwrite($smtp, 'QUIT' . "\r\n");
                                    $result = substr(fgets($smtp), 0, 3); // 221
                                    fclose($smtp);
                                }
                            }

                            else
                            {
                                fwrite($smtp, 'QUIT' . "\r\n");
                                $result = substr(fgets($smtp), 0, 3); // 221
                                fclose($smtp);
                            }
                        }
                    }

                    else
                    {
                        fwrite($smtp, 'QUIT' . "\r\n");
                        $result = substr(fgets($smtp), 0, 3); // 221
                        fclose($smtp);
                    }
                }

                else
                {
                    fwrite($smtp, 'QUIT' . "\r\n");
                    $result = substr(fgets($smtp), 0, 3); // 221
                    fclose($smtp);
                }
            }

            else
            {
                fwrite($smtp, 'QUIT' . "\r\n");
                $result = substr(fgets($smtp), 0, 3); // 221
                fclose($smtp);
            }
        }

        else
        {
            fwrite($smtp, 'QUIT' . "\r\n");
            $result = substr(fgets($smtp), 0, 3); // 221
            fclose($smtp);
        }
    }

    else
    {
        fwrite($smtp, 'QUIT' . "\r\n");
        $result = substr(fgets($smtp), 0, 3); // 221
        fclose($smtp);
    }
}

else
{
    fwrite($smtp, 'QUIT' . "\r\n");
    $result = substr(fgets($smtp), 0, 3); // 221
    fclose($smtp);
}
+5  A: 

I presume you're building a string with all the response codes you receive, stripping out the rest of the message?

This is probably not the answer you want, but I can't help but get the feeling that regex is just not the right tool here. Regular expressions are good at parsing text into tokens or extracting interesting sub-strings out of a larger string. But you already have tokens (SMTP response codes) and you're trying to ensure that they arrive in the expected order. I'd just add the response codes to a queue and, after every addition, check whether the start of the queue matches one of the expected patterns for the state that you're in. If it does, remove that part from the queue and go to the next state. There are only a few states, so I'd just write code specific to those, rather than try to abstract it into some kind of a state machine.

If you do go the regex way, you might want to keep spaces in the string as separators. That would make it not only easier to match codes, but also easier to read the program.
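For illustration, here is how the first full-dialog pattern from the question might look with spaces kept between the codes. This is only a sketch; the `$codes` array is a made-up sample.

```php
<?php
// Made-up sample: reply codes collected during one unauthenticated dialog.
$codes = ['220', '250', '250', '250', '354', '250', '221'];

// Join with spaces so each code stays visually separate.
$dialog = implode(' ', $codes); // "220 250 250 250 354 250 221"

// The question's first pattern, rewritten so each later code is preceded by a space.
$pattern = '~^220(?: 250){3,} 354 250 221$~';

$isComplete = preg_match($pattern, $dialog) === 1;
var_dump($isComplete); // bool(true)
```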

Edit: Thanks for posting the code. It's pretty much what I assumed. You're basically trying to create an abstract solution to this problem, so that you can send a given array of commands and expect back a given pattern of responses. You really don't need to make it abstract: the added complexity is huge and unlikely to pay off in re-use. Just write the code that says: send X, if you receive Y continue, otherwise QUIT. It will be so much easier and more readable.
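One way to read "send X, if you receive Y continue, otherwise QUIT" is a flat list of steps, each pairing a command with the reply code it should produce. The sketch below is an assumption about how that might look, not code from this answer: `runDialog`, the `$send`/`$receive` callables and the addresses are hypothetical, and canned replies stand in for a real socket so the block runs on its own. It also assumes single-line replies, as the question's code does.

```php
<?php
/**
 * Send each command and compare the reply code with the expected one.
 * Returns -1 when every step succeeded, otherwise the index of the failed step.
 */
function runDialog(array $steps, callable $send, callable $receive): int
{
    foreach ($steps as $i => [$command, $expected]) {
        if ($command !== null) {
            $send($command);
        }
        if ($receive() !== $expected) {
            if ($command !== 'QUIT') {
                $send('QUIT'); // bail out politely
            }
            return $i;
        }
    }
    return -1;
}

// Hypothetical steps: [command to send, reply code expected afterwards].
$steps = [
    [null, '220'],                         // server greeting, nothing sent yet
    ['HELO client.example', '250'],
    ['MAIL FROM:<me@example.com>', '250'],
    ['RCPT TO:<you@example.com>', '250'],
    ['DATA', '354'],
    ["Subject: test\r\n\r\nHello\r\n.", '250'],
    ['QUIT', '221'],
];

// Canned replies instead of a socket; the second reply is an error on purpose.
$replies = ['220', '550'];
$sent = [];

$failedAt = runDialog(
    $steps,
    function (string $command) use (&$sent) { $sent[] = $command; },
    function () use (&$replies) { return array_shift($replies); }
);

var_dump($failedAt); // int(1)
var_dump($sent);     // "HELO client.example", then "QUIT"
```

With a real connection, `$send` would wrap `fwrite($smtp, $command . "\r\n")` and `$receive` would wrap `substr(fgets($smtp), 0, 3)`, as in the question's code.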

Evgeny
http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html
Amber
@Evgeny: I've updated my question with an example. Adding the response codes to a queue and checking if they match is exactly what I want to do; adding more state-specific regular expressions would make the code longer and more complex, IMHO. Regexes are great when it comes to matching structured data, and an SMTP dialog follows a very clear structure, as you can see from the first two regexes. This approach seems more elegant to me than a bunch of `if`s and `for`s.
Alix Axel
@Evgeny: I've added a procedural implementation of the approach you suggested to my question, however, I don't think it's easier or more readable than the regex approach.
Alix Axel
Well of course it's not very readable when you copy/paste the same thing all over. You still need to think about how to structure it.
Evgeny
A: 

It's amazing how much easier regular expressions become after a good night's sleep. Here it is:

(?>220(?>250(?>(?>334){1,2}(?>235)?)?(?>(?>250){1,}(?>354(?>250(?>221)?)?)?)?)?)?

Which can be simplified to this:

^220(?>250(?>(?>334){1,2}(?>235)?)?(?>(?>250){1,}(?>354(?>250)?)?)?)?$

The simplification works because the first response code (220) is not optional and we will always send the final QUIT command, so the 221 reply no longer needs to be part of the pattern.
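To see how the simplified pattern behaves while a dialog is in progress, it can be tested against every prefix of a sample code sequence. This is a minimal sketch: `dialogOk` is a hypothetical helper, the `~` delimiters are added for `preg_match`, and the code sequences are made up rather than captured from a server.

```php
<?php
// Pattern copied from the answer above; only the ~ delimiters are added.
$pattern = '~^220(?>250(?>(?>334){1,2}(?>235)?)?(?>(?>250){1,}(?>354(?>250)?)?)?)?$~';

// Hypothetical helper: true while the codes seen so far still fit the pattern.
function dialogOk(string $pattern, string $seen): bool
{
    return preg_match($pattern, $seen) === 1;
}

// Made-up authenticated dialog, up to the reply that follows the message body.
$codes = ['220', '250', '334', '334', '235', '250', '250', '354', '250'];

$seen = '';
$allAccepted = true;
foreach ($codes as $code) {
    $seen .= $code; // append the newest reply code
    $allAccepted = $allAccepted && dialogOk($pattern, $seen);
}

var_dump($allAccepted);                       // bool(true)
var_dump(dialogOk($pattern, '220250354'));    // bool(false): 354 arrives too early
var_dump(dialogOk($pattern, '220550'));       // bool(false): unexpected error code
var_dump(dialogOk($pattern, '220250250354')); // bool(true): only one 250 after HELO
```

The last line suggests that the pattern, as written, is somewhat looser than the full-dialog regexes in the question, which require at least two 250 replies between the HELO reply and 354.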

Alix Axel