Bounced email parsing

+3 Q:

I'm currently messing about with catching, parsing and sorting bounced emails. I have the basics set up nicely and it does what I want, which is nice... the problem is that there seems to be no standard for the messages returned in bounced emails.

For example, some servers return the error code as specified by RFC 1893, and nine times out of ten I can pick that up with a simple regex. But sometimes servers just respond saying that the email has bounced, with either no reason given or a reason worded entirely differently from any standard.
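For reference, a minimal sketch of the "simple regex" approach, assuming the status code appears somewhere in the bounce body in the class.subject.detail form that RFC 1893 defines (class 2, 4 or 5, then two numeric fields). The function names are hypothetical, and the lookarounds are there to avoid matching pieces of IP addresses such as 10.4.2.5.

```python
import re
from typing import Optional

# class.subject.detail, e.g. 5.1.1; class is 2 (success), 4 (transient) or 5 (permanent).
# The lookbehind/lookahead stop us matching fragments of longer dotted numbers (IPs etc.).
STATUS_RE = re.compile(r"(?<![\d.])([245])\.(\d{1,3})\.(\d{1,3})(?!\.?\d)")


def find_status_code(body: str) -> Optional[str]:
    """Return the first enhanced status code found in a bounce body, or None."""
    match = STATUS_RE.search(body)
    if match is None:
        return None
    return ".".join(match.groups())


def is_permanent(code: str) -> bool:
    """Class 5 codes indicate permanent failures (hard bounces)."""
    return code.startswith("5.")


if __name__ == "__main__":
    sample = "550 5.1.1 <bob@example.com>: Recipient address rejected: User unknown"
    code = find_status_code(sample)
    print(code, is_permanent(code))  # 5.1.1 True
```

When find_status_code returns None you are in exactly the "one in ten" case described above and need some other way to classify the message.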

So I guess my question is: has anyone got a solution to this? To be honest, I don't want to be searching for a billion and one possible strings in the returned email. Yet it would be nice not to have to resort to 'reason unknown' or something similar.
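The brute-force fallback being described here (searching the body for known phrases, and giving up with 'reason unknown') might look something like the sketch below. The phrase list and category names are made up for illustration; the point is that the list is ordered, the first match wins, and anything unmatched lands in a catch-all bucket.

```python
# Hypothetical phrase table: lower-case substring -> bounce category.
# Real bounce wording varies wildly, so a table like this only ever grows.
FALLBACK_PHRASES: list[tuple[str, str]] = [
    ("user unknown", "bad_address"),
    ("no such user", "bad_address"),
    ("mailbox full", "mailbox_full"),
    ("quota exceeded", "mailbox_full"),
    ("message too large", "too_large"),
    ("blocked as spam", "rejected_as_spam"),
]


def classify_by_phrase(body: str) -> str:
    """Return the category of the first phrase found in the body, else 'reason_unknown'."""
    text = body.lower()
    for phrase, category in FALLBACK_PHRASES:
        if phrase in text:
            return category
    return "reason_unknown"
```

This works, but it is exactly the "billion and one strings" problem: every new server wording means another entry.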

Has anyone else had any luck with this, or got any ideas? Cheers

+1 A:

You could set up a system that lets an operator review messages, select strings, and then categorize from there. Eventually, you could hope to get that 1 in 10 down to 1 in 100 or 1 in 1,000. However, there will always be more corner cases.
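A minimal sketch of that operator-review loop, assuming the rules are simple case-insensitive substrings. BounceRulebook, add_rule and review_queue are hypothetical names: anything the current rules can't classify goes into a queue, an operator picks a snippet from it and assigns a category, and that snippet then handles future messages with the same wording.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BounceRulebook:
    """Hypothetical store of operator-chosen snippets mapped to bounce categories."""
    rules: dict[str, str] = field(default_factory=dict)
    review_queue: list[str] = field(default_factory=list)

    def add_rule(self, snippet: str, category: str) -> None:
        # The operator highlights a snippet in an unrecognised bounce and picks a category.
        self.rules[snippet.lower()] = category

    def classify(self, body: str) -> Optional[str]:
        text = body.lower()
        for snippet, category in self.rules.items():
            if snippet in text:
                return category
        # Nothing matched: park the message for a human to look at.
        self.review_queue.append(body)
        return None


if __name__ == "__main__":
    book = BounceRulebook()
    msg = "The recipient's inbox is over its storage limit"
    print(book.classify(msg))        # None, and msg is now in the review queue
    book.add_rule("over its storage limit", "mailbox_full")
    print(book.classify(msg))        # mailbox_full
```

The length of review_queue over time is a rough measure of whether you are actually getting from 1 in 10 towards 1 in 100.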

Kyle Hodgson 2009-11-24 03:30:16

Yeah, I'm considering this option the best, though whether the end user will be able to determine what type of mail it is is another thing.

rich 2009-11-24 09:11:24

+1 A:

Also not a definitive answer, but in a similar spirit to Kyle's response, you could use a Bayesian/token-based spam filter to "learn" about bounce messages and then automatically route them to whatever you want to handle the bounced mail.

In other words, you have an account where you train SpamAssassin or SpamProbe (or whatever) to treat a bunch of different bounce messages (and only bounce messages) as "junk", then let that spam system be a second line of filtering after whatever you've developed.

So, let's say your solution, the first filter, finds 90% of bounced messages. Your system does whatever it normally does with bounces, then saves them to a bounce-messages mailbox, which SpamAssassin/SpamProbe periodically scans to learn those messages as "junk".

You then also have SpamAssassin or SpamProbe (or whatever) act as a second filter, run on anything yours doesn't flag as a bounce, making its own estimate of bounced-ness. Whatever it considers "junk" (because you've trained it to think bounce = junk), you also route to your program, etc.
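To show the shape of this two-stage setup without depending on SpamAssassin or SpamProbe, here is a sketch that swaps in a tiny hand-rolled multinomial naive Bayes classifier (with Laplace smoothing) as a stand-in for the trained spam filter. TinyBayes, route and the "bounce"/"other" labels are hypothetical. It also assumes you train on some ordinary mail as "other", and it retrains inline on each rule-confirmed bounce instead of periodically scanning a bounce mailbox, which is a simplification.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Stand-in for a trained spam filter where bounce = "junk"."""

    def __init__(self) -> None:
        self.counts = {"bounce": Counter(), "other": Counter()}
        self.docs = {"bounce": 0, "other": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def log_score(self, text: str, label: str) -> float:
        vocab = set(self.counts["bounce"]) | set(self.counts["other"])
        total = sum(self.counts[label].values())
        # Smoothed prior plus smoothed per-token likelihoods (+1 slot for unseen tokens).
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in tokenize(text):
            score += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
        return score

    def is_bounce(self, text: str) -> bool:
        return self.log_score(text, "bounce") > self.log_score(text, "other")


def route(body: str, first_filter: Callable[[str], bool], bayes: TinyBayes) -> str:
    """Stage 1: your own rules. Stage 2: the learned filter for anything stage 1 missed."""
    if first_filter(body):
        bayes.train(body, "bounce")  # confirmed bounces become training data
        return "bounce (rules)"
    if bayes.is_bounce(body):
        return "bounce (bayes)"
    return "normal"
```

In a real deployment the second stage would be the spam filter itself, and anything it flags would still deserve the occasional manual check.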

Still requires a little bit of manual review, but in theory it should get better and better over time as you rely on the spam system's learning to account for the edge cases.

Chirael 2009-11-24 04:17:18
