I'm currently developing an app that can parse dates from an email, i.e. extract the times and dates mentioned in it (similar to Gmail).

Currently I do this in PHP, but it's a tad clunky.

What's the best language to do this in, and are there any existing open-source solutions?

A: 

I think PHP is as capable as any other language. Can we see the code you're using so we can suggest improvements? I'd use a regular expression... you just need a good one that supports a variety of formats.
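To make the regular-expression idea concrete, here is a minimal sketch in Java (the same pattern syntax carries over to PHP's PCRE almost unchanged). The class name `DateRegexSketch`, the method `findCandidates`, and the three formats covered are illustrative assumptions, not a complete solution; a real pattern would need many more alternatives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateRegexSketch {
    // Alternation of a few common shapes; deliberately incomplete.
    private static final Pattern DATE_OR_TIME = Pattern.compile(
        "\\b\\d{4}-\\d{2}-\\d{2}\\b"                  // ISO-style date, e.g. 2008-06-03
        + "|\\b\\d{1,2}[/.]\\d{1,2}[/.]\\d{2,4}\\b"   // e.g. 3/6/2008 or 03.06.08
        + "|\\b\\d{1,2}:\\d{2}(?::\\d{2})?\\b");      // e.g. 14:30 or 14:30:05

    /** Returns every substring that looks like a date or a time, in order of appearance. */
    static List<String> findCandidates(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DATE_OR_TIME.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

The matches are only candidates: the pattern does not check that a month is between 1 and 12, and it cannot tell whether 3/6/2008 means 3 June or 6 March.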

Mark
A: 

What I do in my email client is extract all the tokens delimited by whitespace and then iterate over them, using heuristics to decide how to classify each token. For instance, if the token has a ':' character in it, I treat it as a time, to be parsed as ##:##:##. If it has a '.' or '-', I treat it as a day/month/year combo, and you have to decide which end is which; it could be any number of combinations. If the token starts with a letter (i.e. isalpha(*string)), you do a month name lookup. If it's a number, it could be the day or the year; decide based on its length and on whether you already have a day or a year. If the token starts with '-' or '+', it's a timezone, so parse it accordingly.

It seems to work quite well in the field; my email client has been around for 10 years or so. My code is C++, but you could write the same thing in PHP easily, as it's not particularly language-specific.
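The token heuristics described above could be sketched roughly as follows. This is not the C++ code from the email client; it is an illustrative Java version written under several assumptions: the names `TokenDateSketch`, `Parts`, `parse` and `toInt` are made up, a dotted or dashed date is read as day-month-year unless the first field has four digits, and the sign check runs first so that an offset such as -0500 is not mistaken for a dashed date.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TokenDateSketch {
    /** Hypothetical result holder; -1 (or null) means "not seen yet". */
    static final class Parts {
        int year = -1, month = -1, day = -1;
        int hour = -1, minute = -1, second = -1;
        String zone = null;
    }

    private static final List<String> MONTHS = Arrays.asList(
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec");

    /** Returns the value of a short all-digit string, or -1 otherwise. */
    private static int toInt(String s) {
        if (s.isEmpty() || s.length() > 4) return -1;
        for (char c : s.toCharArray()) {
            if (c < '0' || c > '9') return -1;
        }
        return Integer.parseInt(s);
    }

    static Parts parse(String text) {
        Parts p = new Parts();
        for (String raw : text.trim().split("\\s+")) {
            // Drop surrounding punctuation such as the comma in "Tue,".
            String tok = raw.replaceAll("^[,;(]+|[,;).]+$", "");
            if (tok.isEmpty()) continue;
            char first = tok.charAt(0);

            if ((first == '+' || first == '-') && toInt(tok.substring(1)) >= 0) {
                // Leading sign followed by digits: keep it as the timezone offset.
                p.zone = tok;
            } else if (tok.indexOf(':') >= 0) {
                // Contains ':': treat as hh:mm or hh:mm:ss.
                String[] f = tok.split(":");
                if (f.length >= 2 && toInt(f[0]) >= 0 && toInt(f[1]) >= 0) {
                    p.hour = toInt(f[0]);
                    p.minute = toInt(f[1]);
                    if (f.length >= 3) p.second = toInt(f[2]);
                }
            } else if (tok.indexOf('.') >= 0 || tok.indexOf('-') >= 0) {
                // Contains '.' or '-': treat as a three-field numeric date.
                String[] f = tok.split("[.-]");
                if (f.length == 3 && toInt(f[0]) >= 0 && toInt(f[1]) >= 0 && toInt(f[2]) >= 0) {
                    boolean yearFirst = f[0].length() == 4;
                    p.year = toInt(yearFirst ? f[0] : f[2]);
                    p.month = toInt(f[1]);
                    p.day = toInt(yearFirst ? f[2] : f[0]);
                }
            } else if (Character.isLetter(first)) {
                // Starts with a letter: month lookup on the first three letters.
                if (tok.length() >= 3) {
                    int idx = MONTHS.indexOf(tok.substring(0, 3).toLowerCase(Locale.ROOT));
                    if (idx >= 0) p.month = idx + 1;
                }
            } else if (toInt(tok) >= 0) {
                // Bare number: long ones are years, short ones fill day first, then year.
                if (tok.length() > 2) p.year = toInt(tok);
                else if (p.day < 0) p.day = toInt(tok);
                else if (p.year < 0) p.year = toInt(tok);
            }
        }
        return p;
    }
}
```

Calling `TokenDateSketch.parse("Tue, 3 Jun 2008 11:05:30 +0200")` fills in every field of `Parts`. The three-letter month lookup is crude (a word like "Marketing" would be read as March), so free-form email bodies need a stricter lookup than header-like strings do.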

fret
A: 

If you mean the date the message was sent (or received), you can read it from the mail headers (for example the 'Date:' header), which use a standard date format; see RFC 2822.
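If you already have the raw header value as a string, a minimal sketch using only the Java standard library might look like this. The class and method names are illustrative, and using `DateTimeFormatter.RFC_1123_DATE_TIME` here is an assumption that the header is in the common form "Tue, 3 Jun 2008 11:05:30 +0200"; that formatter does not accept a trailing comment such as "(CEST)" or the obsolete named zones, so those would have to be stripped or handled separately.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

class HeaderDateSketch {
    /**
     * Parses the value of a 'Date:' header (without the "Date:" prefix).
     * Throws java.time.format.DateTimeParseException if the value does not
     * match the format the formatter understands.
     */
    static OffsetDateTime parseDateHeader(String headerValue) {
        return OffsetDateTime.parse(headerValue.trim(),
                DateTimeFormatter.RFC_1123_DATE_TIME);
    }
}
```

The result keeps the sender's UTC offset, so calling `toInstant()` on it gives a point in time that can be compared across messages.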

Anyway, if you use JavaMail (it's open source now), you can get the sent date with:

Date sentDate = mail.getSentDate();
raticulin