tags:

views:

3064

answers:

5

I'm building a website and one of the requirements is for users to export their contacts from their email client to import them into the site.

Because each email client exports their contacts in a slightly different format this has got my head scratching has to the best way to approach it. As I don't know what the fields are, or what the delimiter is.

I'm only looking to target the main email client/address books (outlook, apple mail, entourage, thunderbird). All of these have an entirely different format. Entourage uses tab as a delimiter where as the rest use a comma etc. I only need to pluck out the email address and (if available) a name. The name gets trickier as some clients have separate fields for first name / last name.

Using FileHelpers would be ideal, but it seems I need to know the structure of the csv before I can hook up a solution. I'd rather not go writing my own csv parser if possible.

Here's my thoughts for the collective hive mind:

Plan A

  • Read the first line of the csv file (all of the formats have a heading as the first row) and count the number of tabs vs commas. Determine from this the delimiter.
  • Use some type of csv reader such as Lumenworks to give me basic csv reading capabilities for the rest of the file.
  • Perform a Regex match on each field to determine the email column.
  • No idea on how to figure out the name of the user...

Plan B

  • Prompt user for type of email client, and individually code it up for each different client <- seems really clunky.

Plan C

....Use / purchase an existing component that already does this?! (I sure can't find one!!)

Thoughts?

+6  A: 

I would go with Plan B (and I disagree that it is clunky).

IMHO, the best way would be to ask the user what kind of email client he/she needs to export from. Accordingly, you can identify the separator character. You yourself have found that although different clients use different separators, a single client will always use the same separator (unless they decide to bring out a non-backward compatible version) Consequently, tt should not be difficult to create an object-oriented class that accepts the separator as a parameter and accordingly parses input (the logic should remain almost the same, irrespective of the separator).

Even if the logic in parsing each type of export file differs significantly, it seems to be that you could create an abstract base class that holds all the common functionality and derived classes that simply override the client-specific functionality.

Even if you use a custom library such as FileHelpers, you should be able to accomplish it by passing the type of separator.

I feel that you should not rely on the relative count of the possible separators to identify what the actual separator is (as in Plan A).

Edit: Another option that just came to mind would be to provide a sort of options interface like MS Excel does. You get to choose the separator character with a live preview of how data will be parsed according to the choice.

Cerebrus
+1 for plan B not being clunky. In fact, it is the only failsafe way to do this. The number of ways that plan A can fail increases exponentially with the number of file formats used.
Treb
A: 

I would first look at how the competition does it.

Google: "We support importing contacts in the CSV file format (Comma Separated Values). For best results, please use a CSV file produced by Outlook, Outlook Express, Yahoo!, or Hotmail. For Apple Address Book, there is a useful utility called "A to G"."
So I guess they go for your plan A, and have checks in place for the above stated CSV files.

Live mail/hotmail: They go for your option B, and support: Microsoft Outlook (using CSV), Outlook Express (using CSV), Windows Contacts, Windows Live Hotmail, Yahoo! Mail (using Outlook CSV format and comma separated), Gmail (using Outlook CSV format)

Facebook: They let you type in your email address, and if they know it (yahoo, gmail, hotmail etc) they will ask you for your password, and retrieve your contacts automatically. (option D) If they don't support your email provider you can still upload a CSV file from either Outlook or other formats (kind of your option B).

I guess the facebook version is really cool. But if that is too much you can go for option A for supported CSV formats (you have to check the different CSV files), and otherwise if you don't recognize it, prompt the user for meaning of the different columsn you recognized.

Gidon
Google follows plan B, they just have an automated way to determine which file format you are using.
Treb
I dislike the Facebook method - it encourages very poor password security dsicipline. "Oh, this third party website wants me to give them my email account details - that's completely fine" - seems like training users that they can give out this data anywhere.
ZombieSheep
@zombiesheep, you are right about that. It would be better if it would actually send you to gmail and there enter your password, and return to facebook with the results (like facebook itself does when third parties want to enter its API)
Gidon
A: 

It might make sense to create an interface like "IContactImporter" which has a method "Import(File/whatever ...)". Then for each type of contact file, create classes that implement the import method to handle each format.

If there is some way to tell which type of file the user is uploading, you may not need to ask the user.

For the actual implementations, I would find an existing CSV library, and configure it accordingly for each format. Someone at my work used LINQtoCSV, but I'm not sure if there are better options.

Andy White
A: 

Plan B would be best, another way would be to look at the whole file and count occurrences of a character this can be done line by line with the streamreader class, then you can split the resulting string into an array.

youll need to restrict the characters to not alpha numeric A-z 0-9 " and look at the char

then you can determine the delimiter. also be aware that if a field is null some programs dont export the "cell", ms office 2007 for instance

Jim
A: 

Plan A seems sensible. I wouldn't think that there would be too many field names (if any) with commas or tabs. So the statistic would be accurate 90% of the time. If the statistic is 'close' enough (e.g. 15 commas and 12 tabs) what you could do is:

int i = line.IndexOf("email", StringCompareOptions.CultureInvariantIgnoreCase);
if(i == -1) i = line.IndexOf("e-mail", StringCompareOptions.CultureInvariantIgnoreCase);
else i += 5; // Length of "email"
if(i == -1) throw new Exception("You should select the email field when exporting.");
else i += 6; // Length of "e-mail"

// Find the next delimeter.
string delim = null;
for(int k = i; k < line.Count; k++)
{
    char c = line[k];
    if(c == '\t' || c == ',')
    {
       delim = c.ToString();
       break;
    }
}

if(delim == null)
   throw new Exception("Unrecognised file format.");

On top of that you said that there would be problem with the first name and last name fields - as well as things like email and e-mail. You would need a pretty good design pattern here. In the true interests of normalized data I would store the first name and last name (and combine them in the UI). Thus:

interface IField
{
    string[] Accepts { get; } // Gets the fields that this can accept.
    string[] Gives { get; } // Gets the field that this would give.

    IEnumerable<KeyValuePair<string, string>> Handle(IEnumerable<KeyValuePair<string, string>> fields);
}

class NameField
{
    string[] Accepts { get return new string[] { "FirstName", "LastName", "Name", "First Name", etc. }; }
    string[] Gives { get return new string[] { "FirstName", "LastName" }; }

    IEnumerable<KeyValuePair<string, string>> Handle(IEnumerable<KeyValuePair<string, string>> fields)
    {
       string firstName = null, lastName = null;
       foreach(KeyValuePair<string, string> field in fields)
       {
           switch(field.Key)
           {
                  case "FirstName":
                  case "First Name":
                  firstName = field.Value;
                  break;
                  // ...
                  case "FullName":
                  case "Full Name":
                  // Split into fn and ln.
                  break;
                  // ...
           }
       }
       yield return new KeyValuePair<string, string>("FirstName", firstName);
       yield return new KeyValuePair<string, string>("LastName", lastName);
    }
}

In any case, I am sure you get the idea. A bunch of transforms that will turn fields into recognized ones.

Jonathan C Dickinson