ansaurus

Question

Answer 1

+6 A:

I would go with Plan B (and I disagree that it is clunky).

IMHO, the best way would be to ask the user which email client they are exporting from. From that, you can identify the separator character. You yourself have found that although different clients use different separators, a single client will always use the same separator (unless they decide to bring out a non-backward-compatible version). Consequently, it should not be difficult to create a class that accepts the separator as a parameter and parses the input accordingly (the logic should remain almost the same, irrespective of the separator).
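As a rough sketch of the idea above, a parser can take the separator as a constructor argument and keep the rest of the logic identical for every client. The class name DelimitedLineParser and its ParseLine method are hypothetical, not part of any library. The sketch assumes fields may be wrapped in double quotes with doubled quotes as escapes, and it does not handle quoted fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits one line of an export file into fields,
// using whatever separator the chosen email client is known to write.
public sealed class DelimitedLineParser
{
    private readonly char separator;

    public DelimitedLineParser(char separator)
    {
        this.separator = separator;
    }

    public List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // a doubled quote is a literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // separators inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Length = 0;      // start the next field
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing separator
        return fields;
    }
}
```

With this shape, supporting a tab-separated export is only a matter of constructing the parser with '\t' instead of ','.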

Even if the logic for parsing each type of export file differs significantly, it seems to me that you could create an abstract base class that holds all the common functionality, with derived classes that simply override the client-specific functionality.

Even if you use a custom library such as FileHelpers, you should be able to accomplish it by passing the type of separator.

I feel that you should not rely on the relative count of the possible separators to identify what the actual separator is (as in Plan A).

Edit: Another option that just came to mind would be to provide a sort of options interface like MS Excel does. You get to choose the separator character with a live preview of how data will be parsed according to the choice.

Cerebrus 2009-06-02 07:26:18

+1 for plan B not being clunky. In fact, it is the only failsafe way to do this. The number of ways that plan A can fail increases exponentially with the number of file formats used.

Treb 2009-06-02 08:10:08

Answer 2

A:

I would first look at how the competition does it.

Google: "We support importing contacts in the CSV file format (Comma Separated Values). For best results, please use a CSV file produced by Outlook, Outlook Express, Yahoo!, or Hotmail. For Apple Address Book, there is a useful utility called "A to G"."
So I guess they go for your plan A, and have checks in place for the above stated CSV files.

Live mail/hotmail: They go for your option B, and support: Microsoft Outlook (using CSV), Outlook Express (using CSV), Windows Contacts, Windows Live Hotmail, Yahoo! Mail (using Outlook CSV format and comma separated), Gmail (using Outlook CSV format)

Facebook: They let you type in your email address, and if they know it (yahoo, gmail, hotmail etc) they will ask you for your password, and retrieve your contacts automatically. (option D) If they don't support your email provider you can still upload a CSV file from either Outlook or other formats (kind of your option B).

I think the Facebook version is really cool. But if that is too much, you can go for option A for the supported CSV formats (you have to check the different CSV files), and if you don't recognize the format, prompt the user for the meaning of the different columns you recognized.

Gidon 2009-06-02 07:29:03

Google follows plan B, they just have an automated way to determine which file format you are using.

Treb 2009-06-02 08:11:47

I dislike the Facebook method - it encourages very poor password security discipline. "Oh, this third-party website wants me to give them my email account details - that's completely fine" - seems like training users that they can give out this data anywhere.

ZombieSheep 2009-06-02 08:13:43

@zombiesheep, you are right about that. It would be better if it actually sent you to Gmail to enter your password there, and then returned to Facebook with the results (as Facebook itself does when third parties want to access its API).

Gidon 2009-06-02 08:48:50

Answer 3

A:

It might make sense to create an interface like "IContactImporter" which has a method "Import(File/whatever ...)". Then for each type of contact file, create classes that implement the import method to handle each format.

If there is some way to tell which type of file the user is uploading, you may not need to ask the user.

For the actual implementations, I would find an existing CSV library, and configure it accordingly for each format. Someone at my work used LINQtoCSV, but I'm not sure if there are better options.
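A minimal sketch of that interface might look like the following. The answer leaves the Import signature open, so the TextReader parameter and the list-of-rows return type are assumptions, as is the SeparatedValuesImporter class. The string.Split call is only a stand-in for whichever CSV library you configure per format, and it does not cope with quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// One importer per contact file format.
public interface IContactImporter
{
    // Assumed signature: reads every row of the export as an array of fields.
    IList<string[]> Import(TextReader reader);
}

// Hypothetical implementation for any single-character-separated export.
public class SeparatedValuesImporter : IContactImporter
{
    private readonly char separator;

    public SeparatedValuesImporter(char separator)
    {
        this.separator = separator;
    }

    public IList<string[]> Import(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;  // skip blank lines
            rows.Add(line.Split(separator)); // naive: ignores quoting
        }
        return rows;
    }
}

public static class ContactImporters
{
    // Illustrative registry; the keys are placeholders, not real client names.
    public static readonly Dictionary<string, IContactImporter> ByFormat =
        new Dictionary<string, IContactImporter>
        {
            { "CommaSeparated", new SeparatedValuesImporter(',') },
            { "TabSeparated", new SeparatedValuesImporter('\t') }
        };
}
```

The calling code then only needs the format key, whether it comes from asking the user or from some detection step, and never has to know how a given file is parsed.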

Andy White 2009-06-02 07:29:47

Answer 4

A:

Plan B would be best. Another way would be to look at the whole file and count the occurrences of each character. This can be done line by line with the StreamReader class; once you know the delimiter, you can split each line into an array.

You'll need to restrict the characters you count to non-alphanumeric ones (so not A-z, 0-9 or the double quote) and look at each remaining char.

Then you can determine the delimiter. Also be aware that if a field is null, some programs don't export the "cell" at all, MS Office 2007 for instance.
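The counting approach could be sketched roughly as follows. DelimiterGuesser is a hypothetical name, and skipping spaces is an extra assumption that the answer does not state. The method takes a TextReader so that a StreamReader over the uploaded file can be passed in directly. Note that characters such as '.' and '@' occur in the data itself, so on some files they may outnumber the real delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DelimiterGuesser
{
    // Returns the most frequent candidate character, or null if there is none.
    public static char? Guess(TextReader reader)
    {
        var counts = new Dictionary<char, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (char c in line)
            {
                // Letters, digits and the double quote cannot be delimiters.
                // Ignoring spaces as well is an assumption of this sketch.
                if (char.IsLetterOrDigit(c) || c == '"' || c == ' ') continue;

                int seen;
                counts.TryGetValue(c, out seen); // seen stays 0 for a new char
                counts[c] = seen + 1;
            }
        }

        char? best = null;
        int bestCount = 0;
        foreach (KeyValuePair<char, int> pair in counts)
        {
            if (pair.Value > bestCount)
            {
                best = pair.Key;
                bestCount = pair.Value;
            }
        }
        return best;
    }
}
```

A caller would typically wrap the file in a StreamReader, call Guess, and fall back to asking the user when the result is null or looks implausible.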

Jim 2009-06-02 07:33:02

Answer 5

A:

Plan A seems sensible. I wouldn't think that there would be too many field names (if any) with commas or tabs. So the statistic would be accurate 90% of the time. If the statistic is 'close' enough (e.g. 15 commas and 12 tabs) what you could do is:

// Locate the email column header; "line" is the header row of the file.
int i = line.IndexOf("email", StringComparison.InvariantCultureIgnoreCase);
if (i != -1)
{
    i += 5; // Length of "email"
}
else
{
    i = line.IndexOf("e-mail", StringComparison.InvariantCultureIgnoreCase);
    if (i == -1) throw new Exception("You should select the email field when exporting.");
    i += 6; // Length of "e-mail"
}

// Find the next delimiter after the email column header.
string delim = null;
for (int k = i; k < line.Length; k++)
{
    char c = line[k];
    if (c == '\t' || c == ',')
    {
        delim = c.ToString();
        break;
    }
}

// No tab or comma follows the header, e.g. when email is the last column.
if (delim == null)
    throw new Exception("Unrecognised file format.");

On top of that, you said that there would be a problem with the first name and last name fields, as well as with things like email and e-mail. You would need a pretty good design pattern here. In the true interests of normalized data, I would store the first name and last name separately (and combine them in the UI). Thus:

// Requires: using System.Collections.Generic;

interface IField
{
    string[] Accepts { get; } // Gets the fields that this can accept.
    string[] Gives { get; } // Gets the fields that this would give.

    IEnumerable<KeyValuePair<string, string>> Handle(IEnumerable<KeyValuePair<string, string>> fields);
}

class NameField : IField
{
    public string[] Accepts { get { return new string[] { "FirstName", "LastName", "Name", "First Name" /* etc. */ }; } }
    public string[] Gives { get { return new string[] { "FirstName", "LastName" }; } }

    public IEnumerable<KeyValuePair<string, string>> Handle(IEnumerable<KeyValuePair<string, string>> fields)
    {
        string firstName = null, lastName = null;
        foreach (KeyValuePair<string, string> field in fields)
        {
            switch (field.Key)
            {
                case "FirstName":
                case "First Name":
                    firstName = field.Value;
                    break;
                // ...
                case "FullName":
                case "Full Name":
                    // Split into fn and ln.
                    break;
                // ...
            }
        }
        // Emit the normalized fields; a value stays null if no source field matched.
        yield return new KeyValuePair<string, string>("FirstName", firstName);
        yield return new KeyValuePair<string, string>("LastName", lastName);
    }
}

In any case, I am sure you get the idea. A bunch of transforms that will turn fields into recognized ones.

Jonathan C Dickinson 2009-06-02 08:09:41

Import CSV file into c#
