tags:

views:

223

answers:

1

hi all,

I am trying to figure the best way to map input files, such as XLS or CSV files to objects in the system. Let me expand on that a little. I have the following objects:

Company Contact

Each have variables variables such as:

Addresses Phone numbers Emails etc

The input file I can get varies, sometimes there will be column headings, sometimes not, the column arrangement and numbers will change at times. It could look as follows:

COMPANY         - CONTACT       - ADDRESS       - PHONE
----------------------------------------------------------------------
company1          contact1        address1        phone1
company1          contact2        address2
company2          contact3                        phone2
                  contact4        address3

The above shows that "company" will have the "addres1" and "phone1" attached to the company, the "contact1" is its own object but has "company1" as its parent object.

The same is true for "contact2", (except for the phone).

"contact4" has no parent object so "address3" belongs to the contact, rather than a Company.

My thinking so far would be to have the following objects:

Mappings - (this is where I am not sure how to implement it. It should say how columns should map to variables, the ownership / hierarchy - e.g. Company has Address)

IMappingLoader - (loads the mapping objects)

-- XmlMappingLoader

-- DbMappingLoader

IDataLoader - (loads the data into a dataset)

-- XLSLoader

-- CSVLoader

So the mappings would be loaded, the data loaded into a dataset and the correct objects returned.

The part I am not really sure the best way to approach is how to do the mappings part. How to be able to say which column should belong to which object.

Thanks for any and all advice.

Jon

+1  A: 

Your parsers are going to have to know about your columns....otherwise it is unable to map the data to the specific object properties. Unless of course you introduce an indexed-properties class which you could store the information based on the order it is read.

You should create a parser factory and based on the extension of the file you would return the correct parser for the job e.g.

public class Record 
{ 
    private Dictionary<int, string> items = new Dictionary<int, string>(); 
    private int propCount; 

    public Record(int size) 
    { 
        // populate array with empty strings 
        for(int i = 0; i <= size -1; i++) 
            items.Add(i, String.Empty); 
        propCount = size; 
    } 

    public string this[int index] 
    { 
        get { return items[index]; } 
        set { items[index] = value; } 
    } 

    public int PropertyCount { get { return propCount; } } 
}

public interface IRecordParser 
{ 
    string FileName { get; set; } 
    string[] GetHeadings(); 
    bool HasHeaders { get; set; } 
    void GoToStart(); 
    Record ParseNextRecord(); 
} 

public abstract class RecordParser 
{ 
    public string FileName { get; set; } 
    public bool HasHeaders { get; set; } 
    public abstract string[] GetHeadings(); 
    public abstract void GoToStart(); 
    public abstract Record ParseNextRecord();     
} 

public class ExcelRecordParser : RecordParser, IRecordParser 
{ 
    public ExcelRecordParser() 
    { 
    } 

    public override string[] GetHeadings() 
    { 
        if (HasHeaders)  
            // return column headings
        else 
            // return default headings from settings file
    } 

    public override void GoToStart() 
    { 
        // navigate to first row (or +1 if HasHeaders is true) 
    } 

    public override Record ParseNextRecord() 
    { 
        var headers = GetHeadings(); 
        var r = new Record(headers.Length);

        // enumerate rows, then for each row do...
        for(int i = 0; i <= headers.Length - 1; i++) 
            r[i] = row[i];

        return r;
    } 
}

public class CsvRecordParser : RecordParser, IRecordParser 
{ 
    public CsvRecordParser() 
    { 
    } 

    public override string[] GetHeadings() 
    { 
        if (HasHeaders) 
            // return first row split as headings
        else 
            // return default headers from settings file
    } 

    public override void GoToStart() 
    { 
        // navigate to start of file (or +1 if HasHeaders is true) 
    } 

    public override Record ParseNextRecord() 
    { 
        var headers = GetHeadings(); 
        var r = new Record(headers.Length);

        // enumerate lines, then for each line do...
        for(int i = 0; i <= headers.Length - 1; i++) 
            r[i] = line[i];

        return r;
    } 
} 

public static class RecordParserFactory 
{ 
    public static IRecordParser Create(string ext) 
    { 
        switch (ext) 
        { 
            case ".xls": 
                return new ExcelRecordParser() as IRecordParser; 
            case ".csv": 
                return new CsvRecordParser() as IRecordParser;
            default:
                return null;
        } 
    } 
}

Usage

// would return an instance of CSV Parser
string file = @"C:\Data\MyRecords.csv";
IRecordParser parser = RecordParserFactory.Create(System.IO.Path.GetExtension(file));

// would return an instance of Excel Parser
string file = @"C:\Data\MyRecords.xls";
IRecordParser parser = RecordParserFactory.Create(System.IO.Path.GetExtension(file));

This would allow you to add other parsers if your file format changes in the future e.g. XML, Binary etc

James
Hi James, yep this is the part I am happy with and have implemented something very similar. I think I have been trying to over complicate it, indirectly. I want to allow the business to ensure they can dynamically load new data by setting the columns and mappings them selves. But the business logic can / should not change, so I think you have put me down the right path to a good solution. Thanks :)
Jon
@Jon, happy to help. It is always hard trying to parse dynamic information however, I would say your system does need to know at least *something* about the data it is parsing otherwise it would be impossible for it to map it. Unless of course it was purely down to the order in which the data is stored, in that case you could probably use reflection or even update your class to support indexed properties. I will update my answer to show you what I mean.
James
@Jon, see my updated answer it now includes an indexed-properties based class. Basically that allows you to have as many properties as you want and it will always populate based on the order of the information. You could easily update the Record class to take in a list of headings instead of a size and then get the information based on the headings. However, to be as generic as possible, stick with indexes.
James
thanks James. Very helpful indeed. Most grateful. :) That really clarifies / solidifies my thinking and nailed the approach.
Jon