I have been given the task of developing a small library (using C# 3.0 and .NET 3.5) to provide data import functionality for an application.
The spec is:
- Data can be imported from a CSV file (potentially other file formats in the future).
- The CSV files can contain any schema and number of rows, with a maximum file size of 10MB.
- It must be possible to change the data type and column name of each column in the CSV file.
- It must be possible to exclude columns in the CSV file from the import (see the sketch after this list).
- Importing the data will create a table matching the schema in a SQL Server database, then populate it with the rows from the CSV.
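For context, here's roughly how I picture the schema-editing side of that spec. This is a minimal sketch only; the type and member names (ImportColumn, TargetName, Exclude, and so on) are hypothetical and just illustrate the intent:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical schema model implied by the spec above; the names are
// placeholders, not an existing API.
public class ImportColumn
{
    public string SourceName { get; set; }  // header as it appears in the CSV
    public string TargetName { get; set; }  // column name to create in SQL Server
    public Type DataType { get; set; }      // CLR type, mapped to a SQL type later
    public bool Exclude { get; set; }       // true = leave this column out of the import
}

public class ImportSchema
{
    private readonly List<ImportColumn> columns = new List<ImportColumn>();

    public IList<ImportColumn> Columns
    {
        get { return columns; }
    }
}
```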
I've been playing around with ideas for a while, but my current code feels like it has been hacked together.
My current implementation approach is:
- Open the CSV and estimate the schema, storing it in an ImportSchema class.
- Allow the schema to be modified.
- Use SMO to create the table in SQL Server according to the schema.
- Create a System.Data.DataTable instance, using the schema for the data types.
- Use CsvReader to read the CSV data into the DataTable.
- Apply column name changes and remove unwanted columns from the DataTable.
- Use System.Data.SqlClient.SqlBulkCopy to copy the rows from the DataTable into the newly created database table (a rough sketch of the table-creation and bulk-copy steps follows this list).
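To make the question concrete, below is a stripped-down sketch of the table-creation and bulk-copy steps. It assumes the DataTable has already been shaped (columns renamed, unwanted ones removed); the connection string, database name, table name, and the MapToSqlType helper are placeholders:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Management.Common;
using Microsoft.SqlServer.Management.Smo;

public static class ImportSketch
{
    // Step: create the destination table via SMO from the shaped DataTable.
    public static void CreateTable(string connectionString, string databaseName,
                                   string tableName, DataTable shapedData)
    {
        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            Server server = new Server(new ServerConnection(connection));
            Database database = server.Databases[databaseName];

            Table table = new Table(database, tableName);
            foreach (DataColumn column in shapedData.Columns)
            {
                table.Columns.Add(new Column(table, column.ColumnName,
                                             MapToSqlType(column.DataType)));
            }
            table.Create();
        }
    }

    // Step: bulk-load the shaped DataTable into the freshly created table.
    public static void BulkLoad(string connectionString, string tableName,
                                DataTable shapedData)
    {
        using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = tableName;
            bulk.WriteToServer(shapedData);
        }
    }

    // Naive CLR-to-SMO type mapping; real code would need lengths,
    // nullability, decimal precision, etc.
    private static DataType MapToSqlType(Type clrType)
    {
        if (clrType == typeof(int)) return DataType.Int;
        if (clrType == typeof(double)) return DataType.Float;
        if (clrType == typeof(DateTime)) return DataType.DateTime;
        if (clrType == typeof(bool)) return DataType.Bit;
        return DataType.NVarChar(255);
    }
}
```

One thing I've noticed along the way: if the CsvReader in use implements IDataReader (the LumenWorks one does), SqlBulkCopy.WriteToServer(IDataReader) could stream rows straight from the file and skip the intermediate DataTable, which might matter as files approach the 10MB limit. I'm not sure whether that's cleaner design or just a shortcut.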
This feels overly complex to me, and I'm hitting a mental block trying to wrap it up neatly in a handful of testable, extensible objects.
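The decomposition I keep circling around looks something like this (again, these interface names are hypothetical, and ImportSchema refers to the sketch above):

```csharp
using System.Data;

// Hypothetical seams, not a settled design: each step of the pipeline
// sits behind its own small interface so it can be tested and swapped out.
public interface ISchemaReader
{
    // Estimate column names/types from a sample of rows.
    ImportSchema ReadSchema(string path);
}

public interface IRowSource
{
    // CSV today; other file formats later.
    IDataReader OpenRows(string path, ImportSchema schema);
}

public interface ITableBuilder
{
    // SMO-backed table creation.
    void CreateTable(ImportSchema schema, string tableName);
}

public interface IBulkWriter
{
    // SqlBulkCopy-backed load.
    void Write(IDataReader rows, string tableName);
}
```

But I'm not convinced those are the right seams, which is part of why I'm asking.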
Any suggestions or thoughts on ways to approach this problem, from both an implementation and a design perspective?
Many thanks.