I'm creating a custom DataSet and I'm under some constraints:
- I want the user to specify the type of the data which they want to store.
- I want to reduce type-casting because I think it will be VERY expensive.
- I will use the data VERY frequently in my application.
I don't know what type of data will be stored in the DataSet, so my initial idea was to make it a List of objects, but I suspect that the frequent use of the data and the need to type-cast will be very expensive.
The basic idea is this:
    using System.Collections.Generic;
    using System.Linq;

    class DataSet : IDataSet
    {
        // Maps each column label to the values stored in that column.
        private Dictionary<string, List<object>> _data;

        /// <summary>
        /// Constructs the data set given the user-specified labels.
        /// </summary>
        /// <param name="labels">
        /// The labels of each column in the data set.
        /// </param>
        public DataSet(List<string> labels)
        {
            _data = new Dictionary<string, List<object>>();
            foreach (string label in labels)
            {
                _data.Add(label, new List<object>());
            }
        }

        #region IDataSet Members

        public List<string> DataLabels
        {
            get { return _data.Keys.ToList(); }
        }

        // Number of rows, taken from the first column (0 if there are no columns).
        public int Count
        {
            get { return _data.Count == 0 ? 0 : _data.Values.First().Count; }
        }

        public List<object> GetValues(string label)
        {
            return _data[label];
        }

        public object GetValue(string label, int index)
        {
            return _data[label][index];
        }

        // Inserts the value at the front of the column.
        public void InsertValue(string label, object value)
        {
            _data[label].Insert(0, value);
        }

        // Appends the value to the end of the column.
        public void AddValue(string label, object value)
        {
            _data[label].Add(value);
        }

        #endregion
    }
A concrete example where the DataSet will be used is to store data obtained from a CSV file where the first row contains the labels. When the data is being loaded from the CSV file, I'd like to specify the type rather than casting to object. The data could contain columns such as dates, numbers, strings, etc. Here is what it could look like:
"Date","Song","Rating","AvgRating","User"
"02/03/2010","Code Monkey",4.6,4.1,"joe"
"05/27/2009","Code Monkey",1.2,4.5,"jill"
The data will be used in a Machine Learning/Artificial Intelligence algorithm, so it is essential that I make the reading of data very fast. I want to eliminate type-casting as much as possible, since I can't afford to cast from 'object' to whatever data type is needed on every read.
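As a rough illustration of converting each field once at load time, so that later reads need no cast, here is a minimal sketch for the sample rows above. The RatingRow class and its Parse method are hypothetical names, the MM/dd/yyyy date format is inferred from the sample data, and the fields are assumed to be already split and unquoted:

```csharp
using System;
using System.Globalization;

// Hypothetical holder for one parsed row of the sample CSV.
class RatingRow
{
    public DateTime Date;
    public string Song;
    public double Rating;
    public double AvgRating;
    public string User;

    // Converts each text field to its column type exactly once.
    public static RatingRow Parse(string[] fields)
    {
        return new RatingRow
        {
            // Assumed format: month/day/year, as in "05/27/2009".
            Date = DateTime.ParseExact(fields[0], "MM/dd/yyyy", CultureInfo.InvariantCulture),
            Song = fields[1],
            Rating = double.Parse(fields[2], CultureInfo.InvariantCulture),
            AvgRating = double.Parse(fields[3], CultureInfo.InvariantCulture),
            User = fields[4]
        };
    }
}

class Program
{
    static void Main()
    {
        string[] fields = { "02/03/2010", "Code Monkey", "4.6", "4.1", "joe" };
        RatingRow row = RatingRow.Parse(fields);
        Console.WriteLine(row.Date.Month); // prints 2
    }
}
```

This hard-codes the column types, which is exactly what a user-configurable data set would need to avoid; it only shows where the one-time conversion would happen.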
I've seen applications that allow the user to pick the specific data type for each item in the CSV file, so I'm trying to make a similar solution where a different type can be specified for each column. I want to create a generic solution so I don't have to return a List<object> but a List<DateTime> (if it's a DateTime column) or List<double> (if it's a column of doubles).
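To make the goal concrete, here is a minimal sketch of the kind of API described above. It assumes each column is its own List<T>, stored behind the non-generic IList interface so that columns of different types can share one dictionary. The names TypedDataSet, AddColumn, GetValues, and AddValue are hypothetical, and this is only one possible shape, not a settled design:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch: one strongly typed List<T> per column.
class TypedDataSet
{
    // Every List<T> implements the non-generic IList, so columns with
    // different element types can live in the same dictionary.
    private readonly Dictionary<string, IList> _columns = new Dictionary<string, IList>();

    public void AddColumn<T>(string label)
    {
        _columns.Add(label, new List<T>());
    }

    // One reference cast per column lookup, not one per element.
    // Throws InvalidCastException if T does not match the column's type.
    public List<T> GetValues<T>(string label)
    {
        return (List<T>)_columns[label];
    }

    public void AddValue<T>(string label, T value)
    {
        GetValues<T>(label).Add(value);
    }
}

class Program
{
    static void Main()
    {
        var data = new TypedDataSet();
        data.AddColumn<DateTime>("Date");
        data.AddColumn<double>("Rating");

        data.AddValue("Date", new DateTime(2010, 2, 3));
        data.AddValue("Rating", 4.6);
        data.AddValue("Rating", 1.2);

        // The caller gets a List<double> back and reads it without casting.
        List<double> ratings = data.GetValues<double>("Rating");
        Console.WriteLine(ratings.Count); // prints 2
    }
}
```

The trade-off in this sketch is that the caller must know each column's type when calling GetValues, and a mismatch only fails at run time.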
Is there any way that this can be achieved? Perhaps my approach is wrong; is there a better approach to this problem?