I'm creating a custom DataSet and I'm under some constraints:

  • I want the user to specify the type of the data which they want to store.
  • I want to reduce type-casting because I think it will be VERY expensive.
  • I will use the data VERY frequently in my application.

I don't know what type of data will be stored in the DataSet, so my initial idea was to make it a List of objects, but I suspect that the frequent use of the data and the need to type-cast will be very expensive.

The basic idea is this:

class DataSet : IDataSet
{
    private Dictionary<string, List<Object>> _data;

    /// <summary>
    /// Constructs the data set given the user-specified labels.
    /// </summary>
    /// <param name="labels">
    /// The labels of each column in the data set.
    /// </param>
    public DataSet(List<string> labels)
    {
        _data = new Dictionary<string, List<object>>();
        foreach (string label in labels)
        {
            _data.Add(label, new List<object>());
        }
    }

    #region IDataSet Members

    public List<string> DataLabels
    {
        get { return _data.Keys.ToList(); }
    }

    public int Count
    {
        // Keys cannot be indexed, so read the row count from the first column's list.
        get { return _data.Values.First().Count; }
    }

    public List<object> GetValues(string label)
    {
        return _data[label];
    }

    public object GetValue(string label, int index)
    {
        return _data[label][index];
    }

    public void InsertValue(string label, object value)
    {
        _data[label].Insert(0, value);
    }

    public void AddValue(string label, object value)
    {
        _data[label].Add(value);
    }

    #endregion
}

A concrete example where the DataSet will be used is to store data obtained from a CSV file where the first row contains the labels. When the data is being loaded from the CSV file I'd like to specify the type rather than casting to object. The data could contain columns such as dates, numbers, strings, etc. Here is what it could look like:

"Date","Song","Rating","AvgRating","User"
"02/03/2010","Code Monkey",4.6,4.1,"joe"
"05/27/2009","Code Monkey",1.2,4.5,"jill"

The data will be used in a Machine Learning/Artificial Intelligence algorithm, so it is essential that I make the reading of data very fast. I want to eliminate type-casting as much as possible, since I can't afford to cast from 'object' to whatever data type is needed on every read.

I've seen applications that allow the user to pick the specific data type for each item in the CSV file, so I'm trying to make a similar solution where a different type can be specified for each column. I want to create a generic solution so that I return not a List<object> but a List<DateTime> (if it's a DateTime column) or a List<double> (if it's a column of doubles).

Is there any way that this can be achieved? Perhaps my approach is wrong; is there a better way to approach this problem?

+2  A: 

I would suggest trying what you have now. Maybe the performance will be good enough. If not, and only then, you could think about optimizing further.

You could also store each field as a variant object like this:

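struct Variant
{
    // Fields are public so callers can read the member that matches the column type.
    public string StringValue;
    public DateTime DateTimeValue;
    public bool BoolValue;
    // ... etc. ...
}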

Then you would only need to read the appropriate member of the struct, but this may add just as much overhead, both in memory usage (every value carries every field) and in the if statements needed to pick the right member.
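
As a rough sketch of that idea, the struct could carry a tag that records which member is in use, so the reader does not have to guess. The VariantKind enum, the DoubleValue field, and the FromDouble/FromString/AsDouble helpers are hypothetical names added for illustration only:

```csharp
using System;

// Hypothetical tag naming which field of the Variant is in use.
public enum VariantKind { String, DateTime, Double, Bool }

public struct Variant
{
    public VariantKind Kind;
    public string StringValue;
    public DateTime DateTimeValue;
    public double DoubleValue;   // assumed extra field for numeric columns
    public bool BoolValue;

    public static Variant FromDouble(double value)
    {
        return new Variant { Kind = VariantKind.Double, DoubleValue = value };
    }

    public static Variant FromString(string value)
    {
        return new Variant { Kind = VariantKind.String, StringValue = value };
    }

    // Reads the double field, failing loudly if the tag says otherwise.
    public double AsDouble()
    {
        if (Kind != VariantKind.Double)
            throw new InvalidOperationException("Variant holds " + Kind + ", not Double.");
        return DoubleValue;
    }
}
```

Reading a value this way involves no boxing or casting, but each Variant is as large as all of its fields combined, which is the memory overhead mentioned above.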

John JJ Curtis
+2  A: 

Bear in mind that DataSets also store rows, columns, etc. as objects. Making them type-safe usually just means that the typed dataset performs the cast for you.

I think it really depends what has to happen with the data read from the csv, but to eliminate casting without knowing in advance which types you will require, I can only think of creating the type holding the data dynamically through Reflection.Emit.

As Jeff says, though, the casting may not kill your app.

flq
A: 

I'm not sure what you mean by "specify the data type", but I think you can use generic methods (to get the typed list of values) and reflection (to invoke your generic methods with your specified types).

Generic method signature:

List<T> GetValues<T>(string label) { ... }

Generic invocation:

Type dataSetType = typeof(DataSet);
MethodInfo methodInfo = dataSetType.GetMethod("GetValues");
MethodInfo genericMethodInfo = methodInfo.MakeGenericMethod(typeof(yourtype));
// Invoke

Note that you only have to create each generic MethodInfo once per Type at runtime: you can keep all the generic MethodInfo objects in a dictionary, with the Type as the key.
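
A minimal sketch of how this could fit together, assuming each column is stored as a real List<T> behind the non-generic IList interface. TypedDataSet, AddColumn and GetValuesInvoker are hypothetical names for illustration, not part of the question's code:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

public class TypedDataSet
{
    // Each column is really a List<T>, held through the non-generic IList interface.
    private readonly Dictionary<string, IList> _columns = new Dictionary<string, IList>();

    public void AddColumn<T>(string label)
    {
        _columns.Add(label, new List<T>());
    }

    // One cast per column lookup instead of one cast per value.
    public List<T> GetValues<T>(string label)
    {
        return (List<T>)_columns[label];
    }
}

public static class GetValuesInvoker
{
    // The open generic method definition, looked up once.
    private static readonly MethodInfo OpenMethod =
        typeof(TypedDataSet).GetMethod("GetValues");

    // Closed generic methods, keyed by element type (not thread-safe in this sketch).
    private static readonly Dictionary<Type, MethodInfo> Cache =
        new Dictionary<Type, MethodInfo>();

    public static IList GetValues(TypedDataSet dataSet, string label, Type elementType)
    {
        MethodInfo closed;
        if (!Cache.TryGetValue(elementType, out closed))
        {
            closed = OpenMethod.MakeGenericMethod(elementType);
            Cache.Add(elementType, closed);
        }
        return (IList)closed.Invoke(dataSet, new object[] { label });
    }
}
```

When the element type is known at compile time, calling dataSet.GetValues<double>("Rating") directly returns a List<double> and the values themselves are never boxed. The reflection path is only needed when the type is chosen at runtime, for example from a user's per-column selection.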

If you're experiencing performance problems, you can also use lambda expressions.
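
One possible reading of that suggestion is to compile an expression tree into a delegate once per type, so later calls avoid the cost of MethodInfo.Invoke. The sketch below assumes that interpretation; ColumnStore and CompiledGetters are hypothetical names, and the store is a stand-in for whatever class exposes the generic GetValues<T> method:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Stand-in for a data set exposing a generic GetValues<T> method.
public class ColumnStore
{
    private readonly Dictionary<string, IList> _columns = new Dictionary<string, IList>();

    public void AddColumn<T>(string label) { _columns.Add(label, new List<T>()); }

    public List<T> GetValues<T>(string label) { return (List<T>)_columns[label]; }
}

public static class CompiledGetters
{
    // Builds (store, label) => (IList)store.GetValues<T>(label) for a T known only at runtime.
    public static Func<ColumnStore, string, IList> Build(Type elementType)
    {
        MethodInfo closed = typeof(ColumnStore)
            .GetMethod("GetValues")
            .MakeGenericMethod(elementType);

        ParameterExpression store = Expression.Parameter(typeof(ColumnStore), "store");
        ParameterExpression label = Expression.Parameter(typeof(string), "label");

        Expression body = Expression.Convert(
            Expression.Call(store, closed, label),
            typeof(IList));

        return Expression
            .Lambda<Func<ColumnStore, string, IList>>(body, store, label)
            .Compile();
    }
}
```

Compiling is relatively slow, so the resulting delegate would normally be built once per element type and kept, for instance in a dictionary keyed by Type.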

JoeBilly