tags: 

views: 310
answers: 4
+1  Q: 

A Better DataTable

Hi All --

I have an application that uses DataTables to perform grouping, filtering and aggregation of data. I want to replace the DataTables with my own data structures so we don't carry any unnecessary overhead from using DataTables. So my question is: can LINQ be used to perform the grouping, filtering and aggregation of my data, and if it can, is the performance comparable to DataTables, or should I just hunker down and write my own algorithms to do it?

Thanks

Dan R.

A: 

Why not use a local database like SQL Server CE or Firebird Embedded? (Or even MS Access! :)) Store the data in the local database, do the processing using simple SQL queries and pull the data back. Much simpler and likely less overhead; plus you don't have to write all the logic for grouping/aggregates etc., as the database systems already have that logic built in, debugged and working.
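To sketch what that looks like in practice, here's a minimal example using SQLite via the `Microsoft.Data.Sqlite` package, purely as a stand-in for whichever embedded database you pick (SQL Server CE and Firebird work the same way through their own ADO.NET providers; table and column names here are invented):

```csharp
using System;
using Microsoft.Data.Sqlite; // NuGet package; any embedded ADO.NET provider works similarly

class Demo
{
    static void Main()
    {
        using (var conn = new SqliteConnection("Data Source=:memory:"))
        {
            conn.Open();

            var cmd = conn.CreateCommand();
            cmd.CommandText = @"
                CREATE TABLE Orders (Category TEXT, Value REAL);
                INSERT INTO Orders VALUES ('A', 10), ('A', 5), ('B', 7);";
            cmd.ExecuteNonQuery();

            // The database does the grouping/aggregation; no hand-written algorithms.
            cmd.CommandText = @"
                SELECT Category, COUNT(*), SUM(Value)
                FROM Orders
                GROUP BY Category
                ORDER BY Category";
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1} rows, total {2}",
                        reader.GetString(0), reader.GetInt32(1), reader.GetDouble(2));
            }
        }
    }
}
```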

Frans Bouma
Well, not access... nobody deserves that ;-p
Marc Gravell
@Marc: Why not? :) MS Access is a great file-based database, with good performance for single-user apps, and it has more features than SQL Server CE. For a database, I don't really see why people should pick CE over Access. :)
Frans Bouma
+1  A: 

Unless you go for simple classes (POCO etc), your own implementation is likely to have nearly as much overhead as DataTable. Personally, I'd look more at using tools like LINQ-to-SQL, Entity Framework, etc. Then you can use either LINQ-to-Objects against local data, or the provider-specific implementation for complex database queries without pulling all the data to the client.

LINQ-to-Objects can do all the things you mention, but it involves having all the data in memory. If you have non-trivial data, a database is recommended. SQL Server Express Edition would be a good starting point if you look at LINQ-to-SQL or Entity Framework.


Edited re comment:

Regular TSQL commands are fine and dandy, but you ask about the difference... the biggest being that LINQ-to-SQL will provide the entire DAL for you, which is a huge time saver, as well as making it possible to get a lot more compile-time safety. But it also allows you to use the same approach to look at your local objects and your database - for example, the following is valid C# 3.0 (except for [someDataSource], see below):

var qry = from row in [someDataSource]
          group row by row.Category into grp
          select new {Category = grp.Key, Count = grp.Count(),
              TotalValue = grp.Sum(x=>x.Value) };

foreach(var x in qry) {
    Console.WriteLine("{0}, {1}, {2}", x.Category, x.Count, x.TotalValue);
}

If [someDataSource] is local data, such as a List<T>, this will execute locally; but if this is from your LINQ-to-SQL data-context, it can build the appropriate TSQL at the database server. This makes it possible to use a single query mechanism in your code (within the bounds of LOLA, of course).
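To make the local-execution case concrete, here is a self-contained version of the query above running against a List&lt;T&gt; (the Row type and the sample data are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Row
{
    public string Category { get; set; }
    public decimal Value { get; set; }
}

class Demo
{
    static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Category = "Books", Value = 12.5m },
            new Row { Category = "Books", Value = 7.5m },
            new Row { Category = "Music", Value = 9m }
        };

        // Same query shape as above; with a List<T> source,
        // LINQ-to-Objects executes it entirely in memory.
        var qry = from row in rows
                  group row by row.Category into grp
                  select new { Category = grp.Key, Count = grp.Count(),
                      TotalValue = grp.Sum(x => x.Value) };

        foreach (var x in qry)
            Console.WriteLine("{0}, {1}, {2}", x.Category, x.Count, x.TotalValue);
    }
}
```

Swap `rows` for a LINQ-to-SQL table and the identical query expression is translated to TSQL instead of running locally.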

Marc Gravell
What is the difference between using LINQ-to-SQL and regular SQL?
Dan R.
A: 

You'd be better off letting your database handle grouping, filtering and aggregation. DataTables are actually relatively good at this sort of thing (their bad reputation seems to come primarily from inappropriate usage), but not as good as an actual database. Moreover, without a lot of work on your part, I would put my money on the DataTable's having better performance than your homegrown data structure.
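For instance, DataTable already exposes filtering and aggregation directly via `Select` and `Compute`, so there's little to hand-roll (a minimal sketch; the table and column names are invented):

```csharp
using System;
using System.Data;

class Demo
{
    static void Main()
    {
        var table = new DataTable("Orders");
        table.Columns.Add("Category", typeof(string));
        table.Columns.Add("Value", typeof(decimal));
        table.Rows.Add("A", 10m);
        table.Rows.Add("A", 5m);
        table.Rows.Add("B", 7m);

        // Filtering: Select uses the same expression syntax as DataView.RowFilter.
        DataRow[] aRows = table.Select("Category = 'A'");

        // Aggregation: Compute evaluates an aggregate over rows matching a filter.
        object total = table.Compute("Sum(Value)", "Category = 'A'");

        Console.WriteLine("{0} rows, total {1}", aRows.Length, total);
    }
}
```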

MusiGenesis
A: 

Yes, you can use LINQ to do all those things using your custom objects.

And I've noticed a lot of people suggest that you do this type of stuff in the database... but you never indicated where the database was coming from.

If the data is coming from the database then at the very least the filtering should probably happen there, unless you are doing something specialized (e.g. working from a cached set of data). And even then, if you are working with a significant amount of cached data, you might do well to put that data into an embedded database like SQLite, as someone else has already mentioned.

Giovanni Galbo
The data is ad-hoc, so it can be large (500K+ rows) or small. I am currently caching back to the db in an ADO COM persisted recordset for backwards compatibility. But I am thinking about caching back to the db in dynamically created tables instead. Would regular SQL with ADO.NET, or LINQ-to-SQL, be faster?
Dan R.