Here's the scenario.

I have an application whose underlying database tables have millions of rows; for example, the table 'Books' has millions of rows.

In the application design, I have a custom business object, Book, and a custom collection, BookCollection, to represent a collection of books. We have written a tiny ORM that is responsible for mapping between business objects and datasets. Each object holds its own mapping details: its properties are decorated with custom attributes.

Now, there's a scenario where a BookCollection object needs to hold thousands of records.

What would be an optimal strategy here? Can I also load Book objects into BookCollection asynchronously or in parallel? What is the recommended practice in this scenario?

A: 

Load them all into a List<Book> or ReadOnlyCollection<Book>, depending on your needs. Unless the individual records are very large (MBs each), several thousand shouldn't pose a problem.

I would normally retrieve all the needed records in one query and populate the list that way.

I don't quite understand what you mean by strategy here - and optimal is a loaded term (my optimal and your optimal are probably very different). Optimal in what way?

Oded
+2  A: 

My first question would be: why do you need thousands of books in memory? There are valid scenarios for this, but then you just have to accept the cost. For most things (searching, filtering, sorting, paging, etc.) you can fetch just the page(s) of data you actively need from the database, which is often far fewer rows.
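A minimal sketch of fetching a single page, assuming the data is exposed as an IQueryable<Book> (for example through a LINQ provider). The Book shape and the BookPaging.GetPage helper are hypothetical names for illustration, not part of the original design. With a database-backed provider, Skip and Take are typically translated into a paged query so that only the requested rows leave the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical business object; the real Book will have more properties.
public sealed class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class BookPaging
{
    // Returns one page of books (pageNumber is 1-based).
    // Ordering by Id keeps the page boundaries stable between calls.
    public static List<Book> GetPage(IQueryable<Book> books, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return books
            .OrderBy(b => b.Id)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

To try it without a database, an in-memory list wrapped with AsQueryable() can stand in for the real source, e.g. BookPaging.GetPage(list.AsQueryable(), 3, 10).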

Even if you do need all of them, you don't necessarily need them all at the same time. For example, you could set up an iterator block (yield return) over something like IDataReader and process only one row at a time. This has less overhead than you might think, and is commonly preferable to buffering large volumes of data. If you need multiple aggregates over the streaming data (reading it only once), PushLinq can help with that.
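A minimal sketch of such an iterator block. The Book shape, the column names "Id" and "Title", and the BookStreaming.ReadBooks helper are assumptions for illustration; a real version would presumably go through the ORM's attribute-based mapping instead of hard-coded columns.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical business object; the real Book will have more properties.
public sealed class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class BookStreaming
{
    // Lazily yields one Book per row. Nothing is read until the caller
    // starts enumerating, and only the current row is materialized.
    public static IEnumerable<Book> ReadBooks(IDataReader reader)
    {
        // Assumed column names; look the ordinals up once, not per row.
        int idOrdinal = reader.GetOrdinal("Id");
        int titleOrdinal = reader.GetOrdinal("Title");

        while (reader.Read())
        {
            yield return new Book
            {
                Id = reader.GetInt32(idOrdinal),
                Title = reader.GetString(titleOrdinal)
            };
        }
    }
}
```

The caller owns the reader and should keep it open for as long as the sequence is being enumerated, for example with a foreach inside a using block. Calling ToList() on the result would buffer everything again, which defeats the purpose.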

In many other cases, it is possible to do things like aggregates inside the database; this is one of the things LINQ does nicely - letting you express an aggregate at the backend database using the object-model from your domain model.
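As a rough illustration, an aggregate expressed against the domain model might look like the sketch below. The Author and Price properties, the AuthorSummary type, and the BookStats.SummarizeByAuthor helper are all hypothetical. Whether the grouping actually runs inside the database depends on the LINQ provider behind the IQueryable<Book>; with an in-memory source it simply runs in process.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical business object with assumed properties.
public sealed class Book
{
    public int Id { get; set; }
    public string Author { get; set; }
    public decimal Price { get; set; }
}

// Hypothetical result type: one row per author.
public sealed class AuthorSummary
{
    public string Author { get; set; }
    public int BookCount { get; set; }
    public decimal AveragePrice { get; set; }
}

public static class BookStats
{
    // Groups books by author and computes a count and an average price.
    // Only the summary rows are materialized, not the individual books.
    public static List<AuthorSummary> SummarizeByAuthor(IQueryable<Book> books)
    {
        return books
            .GroupBy(b => b.Author)
            .Select(g => new AuthorSummary
            {
                Author = g.Key,
                BookCount = g.Count(),
                AveragePrice = g.Average(b => b.Price)
            })
            .OrderBy(s => s.Author)
            .ToList();
    }
}
```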

What is the specific scenario?

Marc Gravell