views:

266

answers:

8

Here is my situation: I am trying to follow as hard as I can the 3-tier pattern (i.e. Presentation, Business and Data layer). When I need data from the DB, the Business layer calls the Data layer which returns the information. The Data layer never return a SqlDataReader or DataTable object, but often an enumeration of custom object known by the Data Access Layer. It works pretty well when the Data layer has to return a list with few objects.

I am now facing this problem, my application (the business layer) must process 500000 records. I could simply add another method to my Data layer and return an IEnumerable but this sound very bad to me. I don't want to load half-million records in memory.

My question is, considering the 3-tier model, how should I handle this case? If I had no 3-tiers pattern, I would simply use SqlDataReader in my business classes. Any suggestions?

UPDATE: The data will not be displayed, so this is not a paging issue (the presentation layer is not involved at all here). I simply have to analyze each record and then keep some of them.

Thanks

+2  A: 

I assume you're not displaying 500,000 records to the front end at once? You're probably doing some pagination, right? So, only return one page worth of data from the database at one time.

Matt Grande
+1  A: 

Yes, your instinct is correct.

I'm betting that your UI client does not want to look at half a million records at once. Google doesn't return every hit in a single page; you won't, either.

You have a choice as to where and when your application processes those half a million records. You can chunk them into smaller units of work; you can process them asynchronously; you can write a stored procedure and process them in the database without bringing them all over to the middle tier.

The MVC pattern is wonderful, but it's not holy writ. Make the choices that work for your app.

duffymo
A: 

This is not an uncommon problem and occurs frequently in situations where you need to consolidate large amounts of data and present summaries to the user (reports are a typical example). Your solution should be designed with these considerations in mind. It does not make sense to ignore the efficiencies offered by sql readers (or similar tools) when strict coherence to some particular architectural model makes your application inefficient. It is often possible to overcome some of these problems by adapting an architectural model to your needs. Generic architectural models are rarely applicable out of the box. They are guidelines that should be applied to your particular needs.

Richard Dorman
A: 

A piece of paper can never trump reality. If your specific problem asks to break the 3-tier paradigm, do it.

Aaron Digulla
A: 

There is no shame in doing whatever analysis you need to at the database level. If you can slice and dice what you need with stored procedure or make the necessary correlations with stored procedures and use an application for more complex operations you should be fine.

The question is, does the user expect to push a button and process all 500K records and see a result? If so, are they willing to sit and watch a spinning gif or will it be satisfactory to receive some type of notification when the process is complete? If processing the 500K is of the utmost importance, does your data model need alteration to support this process? There are processing methods such as Hadoop and message queues that are geared for this high volume, but do you need to go to this extent? You might able to set the expectations of your users before pulling you hair out over performance.

David Robbins
+1  A: 

In some cases, you have to break the 3-tier boundaries. But before you do, you could ask yourself:

  1. When you "analyze each record and then keep some of them," is that really part of the business logic? Or is it a data-access function? It might be the case that this belongs in the data access layer.

  2. If it is part of the business logic, do you need all 500000 records in order to make a decision about whether to "keep" any individual record? It might be that the business layer should be processing one record at a time. Making 500000 consecutive database calls is not pretty, but if that is what the app should be doing from a conceptual standpoint, there are ways to mitigate that.

I don't recommend doing anything dumb just to keep the 3 tiers separate. But sometimes, when you think you have to cross the line, it's because there is something in the design that needs a second look.

--
bmb

bmb
+1  A: 

You can build a abstraction on top of the SqlReader class. That way you don't have to pass the SqlReader directly, but you can still process the objects one at a time.

Think Iterators.

Peter Stuifzand
A: 

If I am understanding this correctly you want to "Analyze" the records and then keep some of them and get rid of rest of them. Well in that case I think it will be best to handle this within the database itself (PL/SQL or T/SQL). Requirements like these should be a top priority and not the architecture. Since you are not displaying just analyzing, it's best to do in the procedure itself.

cbrcoder