I have some pretty complex reports to write. Some of them... I'm not sure how I could write an SQL query for just one of the values, let alone stuff them all into a single query.

Is it common to just pull a crap load of data and figure it all out in code instead? Or should I try to find a way to make all the reports rely on SQL?

I have a very rich domain model. In fact, parts of the code can be extended to calculate exactly what they want. The actual logic is not all that difficult to write - and it's nicer to work with my domain model than with SQL. With SQL, writing the business logic, refactoring it, testing it and putting it in version control is a royal pain because it's separate from your actual code.

For example, one of the statistics they want is the % by which a student improved, especially in relation to other people in the same class, in the same school, and compared to other schools. This requires some pretty detailed analysis of how they performed in the past compared to their latest results, as well as doing the same calculation for each group you are comparing against as a whole. I can't even imagine what the SQL query would look like.

The thing is, this % improvement is not a column in the database - it's a big calculation in and of itself, made by analyzing all the live data in real time. There is no way to cache this value in a column either, because redoing the calculation for every row that needs it, every time a student does something, is CRAZY.
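To give a rough idea, the calculation against my domain model looks something like the sketch below. Student, scoresBefore and scoresSince are made-up names, not my real classes - this is just to show the shape of the logic, not the actual code:

    import java.time.LocalDate;
    import java.util.List;

    // Hypothetical domain-model sketch: Student and its methods are invented here
    // purely to illustrate the shape of the improvement calculation.
    interface Student {
        List<Double> scoresBefore(LocalDate cutoff);
        List<Double> scoresSince(LocalDate cutoff);
    }

    class ImprovementCalculator {

        private final LocalDate cutoff;

        ImprovementCalculator(LocalDate cutoff) {
            this.cutoff = cutoff;
        }

        /** % change from a student's average score before the cutoff to their average after it. */
        double improvementFor(Student student) {
            double before = average(student.scoresBefore(cutoff));
            double after = average(student.scoresSince(cutoff));
            if (before == 0) return 0;                // guard against division by zero
            return (after - before) / before * 100.0;
        }

        /** Average improvement across a whole peer group (a class, a school, or all schools). */
        double averageImprovement(List<Student> peers) {
            return peers.stream()
                        .mapToDouble(this::improvementFor)
                        .average()
                        .orElse(0.0);
        }

        private double average(List<Double> scores) {
            return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        }
    }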

I'm a little afraid of pulling out hundreds upon hundreds of records to get these numbers though. I may have to pull out that many just to figure out 1 value for 1 user... and if they want a report for all the users on a single screen, it's basically going to mean analyzing the entire database. And that's just 1 of the many columns they want on the report!

Basically, the report they want is a massive performance hog no matter what method I choose to write it.

Anyway, I'd like to ask what kinds of solutions you've used for this kind of problem.

+1  A: 

The short answer: for analysing large quantities of data, a SQL database is probably the best tool around.

However, that does not mean you should analyse this straight off your production database. I suggest you look into data warehousing.

Mark Bannister
Well, they want actual real-time reports of this stuff online. LOL. I'm not sure data warehousing is the answer though, as all of this info is contained in the same database. It's all there. The thing is, not all of it is denormalized enough to query it straight up, and I'd have to go and write actual SQL scripts/code/views/functions/whatever to implement it all in SQL. That's not really something I want to do.
egervari
@egervari: Part of the reason for datawarehousing is to address performance issues without impacting the operational systems. Also, which language do you think will be more suitable than SQL?
Mark Bannister
SQL can do it - especially once you get beyond basic queries and into functions/procedures - but these are not simple calculations. The actual calculation part would be much easier to do in Java, where you have a domain model with lots of rich methods to work with. In SQL, you basically have to rewrite all of this stuff that already exists, plus a bunch of new stuff, just to satisfy the report :(
egervari
A: 

For a one-off report, I'll write the code to produce it in whatever I can best reason about it in.

For a report that'll be generated more than once, I'll check on who is going to be producing it the next time. I'll still write the code in whatever I can best reason about it in, but I might add something to make it more attractive for that other person to use.

Jon Hanna
Well, I have a very rich domain model. In fact, parts of the code can be extended to calculate exactly what they want. The actual logic is not all that difficult to write - it's always nicer in the language your code base already lives in. SQL business logic is frankly messy and I hate it. Not to mention it makes refactoring and version control a pain. My only concern, really, is performance. I've got to pull a ton of data out in order to analyze it, and even the analysis will take a while.
egervari
Performance can be better either way depending on the data. I can think of real-life examples where I've improved performance by pushing to the SQL or to some OO code, the latter in particular with complex tree or graph structures in the data that are more efficiently manipulated in memory to then do further processing on further SQL results. So even if that was your only concern, the answer would be "it depends". The one thing that would lean me heavily toward SQL is the possibility of it being used directly or indirectly by more SQL or by a DB specialist without the same OO skills.
Jon Hanna
Well, there's no db specialist here - just me. LOL. I am an OO specialist, not a DB specialist. In fact, this object/table graph is very deep and complex. Perhaps I should have mentioned that. That is another reason why I am cringing at the idea of doing this in SQL.
egervari
And indeed, a very pertinent one. I'd pay several orders of magnitude in performance for confidence in correctness, so even if SQL were the ideal way to go for the problem at hand, as the human being solving that problem you need to consider your strengths (or in more general terms, the strengths of whoever will be doing this, but in this case that's you anyway).
Jon Hanna
+2  A: 

Sounds like a challenging task you have ahead of you. I don't know all the details, but I think I would go at it from several directions:

  1. Prioritize: You should try to negotiate with the "customer" and prioritize functionality. Chances are not everything is equally useful for them.

  2. Manage expectations: If they have unrealistic expectations then tell them so in a nice way.

  3. IMHO SQL is good in many respects, but it's not a brilliant programming language. So I'd rather just do calculations in the application rather than in the database.

  4. I think I'd build in some delay in the system... perhaps by caching calculated results for a few minutes before recalculating. This is with an eye towards performance; a rough sketch of the idea follows below.
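
What I mean by that caching, sketched with purely illustrative names (the record syntax needs Java 16 or later):

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Minimal time-boxed cache: recompute a value only when the cached copy
    // is older than the time-to-live. All names here are illustrative.
    class ExpiringCache<K, V> {

        private record Entry<T>(T value, Instant computedAt) {}

        private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
        private final Duration ttl;

        ExpiringCache(Duration ttl) {
            this.ttl = ttl;
        }

        V get(K key, Supplier<V> compute) {
            Entry<V> cached = entries.get(key);
            if (cached != null && cached.computedAt().plus(ttl).isAfter(Instant.now())) {
                return cached.value();                           // still fresh, reuse it
            }
            V fresh = compute.get();                             // the expensive report calculation
            entries.put(key, new Entry<>(fresh, Instant.now()));
            return fresh;
        }
    }

    // Usage: keep each student's improvement figure for five minutes, e.g.
    //   ExpiringCache<Long, Double> cache = new ExpiringCache<>(Duration.ofMinutes(5));
    //   double pct = cache.get(studentId, () -> calculator.improvementFor(student));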

Grubl3r
+6  A: 

Sometimes a report can be generated by a single query. Sometimes some procedural code has to be written. And sometimes, even though a single query CAN be used, it's much better/faster/clearer to write a bit of procedural code.

Case in point - another developer at work wrote a report that used a single query. That query was amazing - turned a table sideways, did some amazing summation stuff - and may well have piped the output through hyperspace - truly a work of art. I couldn't have even conceived of doing something like that and learned a lot just from reading through it. Its only problem was that it took 45 minutes to run and brought the system to its knees in the process. I loved that query...but in the end...I admit it - I killed it. ((sob!)) I dismembered it with a chainsaw while humming "Highway To Hell"! I...I wrote a little procedural code to cover my tracks and...nobody noticed. I'd like to say I was sorry, but...in the end the job ran in 30 seconds. Oh, sure, it's easy enough to say "But performance matters, y'know"...but...I loved that query... ((sniffle...)) Anybody seen my chainsaw..? >;->

The point of the above is "Make Things As Simple As You Can, But No Simpler". If you find yourself with a query that covers three pages (I loved that query, but...) maybe it's trying to tell you something. A much simpler query and some procedural code may take up about the same space, page-wise, but could possibly be much easier to understand and maintain.
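
By way of illustration of that trade-off - with table and column names invented for the example - a deliberately plain query with the pivot/summation done in code might look like this:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: the table and column names are invented.
    // Rather than one giant pivoting query, pull plain rows and aggregate them in code.
    class SubjectTotals {

        Map<String, Double> totalsBySubject(Connection conn) throws Exception {
            String sql = "SELECT subject, score FROM exam_results";   // deliberately simple query
            Map<String, Double> totals = new HashMap<>();
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // the "pivot"/summation happens here, one row at a time
                    totals.merge(rs.getString("subject"), rs.getDouble("score"), Double::sum);
                }
            }
            return totals;
        }
    }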

Share and enjoy.

Bob Jarvis
+1: wonderful story. Hope you find your chainsaw again.
Don Roby
A: 

People usually use a third-party report-writing system rather than writing SQL. As an application developer, if you're spending a lot of time writing complex reports, I would seriously question your manager's decision NOT to buy an off-the-shelf solution and let less-skilled people build their own reports using some GUI.

MarkR