Best practice for computing an operation on consecutive records of a database: SQL or Java?

For instance, I would like to compute the time elapsed between 2 events from the following table:

CREATE TABLE Events (_ID INTEGER PRIMARY KEY, type INTEGER, time INTEGER)

My question is: should I compute this difference in SQL or in Java code?

This does not seem to be easy in SQL, while traversing the table in Java and performing the required computation is quite easy.

More than a coding question, it is a design question: in the first case the data repository provides the computation capability, while in the second the data repository and the computations are decoupled.

I would personally vote for the second one, but I have some difficulty explaining why.

+1 A:

Most databases are optimized for data insertion, indexing, and retrieval. Calculations are generally secondary to data access, and I would expect Java to be more efficient at it than any database engine.

On the other hand, there is increased overhead involved in copying the data into Java and doing the processing there. If you really want to know what's more efficient, try benchmarking both activities.

If you're not concerned about efficiency at all, I'd suggest that, from a pure design standpoint, it would make more sense to keep the calculations in Java and the data access in SQL.
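
As a rough illustration of that split, here is a minimal sketch. It assumes the time column has already been fetched in order (for example with ORDER BY time) into a list; the class and method names are hypothetical and not part of the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the SQL side only fetches rows, Java computes the deltas.
final class ElapsedTimes {

    // Returns the elapsed time between each pair of consecutive entries.
    // Assumes 'times' is already in the order the events occurred.
    static List<Long> consecutiveDeltas(List<Long> times) {
        List<Long> deltas = new ArrayList<>();
        for (int i = 1; i < times.size(); i++) {
            deltas.add(times.get(i) - times.get(i - 1));
        }
        return deltas;
    }
}
```

For n rows this yields n - 1 differences, and an empty or single-row input yields an empty list.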

I think though that you've found a gray area in which there is no wrong answer.

sangretu 2009-06-16 11:49:28

+1 A:

I would say it depends on whether or not you need the timestamps for anything else outside the database. If you only need the time elapsed, then it's easy to calculate the difference and return it in one query. Otherwise I would return the time values and do the calculations in Java.

Bill the Lizard 2009-06-16 11:52:54

Design-wise, this is quite a small computation, but also quite abstract - finding the difference between two time fields. As such you could probably be excused by the gods of code design for not keeping your model decoupled from your logic in this case.

Performance-wise, database engines typically evaluate their computations in an interpreted environment, so for any big crunch you'll probably want to keep it in your Java; but again, this is just the difference of two fields.

One final thing: if you are doing this over a large table then, depending on your logic, you might be shipping two sets of data into Java, as opposed to one if you pre-calculate in the database. It is always good to optimize bandwidth usage.

Will 2009-06-16 12:04:05

It is easy enough to do in either language. What version of SQL are you using?

Here's something from an earlier StackOverflow post: http://sqlserver2000.databases.aspfaq.com/how-do-i-time-my-t-sql-code.html

DECLARE @a DATETIME, @b DATETIME
SET @a = CURRENT_TIMESTAMP

DECLARE @i INT
SET @i = 0
WHILE @i < 10000
BEGIN
    SET @i = @i + 1
END
SET @b = CURRENT_TIMESTAMP
SELECT DATEDIFF(MS, @a, @b)

The key thing when timing queries is that effects such as caching can throw off your measurements, so they have to be taken into account.

Are the queries that you are timing eventually going to be called from Java code? Then it would make sense to time them in Java, to take that overhead and any interactions into account.

If not, it would make sense to do it in SQL, to avoid any of those interactions.

In this situation you are trying to measure performance, and design is not language-specific: you can design this well in SQL or in Java. The key questions are what you are trying to measure, and what the most accurate way to measure it is.

You might want to time a large number of queries as the time for any single transaction could be insignificant because of the magnitude of the operation. But if you do this, you have to be careful to avoid caching effects where the results are skewed by the data being in memory instead of being fetched from disk ... unless this is what is likely to happen in a real-world scenario.
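
If the timing is done on the Java side, a minimal sketch of averaging over many runs could look like the following. The class name, method name, and parameters are hypothetical; the task would wrap whatever query call you want to measure. The warm-up count is an assumption about what you want to observe: pass 0 if the cold-cache cost is the thing being measured.

```java
// Hypothetical timing helper; only System.nanoTime() comes from the JDK.
final class QueryTimer {

    // Runs 'task' warmupRuns times untimed, then timedRuns times timed,
    // and returns the average duration of a timed run in nanoseconds.
    static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // not measured; lets caches and the JIT settle
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }
}
```

Averaging over many runs smooths out the noise of any single call, but it does not by itself tell you whether the data came from memory or from disk.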

Larry Watanabe 2009-06-16 12:12:24

It depends on your workload. If this is a time difference that you read often, then it is probably best to calculate it once and store it as a denormalized column on your table. If it is always row X compared to row Y (e.g. compare _ID to _ID + 1), then a simple self join is enough to calculate the time delta. If the rows to compare are not repeatable, then storing the value doesn't make much sense.
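
The self join for the _ID to _ID + 1 case might be issued from Java roughly as follows. This is a sketch only: it assumes the Events table from the question, that _ID values are contiguous and in event order, and that a JDBC Connection is already available. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper around a self join on the question's Events table.
final class EventDeltaQuery {

    // Pairs each row 'a' with the row 'b' whose _ID is one higher.
    // Rows without a successor (the last one, or any before a gap) are dropped.
    static final String SQL =
        "SELECT a._ID AS id, b.time - a.time AS delta "
      + "FROM Events a JOIN Events b ON b._ID = a._ID + 1 "
      + "ORDER BY a._ID";

    // Returns a map from each _ID to the time elapsed until the next event.
    static Map<Long, Long> fetchDeltas(Connection conn) throws SQLException {
        Map<Long, Long> deltas = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                deltas.put(rs.getLong("id"), rs.getLong("delta"));
            }
        }
        return deltas;
    }
}
```

With this approach only the computed deltas cross the wire, at the cost of putting the subtraction in the query.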

2009-06-16 13:01:07
