tags:

views:

401

answers:

2

We're trying to iterate over a large number of rows from the database and convert those into objects. Behavior will be as follows:

  • Result will be sorted by sequence id, a new object will be created when sequence id changes. The object created will be sent to an external service and will sometimes have to wait before sending another one (which means the next set of data will not be immediately used)
  • We already have invested code in iBatis 3 so an iBatis solution will be the best approach for us (we've tried using RowBounds but haven't seen how it does the iteration under the hood).
  • We'd like to balance minimizing memory usage and reducing number of DB trips.
  • We're also open to pure JDBC approach but we'd like the solution to work on different databases.

UPDATE 1:

  • We need to make as few calls to DB as possible (1 call would be the ideal scenario) while also preventing the application to use too much memory. Are there any other solutions out there for this type of problem may it be pure JDBC or any other technology?

UPDATE 2

  • Query will be backend driven and will only have 1 instance executing at a given time.

Thanks and hope to hear your insights on this.

A: 

It seems you need some sort of pagination. iBatis does that through the standard LIMIT/OFFSET parameters in the query (and RowBounds in iBatis 3 ).

But it seems (if I get it right) that you also are using the GROUP BY feature of iBatis, so that a select returning N records with N1 distint "idx" fields result in the creation of N1 "parent" objects each one having several children objects (with a total of N children objects created). Or something like that.

Unfortunately (and understandably) both things do not mix well.

I dont' see any silver bullet here, one can think of many approaches, each has its shortcomings - hard to evaluate without more information.

If the main objects are "big" (many records) and each one will be processed individually (with a trip to a remote server) you might even want to do an ad-hoc pagination, with a object per page, remembering internally the previosuly read id (something like SELECT ... FROM ... WHERE id = (SELECT MIN(id) FROM .... WHERE id > #lastid# ) )

leonbloy
1. I had a feeling iBatis was using the LIMIT/OFFSET, I guess that means it needs to hit the database for every page right?2. We are doing the grouping on the Java side so no need to use group by keyword in the query, we just need to have the results sorted by id so we can finish creating one object with children of the same id before processing the next objects.3. Your adhoc pagination sounds like a good alternative to the RowBounds approach. This way we're guaranteed that only the next object will be retrieved. I'll give that a try and check performance.Thanks for the help leonbloy!
paul_sns
I added a new requirement which is to make as few DB roundtrips as possible. Hope you can comment also on the updates. Thanks!
paul_sns
I you insist on doing that in one DB query, take a look at RowHandler or ResultHandler - (or, more exotically, return a ResultSet/Cursor - or revert to plain JDBC). But this is only reasonable if the action to be done remotely for each "object" is supposed to give a response (quickly - non dependent on some human interaction), and you can trust that the loop will be completed.
leonbloy
The ResultSet/Cursor approach is I believe what my boss is looking for. I just saw a tutorial for PostgreSQL to setFetchSize, use ResultSet.TYPE_FORWARD_ONLY and setAutocommit(false) to enable this behavior. Unfortunately, I'm trying this now with MySQL and it doesn't look like it's supported. Would you have an idea which DBs have this? Thanks.
paul_sns
Alright, I added the ?useCursorFetch=true to the driver URL for MySQL and memory usage dropped from 364651264 to 4160608. I'll try to check whether setFetchSize is supported in DB2 and Oracle next. Thanks.
paul_sns
PostgreSQL beats Mysql in almost every respect, specially for non trivial db-backed apps. Go for it -if you can. Beware that using cursors in this scenario feel rather dirty to me (I have very little experience with them - do your boss really know what he is doing?) and they might introduce leaks. Your application must have full control about closing them properly, always.
leonbloy
A: 

We need to make as few calls to DB as possible (1 call would be the ideal scenario) while also preventing the application to use too much memory. Are there any other solutions out there for this type of problem may it be pure JDBC or any other technology?

You should really not worry about the amount of DB calls. Just query exactly the data the enduser needs to see at once. It can't be done more efficiently. Google also doesn't query the entire database to show only the first 10. No, it queries exactly those 10 for display, nothing less or more. This is much, much faster and more efficient than hauling/duplicating the entire database into application's memory and working on that. Take benefit of the powers of the RDBMS. That's what it was invented/intended for.

BalusC
Thanks BalusC. Sorry, but I forgot to mention that this will not be a user driven query but one which is triggered by the backend. Not sure why the boss wants a single DB call, I mentioned that it might be hard to balance memory usage with that approach but he suggested to keep trying which is why I'm still looking for other alternatives.
paul_sns
+1 BalusC is right, generally speaking (and I suspect particualarly speaking also)
leonbloy
@paul: Well, if the boss says it... :/ Talk with him. Convince him more.
BalusC
@Downvoter: downvoting valid answers out of pure frustation makes no sense. Abreact somewhere else. Fire your boss and look for a different job.
BalusC