We are developing an application with a persistence layer that uses OpenJPA 1.1, with an Oracle database as the backend storage. I will be using queries with subselects (see my question "Solving JPA query finding the last entry in connected list").

My colleagues at work have remarked that such queries can lead to performance problems once the database is filled with thousands of customer records used by a few thousand concurrent users (which will be the reality in production).

So my question is: is there a "best practice" for using subselects in queries under these circumstances? And what must be considered when doing this?

Answer:

I would first prove that it's a problem. You'll want to load the database up with dummy data and see how your queries perform as the database grows larger. Otherwise you are spending time optimizing something that may not be a problem.
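The "prove it first" approach can be sketched as a small growth test: generate dummy data at increasing sizes, run the same query each time, and compare the timings. The sketch below is self-contained and therefore keeps the data in memory; the `Entry` class, `generate`, and `latestFor` are hypothetical stand-ins (assumptions, not part of the question) for your persisted entities and your real JPQL subselect. In a real test you would persist the rows into Oracle through OpenJPA and time the actual query, ideally with several concurrent threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

public class GrowthBenchmark {

    // Hypothetical stand-in for a persisted row: which customer it belongs to
    // and its position in that customer's list of entries.
    static final class Entry {
        final long customerId;
        final long sequence;

        Entry(long customerId, long sequence) {
            this.customerId = customerId;
            this.sequence = sequence;
        }
    }

    // Keeps query results reachable so the JIT cannot discard the work.
    static volatile Entry sink;

    // Generates 'rows' dummy entries spread randomly over 'customers' customers.
    static List<Entry> generate(int rows, int customers, long seed) {
        Random random = new Random(seed);
        List<Entry> data = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            data.add(new Entry(random.nextInt(customers), i));
        }
        return data;
    }

    // Stand-in for the query under test: the last entry of one customer,
    // found by scanning every row (roughly what an unindexed subselect costs).
    static Entry latestFor(List<Entry> data, long customerId) {
        Entry latest = null;
        for (Entry e : data) {
            if (e.customerId == customerId
                    && (latest == null || e.sequence > latest.sequence)) {
                latest = e;
            }
        }
        return latest;
    }

    // Runs the query 'runs' times and returns the mean time in microseconds.
    static double meanMicros(Function<List<Entry>, Entry> query, List<Entry> data, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink = query.apply(data);
        }
        return (System.nanoTime() - start) / 1_000.0 / runs;
    }

    public static void main(String[] args) {
        // Grow the data set tenfold per step and watch how the timing scales.
        for (int rows = 1_000; rows <= 1_000_000; rows *= 10) {
            List<Entry> data = generate(rows, 1_000, 42L);
            double micros = meanMicros(d -> latestFor(d, 7L), data, 20);
            System.out.printf("%,10d rows: %.1f us per query%n", rows, micros);
        }
    }
}
```

What matters is the trend, not the absolute numbers: if the time per query grows roughly in step with the row count, the query is scanning and is a candidate for an index or a rewrite; if it stays flat, the subselect is probably not your bottleneck.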

One thing to think about: everywhere I've worked, things fall apart not at thousands of records but at millions. You've got a system that works fine for a while and then just starts slowing down, even as you throw more hardware at it. The place I'm working now has about 70 million records in its history table, dating back to 1998. As a result, performance on some queries is horrible, and they're spending a lot of time working around these issues.

But at some point you really do have to ask: do we need to keep data more than 4 years old in our transactional system? Or even 4 months old? The time limit depends on your business needs, but if you keep only the data needed to process ongoing work in your transactional system and archive your historical records into a data warehouse, you'll improve your overall performance. Chances are you only occasionally need to query that old data, so why keep it with your recent data?
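The retention rule itself is simple: pick a cutoff (today minus the retention period) and move everything older than it out of the transactional tables. The in-memory sketch below only illustrates that cutoff logic; `HistoryRecord`, `Split`, and `split` are hypothetical names, and in practice the move would run inside the database as a scheduled batch job that copies old rows to the warehouse and then deletes them.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RetentionSplit {

    // Hypothetical history row: only its creation date matters for retention.
    static final class HistoryRecord {
        final long id;
        final LocalDate created;

        HistoryRecord(long id, LocalDate created) {
            this.id = id;
            this.created = created;
        }
    }

    // Rows to keep in the transactional system and rows to move to the warehouse.
    static final class Split {
        final List<HistoryRecord> keep = new ArrayList<>();
        final List<HistoryRecord> archive = new ArrayList<>();
    }

    // Rows created strictly before (today - retention) are archived; the rest stay.
    static Split split(List<HistoryRecord> rows, LocalDate today, Period retention) {
        LocalDate cutoff = today.minus(retention);
        Split result = new Split();
        for (HistoryRecord r : rows) {
            if (r.created.isBefore(cutoff)) {
                result.archive.add(r);
            } else {
                result.keep.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2010, 6, 1);
        List<HistoryRecord> rows = Arrays.asList(
                new HistoryRecord(1, LocalDate.of(1998, 3, 15)),
                new HistoryRecord(2, LocalDate.of(2007, 12, 31)),
                new HistoryRecord(3, LocalDate.of(2010, 2, 1)));
        // With a 4-year retention the cutoff is 2006-06-01, so only the 1998 row moves.
        Split s = split(rows, today, Period.ofYears(4));
        System.out.println("keep=" + s.keep.size() + " archive=" + s.archive.size());
    }
}
```

Running it prints `keep=2 archive=1`; switching to `Period.ofMonths(4)` moves the cutoff to 2010-02-01 and archives the 2007 row as well.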

If you think about this up front, you'll save a lot of headaches long term.

Steve Sheldon