I have a question, just looking for suggestions here.

So, my project is 'modernizing' a desktop application by converting it to the web, with an ICEfaces UI and a server side written in Java. However, they are keeping the same Oracle database, which at current count has about 700-900 tables and probably a billion records in total. Some individual tables have 250 million rows; many have over 25 million.

Needless to say, the database is not scaling well. As a result, the performance of the application is looking to be abysmal. The architects and decision-makers-that-be have all either refused or are unwilling to restructure the persistence layer. So, basically, we are putting a fresh coat of paint on a functional desktop application that currently serves most user needs and does so with relative ease. The actual database performance is pretty slow in the desktop app now; the quick performance I referred to earlier was non-database-related stuff (sorry, I misspoke there). I am having trouble sleeping at night thinking of how poorly this application is going to perform and how difficult it is going to be for everyday users to do their jobs.

So, my question is, what options do I have to mitigate this impending disaster? Is there some type of intermediate layer I can put in between the database and the Java code to speed up performance while at the same time keeping the database structure intact? Caching is obviously an option, but I don't see that as being a cure-all. Is it possible to layer a NoSQL DB in between or something?

+1  A: 

If you have a lot of lookups for items that are not in the database, you can reduce their number by using a Bloom filter. Add everything in the database to the Bloom filter, then, before you do a lookup, check the filter first. Only if the filter reports the item as present do you need to bother the database. The Bloom filter will produce false positives, but you can design it for the size-versus-false-positive trade-off that best suits you.

This strategy is used by Google in their BigTable database, and they have reported that it significantly improves performance.

http://en.wikipedia.org/wiki/Bloom_filter
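To make the idea concrete, here is a minimal Bloom filter sketch in Java. The class name, hashing scheme, and sizing are illustrative only (not from any particular library); a real deployment would choose the bit-array size and hash count from the expected item count and the false-positive rate you can tolerate.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: k hash positions derived from hashCode()
// via double hashing. False positives possible; false negatives never.
public class BloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits
    private final int numHashes;  // k

    public BloomFilter(int size, int numHashes) {
        this.size = size;
        this.numHashes = numHashes;
        this.bits = new BitSet(size);
    }

    private int position(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;                // derived second hash, forced odd
        return Math.floorMod(h1 + i * h2, size); // always in [0, size)
    }

    public void add(Object key) {
        for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
    }

    // false => definitely absent (skip the DB); true => possibly present
    public boolean mightContain(Object key) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(position(key, i))) return false;
        return true;
    }
}
```

At startup you would load every key from the relevant table into the filter, and thereafter consult `mightContain` before issuing the lookup query.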

Good luck, working on tasks you don't believe in is tough.

Daniel
I will take a look at this approach. Thank you. And yeah, you have no idea. This entire project is working on tasks I don't believe in.
A: 

Don't be put down by this sort of thing. See it as a challenge, rather than something to be losing sleep over! I know it's tempting as a programmer to want to rip everything out and start over again, but from a business perspective, it's just not always viable. For example, by using the same database, the business can continue to use the old application while the new one is being developed and switch over customers in groups, rather than having to switch everyone over at the same time.

As for what you can do about performance, it depends a lot on the usage pattern. Caching can help greatly with mostly read-only databases. Even with a read/write database, it can still be a boon if correctly designed. A NoSQL database might help with write-heavy stuff, but it might also be more trouble than it's worth if the data has to end up in a regular database anyway.

In the end, it all depends greatly on your application's architecture and usage patterns.

Good luck!

Dean Harding
A: 

Well, without knowing too much about what kinds of queries are mostly run (I would expect lookups to be more common), perhaps you should try caching first. And cache at different layers: at the layer in front of the app server if possible, and of course, as you suggested, at the layer between the app server and the database.

Caching works well for read-heavy data, and it might not be as bad as you think.
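As a sketch of what that intermediate layer does in its simplest form, here is a tiny read-through LRU cache in Java. The class and names are illustrative only; in practice you would reach for a proper cache library rather than rolling your own, but the principle is the same: only a cache miss touches the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through LRU cache sketch: lookups go to the loader (e.g. a DB query)
// only on a miss; the least recently used entry is evicted at capacity.
public class ReadThroughCache<K, V> {
    private final Map<K, V> lru;
    private final Function<K, V> loader;

    public ReadThroughCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // access-order LinkedHashMap gives LRU ordering for free
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized V get(K key) {
        return lru.computeIfAbsent(key, loader); // hit the loader only on a miss
    }
}
```

A production cache adds expiry, invalidation on writes, and size limits in bytes rather than entries, which is exactly what the libraries below provide.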

Have you looked at Terracotta? They have some caching and scaling products that might be relevant to you.

Take it as a challenge!

Stephen Lee
I have not yet. I thought caching might be an option, but I didn't know (a) whether it could fully solve this problem and (b) what the best caching strategy was.
It should solve some of the problems, and it's probably the easiest way to go, IMHO. If you are using Hibernate as the persistence layer, you can probably plug in EhCache quite transparently. A distributed cache is also a possible solution, but that will mean more complexity. Without trying it and profiling the results, it will be hard to tell at a glance what the best solution is. In your edit you say the desktop app's performance is poor; you might want to look at the queries themselves. Are they pulling too much data? The problem might be deeper than what can be solved by adding caches.
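For reference, wiring EhCache in as Hibernate's second-level cache looks roughly like the fragment below. These are Hibernate 3.x-era settings (property names differ in later versions), and the entity mapping shown is a made-up example:

```xml
<!-- hibernate.cfg.xml: enable the second-level cache backed by EhCache -->
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.provider_class">net.sf.ehcache.hibernate.EhCacheProvider</property>

<!-- then mark individual entities as cacheable in their mapping: -->
<class name="com.example.Customer" table="CUSTOMERS">
    <cache usage="read-only"/>
    <!-- ... property mappings ... -->
</class>
```

Start with `read-only` caching for reference data that rarely changes; read/write caching needs more care around invalidation.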
Stephen Lee
+4  A: 

I don't understand how to reconcile two things you said.

Needless to say, the database is not scaling well

and

currently serves most user needs and does so with relative ease and quick performance.

You don't say you are adding new users or new functionality, just making the same functions accessible via a web interface.

So why is there a problem? Your web app will be doing more or less the same database work as before.

In fact, introducing a web tier could well give new caching opportunities, reducing the work the DB is doing.

If your early pieces of web-app development are showing poor performance, then I would start by trying to understand how the queries you run in the web app differ from those run by the existing app. Is it possible that you are using some tooling that takes a somewhat naive approach to generating queries?

djna
My mistake. I have edited my above question. The database is not performing well in the current desktop app either. It just seems to be even slower in the web app.
OK, well in that case the first question is: why the difference between old and new? It's not obvious to me that the new app should be worse. And as is suggested in other answers, good caching really should help. In the end, though, some database tuning work sounds necessary.
djna
A: 

The way to 'mitigate this impending disaster' is to do what you should be doing anyway: follow best practices, and the pain of switching out your persistence layer at a later stage will be minimal.

Until you have valid performance benchmarks and have identified bottlenecks in the system, talk of performance is premature. In any case, I would be surprised if many of the 'intermediate layer' strategies aren't already implemented at the database level.

CurtainDog
A: 

If the database is legacy and enormous, then

1) It cannot be changed in a way that alters the interface, as this will break too many existing applications. Or, if you do change the interface, the change has to be coordinated with modifying multiple applications, with the associated testing.

2) If the issue is performance, then there are probably many changes that can be made to optimize the database without changing the interface.

3) Views can be used to maintain the existing interfaces while restructuring tables for more efficiency, or possibly to allow more efficient access in the future.

4) Standard database optimizations, such as performance analysis, indexing, caching can probably greatly increase efficiency and performance without changing the interface.

There's a lot more that can be done, but you get the idea. It can't really be updated in one single big change. Changes have to be incremental, or transparent to the applications that use it.

Larry Watanabe
I understand. The problem is we rely on a separate DBA department to make those optimizations and waiting on them is like waiting on the Ents in Lord of the Rings. Takes forever to make a decision. I'd rather implement a solution that minimizes my dependencies on any other dept.
You can't do this. DBA departments are masters of the political game, and trying a quasi-technical solution to what is really a political problem is asking to get your throat cut. :) You need to take the DBAs out for beers and make some good friends in that department who will make sure your stuff gets done.
Larry Watanabe
Ignoring the DBAs is a recipe for disaster. There is a reason they take time to do these things, and it has to do with keeping production running, which is critical. Learn to work with people, not just diss them. Learn to see the perspective of others and the needs they have, and you will find it easier to get things done at work.
HLGEM
+2  A: 

If the current app performs well and your new java app doesn't, the problem is not in the database layer, but in your application layer. If performance is as bad as you say, they should notice fairly early and have the option of going back to the Desktop application.

The DBA should be able to readily identify the additional workload the database is getting from your application. Assuming the logic hasn't changed, it is unlikely to be doing more writes. It could be doing more reads, or it could be 'chattier' (moving the same amount of information but in smaller parcels). Chatty applications can use a lot of CPU. A lot of architects try to move processing from the database layer into the application layer because "work on the database is expensive", but they actually make things worse due to the overhead of the to-and-fro.

PS.

There's nothing 'bad' about having 250 million rows in a table. Generally you access a table through an index. There are typically 2 or 3 hops from the top of an index to the bottom (and then one more to the table). I've got a 20 million row table with a BLEVEL of 2 and a 120+ million row table with a BLEVEL of 3.

Indexing means that you rarely hit more than a small proportion of your data blocks. The frequently used index blocks (and data blocks) get cached in the database server's memory. The DBA would be able to see whether this memory area is too small for the workload (i.e., a lot of physical disk I/O).

If your app is fetching a lot of information that it doesn't really need, this can put pressure on that memory space. Don't be greedy. If you only need three columns from a row, don't grab the whole row.

Gary
+1 Especially for that brilliant first sentence. Hope it rings a bell.
Rob van Wijk
See my above edit and comment. The current app is not performing well; I misspoke there. It is taking 2-3 minutes to run queries, and the web app just seems even slower than that. BTW, regarding your last sentence, I am all for that. However, that seems to imply either a Transfer Object pattern or passing model objects that are not fully hydrated to the front end. Granted, I am fine with either, but I have been told they are both bad practice.
+1  A: 

So you put a fresh coat of paint on a functional and quick desktop application and then the system becomes slow?

And then you say that "it is needless to say that the database isn't scaling well"?

I don't get it. I think that there is something wrong with your fresh coat of paint, not with the database.

TTT
+1  A: 

What you describe is something that Oracle should be capable of handling very easily if you have the right equipment and database design. It should scale well if you get someone on your team who is a specialist in performance tuning large applications.

Redoing the database from scratch would cost a fortune and would introduce new bugs, and the potential for loss of critical information is huge. It is almost never a better idea to rewrite the database at this point; those kinds of projects usually fail miserably after costing the company thousands or even millions of dollars. Your architects made the right choice. Learn to accept that what you want isn't always the best way. The data is far more important to the company than the app. There are many reasons why people have learned not to try to redesign a database from scratch.

Now, there are ways to improve database performance. The first thing I would consider with a database this size is partitioning the data. I would also consider archiving old data to a data warehouse and doing most reporting from that. Other things to consider would be upgrading your servers to higher-performing models, profiling to find the slowest-running queries and fixing them individually, looking at indexing, and updating statistics and indexes (not sure if this is what you do on Oracle; I'm a SQL Server gal, but your DBAs would know). There are some good books on refactoring old legacy databases. The one below is not database-specific. http://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533/ref=sr_1_1?ie=UTF8&s=books&qid=1275577997&sr=8-1 There are also some good books on performance tuning (look for ones specific to Oracle; what works for SQL Server or MySQL is not what is best for Oracle). Personally, I would get those and read them cover to cover before designing a plan for fixing the poor performance. I would also include the DBAs in all your planning; they know things you do not about the database and why some things are designed the way they are.

HLGEM
A: 

The database is PART of the application. Don't consider them to be separate; they aren't.

As a developer, you need to be free to make schema changes as necessary, and to suggest data changes to improve performance and functionality in production (for example, archiving old data).

Your development system presumably does not have that much data, but has the exact same schema.

In order to do performance testing, you will need a system with the same hardware and the same volume of data (the same data, if possible) as production. You should explain to management that performance testing is absolutely necessary if you feel the app isn't going to perform.

Of course, making schema changes (adding or removing indexes, splitting tables out, etc.) may affect other parts of the system, which you should consider as parts of a SYSTEM, and hence do the necessary regression testing and fixing.

If you need to modify the database schema, and change the desktop client accordingly, to make the web app perform, then that is what you have to do; justify your design decision to management.

MarkR