I'm teaching JEE, especially JPA, Spring and Spring MVC. As I don't have much experience with large projects, it is difficult to know what to present to students about ORM optimisation.

At present, I cover some classic optimisation tricks:

  • prepared statements (most ORMs implicitly use them by default)
  • first and second-level caches
  • "write first, optimize later"
  • bypassing the ORM and sending SQL commands directly to the database for very frequent, specialized and costly requests

Are there any other ways to optimize an ORM that the community would recommend? I'm especially interested in DAO patterns...

+1  A: 

How about N+1 queries for collections? For example, see here: http://stackoverflow.com/questions/1369136/orm-select-n-1-performance-join-or-no-join
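To make the N+1 pattern concrete, here is a minimal sketch in plain Java. It does not use a real ORM: `FakeDb`, its author/book data and its method names are hypothetical stand-ins that only count simulated round trips, so the difference between the naive loop and a single batched lookup is visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NPlusOneDemo {
    // Naive access: one query for the parents, then one query per parent.
    static int naiveRoundTrips() {
        FakeDb db = new FakeDb();
        List<String> titles = new ArrayList<>();
        for (int authorId : db.findAuthorIds()) {
            titles.addAll(db.findBooks(authorId));
        }
        return db.roundTrips;
    }

    // Batched access: one query for the parents, one for all children.
    static int batchedRoundTrips() {
        FakeDb db = new FakeDb();
        List<String> titles = new ArrayList<>();
        Map<Integer, List<String>> books = db.findBooksForAuthors(db.findAuthorIds());
        books.values().forEach(titles::addAll);
        return db.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("naive round trips:   " + naiveRoundTrips());
        System.out.println("batched round trips: " + batchedRoundTrips());
    }
}

// Hypothetical in-memory stand-in for a database; it only counts round trips.
class FakeDb {
    int roundTrips = 0;
    private final List<Integer> authorIds = List.of(1, 2, 3);
    private final Map<Integer, List<String>> booksByAuthor = Map.of(
            1, List.of("A1", "A2"),
            2, List.of("B1"),
            3, List.of("C1", "C2", "C3"));

    List<Integer> findAuthorIds() {
        roundTrips++;
        return authorIds;
    }

    // One query per author: this is the "+N" part.
    List<String> findBooks(int authorId) {
        roundTrips++;
        return booksByAuthor.get(authorId);
    }

    // One query for all authors, comparable to a join or an IN (...) clause.
    Map<Integer, List<String>> findBooksForAuthors(List<Integer> ids) {
        roundTrips++;
        Map<Integer, List<String>> result = new HashMap<>();
        for (Integer id : ids) {
            result.put(id, booksByAuthor.get(id));
        }
        return result;
    }
}
```

With three authors the naive version makes 1 + 3 round trips, while the batched version makes 2 regardless of how many authors there are.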

Konrad Garus
+1  A: 

Regarding Spring and Spring MVC, you might find this interesting. It's in C++, not Java, but it shows how to reduce UI source code w.r.t. Spring by an order of magnitude.

Mike Dunlavey
What does this have to do with ORM?
Pascal Thivent
@Pascal-Thivent: It has to do with maintenance of redundancy, which is relevant to the OP's interest in Spring, and possibly to ORM via binding of information to UI.
Mike Dunlavey
+1  A: 

Lazy loading via proxies is probably one of the killer features of ORMs.

Additionally, Hibernate is able to intercept requests like object.collection.count and optimize them, so instead of the whole collection being retrieved, only a SELECT COUNT(*) is issued.
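The proxy idea can be sketched with the JDK's own `java.lang.reflect.Proxy`. This is not how Hibernate is implemented internally; `lazyList`, the `loads` counter and the order names are hypothetical, and the loader lambda stands in for the SELECT that a real ORM would run on first access.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class LazyProxyDemo {
    static int loads = 0; // counts simulated SELECT executions

    // Returns a List whose content is fetched only when a method is first called on it.
    @SuppressWarnings("unchecked")
    static <T> List<T> lazyList(Supplier<List<T>> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private List<T> target; // stays null until the first access

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get(); // the deferred "query"
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the list's own exception
                }
            }
        };
        return (List<T>) Proxy.newProxyInstance(
                LazyProxyDemo.class.getClassLoader(), new Class<?>[] {List.class}, handler);
    }

    // Returns {loads after creation, loads after size(), loads after get(0), size}.
    static int[] demo() {
        loads = 0;
        List<String> orders = lazyList(() -> {
            loads++;
            return List.of("order-1", "order-2");
        });
        int afterCreate = loads;
        int size = orders.size(); // first access triggers the load
        int afterSize = loads;
        orders.get(0);            // already loaded, no second query
        int afterGet = loads;
        return new int[] {afterCreate, afterSize, afterGet, size};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Note that this sketch loads the whole list even for `size()`; the COUNT(*) optimization mentioned above would need the handler to treat `size()` specially.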

Johannes Rudolph
True - many systems have killed their performance with lazy loading and N+1 ;-)
hbunny
And don't forget the classic problem of passing serialized objects across system boundaries and then accessing the lazy collection...
hbunny
Yes, but it may help you reduce the amount of data retrieved to what is really needed, and you may avoid Cartesian products with it.
Johannes Rudolph
I was being a bit tongue-in-cheek ;-) I found ORMs to be one of the leakiest abstractions I've used (with Hibernate) - you really need to be aware of how they work in detail and you cannot use ORMs as a way to avoid learning how databases and SQL work.
hbunny
Anyone who serializes domain entities across system boundaries instead of using a DTO for this purpose is doomed to fail.
Johannes Rudolph
@stevendick: you're right about the leaky abstraction, although I don't think they are the worst. Yup, no replacement for understanding DBs and SQL. That's why profiling your ORM's generated code is so important.
Johannes Rudolph
+1  A: 

You mentioned the DAO pattern, but many in the JPA camp are saying the pattern is dead (I think the Hibernate guys have blogged on this, but I can't remember the link). Have a look at Spring Roo to see how they add the ORM-related functionality directly to the domain model via static methods.

hbunny
+5  A: 

From the developer's point of view, there are the following optimization cases to deal with:

  • Reduce chattiness between the ORM and the DB. Low chattiness is important, since each roundtrip between the ORM and the database implies network interaction, and thus takes at least 0.1 to 1 ms, independently of query complexity (note that maybe 90% of queries are normally fairly simple). A particular case is the SELECT N+1 problem: if processing each row of some query result requires an additional query to be executed (so 1 + count(...) queries are executed in total), the developer must try to rewrite the code in such a way that a nearly constant number of queries is executed. CRUD sequence batching and future queries are other examples of optimizations that reduce chattiness (described below).
  • Reduce query complexity. Usually the ORM is helpless here, so this is solely the developer's headache. But APIs that allow executing SQL commands directly are frequently intended for this case as well.

So I can list a few more optimizations:

  • Future queries: an API that delays the execution of a query until the moment its result is actually needed. If several future queries are scheduled at that moment, they're executed all together as a single batch. So the main benefit is a reduction in the number of roundtrips to the database (= reduction of chattiness between the ORM and the DB). Many ORMs implement this, e.g. NHibernate.
  • CRUD sequence batching: nearly the same, but here INSERT, UPDATE and DELETE statements are batched together to reduce chattiness. Again, implemented by many ORM tools.
  • A combination of the above two cases, so-called "generalized batching". AFAIK, so far this is implemented only by DataObjects.Net (the ORM my team works on).
  • Asynchronous generalized batching: if a batch requires no immediate reply, it is executed asynchronously (but certainly in sync with other batches sent by the same session, i.e. the underlying connection is still used synchronously). This brings noticeable benefits when there are lots of CRUD statements: the code modifying persistent entities is executed in parallel with the DB-side operation. AFAIK, no ORM implements this optimization so far.
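The future-query and batching ideas from the list above can be sketched roughly as follows. This is an illustration only, not the API of NHibernate, DataObjects.Net or any Java ORM: `BatchingSession`, `FutureValue`, `future` and `write` are hypothetical names, and a "round trip" is just a counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class BatchingDemo {
    // Returns {round trips before any result is read, after the first read,
    //          after the second read, number of executed writes}.
    static int[] demo() {
        BatchingSession session = new BatchingSession();
        List<String> executed = new ArrayList<>();

        session.write(() -> executed.add("INSERT"));            // queued, not sent
        FutureValue<Integer> count = session.future(() -> 42);  // queued, not sent
        FutureValue<String> name = session.future(() -> "Alice");
        int before = session.roundTrips;

        count.get();                       // first read flushes the whole batch
        int afterFirst = session.roundTrips;
        name.get();                        // already resolved by the same batch
        int afterSecond = session.roundTrips;

        return new int[] {before, afterFirst, afterSecond, executed.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}

// Hypothetical session that queues reads and writes and sends them together.
class BatchingSession {
    int roundTrips = 0;
    private final List<Runnable> pending = new ArrayList<>();

    // Schedule a query; nothing is executed until some result is requested.
    <T> FutureValue<T> future(Supplier<T> query) {
        FutureValue<T> result = new FutureValue<>(this);
        pending.add(() -> result.set(query.get()));
        return result;
    }

    // Schedule an INSERT/UPDATE/DELETE in the same queue ("generalized batching").
    void write(Runnable statement) {
        pending.add(statement);
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        roundTrips++; // the whole batch travels in one simulated round trip
        List<Runnable> batch = new ArrayList<>(pending);
        pending.clear();
        batch.forEach(Runnable::run);
    }
}

class FutureValue<T> {
    private final BatchingSession session;
    private T value;
    private boolean resolved;

    FutureValue(BatchingSession session) {
        this.session = session;
    }

    void set(T newValue) {
        value = newValue;
        resolved = true;
    }

    // Reading the value forces the pending batch to be sent if needed.
    T get() {
        if (!resolved) {
            session.flush();
        }
        return value;
    }
}
```

One write and two queries end up costing a single round trip here, because nothing is sent until the first result is actually read.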

All these cases fit under the "write first, optimize later" rule (or "express intention first, optimize later").

Another well-known optimization-related API is the prefetch API ("prefetch paths"). The idea behind it is to fetch a graph of objects that is expected to be processed further with a minimal number of queries (or, better, in minimal time). So this API addresses the "SELECT N+1" problem. Again, this part is normally expected to be implemented in any serious ORM product.

All of the above optimizations are safe from the point of view of transaction isolation, i.e. they don't break it. Caching-related optimizations normally aren't safe in this respect: you must carefully configure caching to ensure you won't get stale objects when getting the actual content is important (e.g. on security checks or on some real-time interaction). There are lots of techniques here, ranging from the use of built-in caches to integration with distributed caches (memcached, etc.). Any approach solving the problem is good here; personally I would expect an open API allowing integration with any cache I prefer.
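The stale-object risk can be illustrated with a deliberately naive sketch. The two maps below are hypothetical stand-ins for a database table and an ORM-level cache, and the role values are made up; real caches add expiry and invalidation policies that this sketch omits.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class StaleCacheDemo {
    static final Map<Integer, String> database = new HashMap<>(); // stand-in for a table
    static final Map<Integer, String> cache = new HashMap<>();    // stand-in for an ORM cache

    // Read-through lookup: hit the "database" only on a cache miss.
    static String findRole(int userId) {
        return cache.computeIfAbsent(userId, database::get);
    }

    // Returns {first read, read after an external update, read after eviction}.
    static String[] demo() {
        database.clear();
        cache.clear();

        database.put(1, "ADMIN");
        String first = findRole(1);     // loads and caches "ADMIN"

        database.put(1, "GUEST");       // another process revokes the role
        String stale = findRole(1);     // still served from the cache

        cache.remove(1);                // explicit eviction
        String fresh = findRole(1);     // reloaded from the "database"

        return new String[] {first, stale, fresh};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Until the entry is evicted, a security check based on `findRole` would still see the revoked role, which is the kind of case where the cache has to be bypassed or invalidated.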

P.S. I'm a .NET fanboy, as well as one of the developers of DataObjects.Net and ORMeter.NET. So I don't know exactly how similar features are implemented in Java, but I'm familiar with the range of available solutions.

Alex Yakunin