I am writing a framework for a rewrite of an existing application. We have a data model of around 900 tables with 11000 fields in total and databases approaching 120 GB in the field. The basic elements of my new implementation are WPF, NHibernate 3, C#, .NET 4.0, NHibernate.Validator and Spring. The application itself is very data/transaction intensive and our largest installation has around 300 concurrent users.

A few things I would like feedback about are:

  • Is Spring a good choice, or should I choose a different container (Castle?)? I did have problems with startup time, but I have been able to bring it down to 14 seconds, and I didn't notice much difference between Spring and Castle. Shorter startup times are of course welcome;

  • I am using Identity fields, but I understand this isn't the best option. What viable alternatives are there;

  • Data display is done with short sessions, one per query. Data entry on the other hand has one session/transaction for the entire duration of a workflow, which can take up to 10-20 minutes max (2-4 minutes is more usual). Are there alternatives to a session/transaction for this entire duration and how could I set this up?

I am open to any and all input and would like to integrate ideas from people who have been working with NHibernate longer, and have more experience with it, than I have.

(B.t.w.: I know I’m in way over my head, but that’s the way I prefer it.)

EDIT: I was too harsh concerning HiLo, but after some research Guids do seem to fit my situation better.
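For reference, NHibernate's guid.comb generator can be declared in an hbm.xml mapping roughly like this (the class, table, and column names are placeholders, not from the question):

```xml
<class name="Order" table="Orders">
  <id name="Id" column="OrderId" type="Guid">
    <!-- guid.comb generates sequential GUIDs client-side, so inserts
         need no database round trip and index fragmentation is lower
         than with purely random GUIDs -->
    <generator class="guid.comb" />
  </id>
</class>
```

Like hilo, this assigns the identifier before the INSERT is issued, so it doesn't break NHibernate's insert batching the way identity does.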

+1  A: 

Hilo is the fastest approach to assigning identifiers. Using identity fields works, and is safer (see below), but since the identifier is generated by the database, every insertion of a row requires a read operation to determine the row's identifier.

If you're going to use hilo, be sure you understand the details of how the algorithm works. (I think it's described elsewhere on this site.) If you make bad choices for the data type of the id column, the data type of the hi-value column, or the "max_lo" parameter, you can end up with wraparound, which would cause already-used numbers to be generated again, which of course is very bad.
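For comparison, a hilo id might be mapped like this (the table and column names below are illustrative; "table", "column", and "max_lo" are the real NHibernate parameter names):

```xml
<id name="Id" column="CustomerId" type="Int64">
  <generator class="hilo">
    <!-- one row in this table holds the current "hi" value -->
    <param name="table">hibernate_unique_key</param>
    <param name="column">next_hi</param>
    <!-- identifiers handed out per database round trip;
         roughly, id = hi * (max_lo + 1) + lo -->
    <param name="max_lo">100</param>
  </generator>
</id>
```

The wraparound risk mentioned above follows from that formula: a large max_lo combined with a too-small column type exhausts the range sooner than you might expect.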

The typical way to handle data entry is to close the session, perform the data entry, and then attach the updated objects to a new session. This is covered in the documentation.

The tricky thing with attaching is this: say that object A contains a reference to object B, and object B contains a reference to object C. If you "touched" objects A and B during the initial session, A and B will have been loaded, and B will contain a proxy reference to C. If you attach A to the new session, but forget to attach B, B's proxy reference will still point to the old closed session, which will result in an exception if you try to follow it.

It can be harder than it seems to get this right. During the initial session, if you called a function that did some kind of a search through your object graph, it may be difficult to know later on exactly which objects need to be attached to the new session to make everything work right.
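A minimal sketch of the detach/re-attach pattern described above (the `Order`/`Customer` entities are hypothetical, and error handling is omitted):

```csharp
// Session 1: load the object graph, then let the session close.
Order order;
using (var session = sessionFactory.OpenSession())
{
    order = session.Get<Order>(orderId);
    // Touch everything the edit screen will need while the session
    // is still open, so proxies are initialized before we detach.
    NHibernateUtil.Initialize(order.Customer);
}

// ... the user edits the detached objects for minutes ...

// Session 2: re-attach and flush in a short transaction.
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // Update() re-associates a detached instance. Re-attach every
    // detached object you will navigate to, not just the root,
    // or you hit the stale-proxy problem described above.
    session.Update(order);
    tx.Commit();
}
```

If the detached objects were not modified, `session.Lock(order, LockMode.None)` is the cheaper way to re-associate them.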

Depending on how reliable your connection to your database is, it may be a better option to keep the session open for the data entry operation and avoid the potential problems involved in attaching the objects to a new session. It depends a lot on how complicated your object model is and what you need to do with the objects.

It also matters which database you're using. For example, Postgres uses MVCC, so an open transaction never blocks other users from reading the same rows. In a database that uses row locking, the locks are a large part of the problem with long sessions.

Roland Acton
I've read more on HiLo, and Guids seem like a better option; they have the same advantages, though. Concerning the sessions: data integrity is very important to us, so I'm mostly worried about whether to keep a transaction open for the entire business transaction. It's difficult to determine what impact either option has in the long run.
Pieter
The main problem with a long transaction is that you end up with rows locked in the database for a long period of time. This can force other database operations to wait until you're done (depends upon the isolation level), and if the database detects a deadlock, it will arbitrarily choose a transaction to roll back to break the deadlock.
Roland Acton
Another thing worth mentioning is that if an attempt to write objects to the database fails, and you want to try again, you're supposed to start over with fresh objects. (Since the objects may be half-updated due to the failed write attempt.) If the objects have never been in the database before, the easiest way to handle this is to define a copy constructor which copies everything except the identifier, and clone all of the objects. Your copy constructor won't clone the proxy layer of the object, which is the part that is "messed up".
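A sketch of such a copy constructor (the entity and its properties are hypothetical):

```csharp
public class Order
{
    public virtual Guid Id { get; protected set; }
    public virtual string CustomerName { get; set; }
    public virtual decimal Total { get; set; }

    public Order() { }

    // Copy constructor: copies the business state but NOT the
    // identifier (nor any proxy plumbing), so the clone is a
    // fresh transient object that can be saved cleanly.
    public Order(Order other)
    {
        CustomerName = other.CustomerName;
        Total = other.Total;
        // Id is deliberately left at its default; NHibernate
        // assigns a new one on save.
    }
}
```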
Roland Acton