Hello All,

In one sentence, what I ultimately need to know is how to share objects between mid-tier functions without requiring the application tier to pass the data model objects.

I'm working on building a mid-tier layer in our current environment for the company I work for. Currently we primarily use .NET, and we have built custom data models around all of our various database systems (Oracle, OpenLDAP, MSSQL, and others).

I'm running into issues trying to pull our model out of the application tier and move it into a series of mid-tier libraries. The main issue is that the application tier can hang on to a cached object for the duration of a process and make updates based on the cached data, but the mid-tier operations cannot.

I'm trying to keep the model objects out of the application as much as possible, so that when we make a change to the underlying database structure we can edit and redeploy the mid-tier easily, and multiple applications will not need to be rebuilt. I'll give a brief illustration of the issue in pseudo-code, since that is what we developers understand best :)

main
 {
  MidTierServices.UpdateCustomerName("testaccount", "John", "Smith");

  // since the data takes up to 4 seconds to be replicated from
  // the write server to the read server, the function below is
  // going to grab old data that does not contain the first name
  // and last name update... John Smith will be overwritten with
  // the previous data

  MidTierServices.UpdateCustomerPassword("testaccount", "jfjfjkeijfej");
 }

 MidTierServices
 {
  void UpdateCustomerName(string username, string first, string last)
  {
   Customer custObj = DataRepository.GetCustomer(username);

   /*******************
   validation checks and business logic go here...
   *******************/

   custObj.FirstName = first;
   custObj.LastName = last;

   DataRepository.Update(custObj);
  }

  void UpdateCustomerPassword(string username, string password)
  {
   // does not contain first and last updates
   Customer custObj = DataRepository.GetCustomer(username); 

   /*******************
   validation checks and business logic go here...
   *******************/

   custObj.Password = password;

   // overwrites changes made by other functions since data is stale
   DataRepository.Update(custObj); 
  }
 }
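To make the race concrete, here is a minimal runnable sketch of the same scenario. This is Python for brevity, with two dicts standing in for the write server and the lagging read replica; all names are illustrative, not our actual API:

```python
import copy

# Toy stand-ins for the real servers: writes land on one store, reads come
# from a replica that has not yet received the replicated data.
write_server = {"testaccount": {"first": "Old", "last": "Name", "password": "x"}}
read_server = copy.deepcopy(write_server)   # stale replica

def update_customer_name(user, first, last):
    cust = copy.deepcopy(read_server[user])  # read from the (stale) replica
    cust["first"], cust["last"] = first, last
    write_server[user] = cust                # whole-object write

def update_customer_password(user, password):
    cust = copy.deepcopy(read_server[user])  # replica still lacks the name change
    cust["password"] = password
    write_server[user] = cust                # clobbers the name update

update_customer_name("testaccount", "John", "Smith")
update_customer_password("testaccount", "jfjfjkeijfej")
# write_server now holds the new password but the OLD first/last name.
```

Because the second function reads a copy of the row that predates the first function's write, its whole-object save silently reverts the name change.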

On a side note, the options I've considered are: building a home-grown caching layer, which would take a lot of time and would be a very difficult concept to sell to management; or switching to a modeling layer with built-in caching support, such as NHibernate. The latter would also be hard to sell, because it would take a very long time to tear apart our entire custom model and replace it with a third-party solution. Additionally, not a lot of vendors support our large array of databases. For example, .NET has LINQ to Active Directory, but not a LINQ to OpenLDAP.

Anyway, sorry for the novel, but this is more of an enterprise architecture question than a simple code question such as "How do I get the current date and time in .NET?"

Edit

Sorry, I forgot to add some very important information in my original post. I feel very bad, because Cheeso went to a lot of trouble to write a very in-depth response, which would have fixed my issue had there not been more to the problem (which I stupidly did not include).

The main cause of the current issue is data replication. The first function writes to one server, and then the next function reads from another server that has not received the replicated data yet. So essentially, my code is faster than the replication process.

I could resolve this by always reading and writing to the same LDAP server, but my admins would probably murder me for that. They specifically set up one server that is only used for writing, and four other servers, behind a load balancer, that are only used for reading. I'm in no way an LDAP administrator, so I don't know whether that is standard procedure.

+2  A: 

You are describing a very common problem.

The normal approach to address it is through the use of Optimistic Concurrency Control.

If that sounds like gobbledegook, it's not. It's a pretty simple idea. The concurrency part of the term refers to the fact that updates are happening to the data-of-record, and they are happening concurrently, possibly from many writers. (Your situation is a degenerate case in which a single writer is the source of the problem, but it's the same basic idea.) The optimistic part I'll get to in a minute.

The Problem

It's possible, when there are multiple writers, for the read+write portions of two updates to become interleaved. Suppose you have A and B, both of whom read and then update the same row in a database. A reads the database, then B reads the database, then B updates it, then A updates it. With a naive approach, the "last write" wins, and B's writes may be destroyed.

Enter optimistic concurrency. The basic idea is to presume that the update will work, but to check. Sort of like the "trust but verify" approach to arms control from a few years back. The way to do this is to include a field in the database table, which must also be included in the domain object, that provides a way to distinguish one "version" of the db row or domain object from another. The simplest is a timestamp field, named something like LastUpdated, which holds the time of last update. There are other, more complex ways to do the consistency check, but a timestamp field is good for illustration purposes.

Then, when the writer or updater wants to update the DB, it updates only the row for which the key matches (whatever your key is) and for which LastUpdated also matches. This is the verify part.

Since developers understand code, I'll provide some pseudo-SQL. Suppose you have a blog database, with an index, a headline, and some text for each blog entry. You might retrieve the data for a set of rows (or objects) like this:

SELECT ix, Created, LastUpdated, Headline, Body FROM blogposts 
    WHERE CONVERT(Char(10),Created,102) = @targdate 

This sort of query might retrieve all the blog posts in the database for a given day, or month, or whatever.

With simple optimistic concurrency, you would update a single row using SQL like this:

UPDATE blogposts Set Headline = @NewHeadline, LastUpdated = @NewLastUpdated
    WHERE ix=@ix AND LastUpdated = @PriorLastUpdated

The update can only happen if the index matches (we presume that's the primary key) and the LastUpdated field is the same as it was when the data was read. Also note that you must ensure you update the LastUpdated field on every update to the row.
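To see the whole cycle end to end, here is a minimal, runnable sketch of the same timestamp check. It uses Python's sqlite3 standard library purely as a stand-in for SQL Server; the table and column names follow the example above, and the timestamps are hard-coded for clarity:

```python
import sqlite3

# In-memory table standing in for the blogposts example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blogposts (ix INTEGER PRIMARY KEY, Headline TEXT, LastUpdated TEXT)")
conn.execute(
    "INSERT INTO blogposts VALUES (1, 'Old headline', '2010-01-01T00:00:00')")

def update_headline(conn, ix, new_headline, prior_ts, new_ts):
    """Optimistic update: succeeds only if the row is unchanged since it was read."""
    cur = conn.execute(
        "UPDATE blogposts SET Headline = ?, LastUpdated = ? "
        "WHERE ix = ? AND LastUpdated = ?",
        (new_headline, new_ts, ix, prior_ts))
    return cur.rowcount          # 0 means the consistency check failed

# Writer A read the row when LastUpdated was 2010-01-01: its update passes.
assert update_headline(conn, 1, "New headline",
                       "2010-01-01T00:00:00", "2010-01-02T00:00:00") == 1

# Writer B still holds the stale timestamp: zero rows updated, nothing clobbered.
assert update_headline(conn, 1, "Conflicting headline",
                       "2010-01-01T00:00:00", "2010-01-03T00:00:00") == 0
```

The stale writer's UPDATE simply matches no rows, so the earlier change survives; the zero rowcount is the signal that the check failed.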

A more rigorous update might insist that none of the columns had been updated. In this case there's no timestamp at all. Something like this:

UPDATE Table1 SET Col1 = @NewCol1Value,
                  Col2 = @NewCol2Value,
                  Col3 = @NewCol3Value
WHERE Col1 = @OldCol1Value AND
      Col2 = @OldCol2Value AND
      Col3 = @OldCol3Value

Why is it called "optimistic"?

OCC is used as an alternative to holding database locks, which is a heavy-handed approach to keeping data consistent. A DB lock might prevent anyone from reading or updating the db row, while it is held. This obviously has huge performance implications. So OCC relaxes that, and acts "optimistically", by presuming that when it comes time to update, the data in the table will not have been updated in the meantime. But of course it's not blind optimism - you have to check right before update.

Using Optimistic Concurrency in practice

You said you use .NET. I don't know if you use DataSets for your data access, strongly typed or otherwise. But .NET DataSets, or specifically DataAdapters, include built-in support for OCC. You can specify and hand-code the UpdateCommand for any DataAdapter, and that is where you can insert the consistency checks. This is also possible within the Visual Studio design experience.

If you get a violation, the update will report that ZERO rows were updated. You can check for this in the DataAdapter.RowUpdated event. (Be aware that in the ADO.NET model there is a different DataAdapter for each kind of database. SqlDataAdapter works with SQL Server; you'll need a different one for other data sources.)

In the RowUpdated event, you can check for the number of rows that have been affected, and then take some action if the count is zero.
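What to do when the count is zero is app-specific; a common pattern is to re-read the fresh row, re-apply only your own changes, and retry. Here is a toy sketch of that control flow, in Python rather than the ADO.NET API, using an in-memory row with a version counter standing in for LastUpdated (all names are illustrative):

```python
# A toy in-memory "row"; the version counter plays the role of LastUpdated.
store = {"first": "John", "last": "Smith", "version": 1}

def occ_update(expected_version, changes):
    """Apply changes only if the version still matches; returns rows affected."""
    if store["version"] != expected_version:
        return 0
    store.update(changes)
    store["version"] += 1
    return 1

def update_with_retry(apply_change, max_attempts=3):
    """Re-read and retry whenever the optimistic check reports zero rows."""
    for _ in range(max_attempts):
        snapshot = dict(store)            # fresh read, including the version
        changes = apply_change(snapshot)  # only the fields this caller owns
        if occ_update(snapshot["version"], changes) == 1:
            return
    raise RuntimeError("gave up after %d attempts" % max_attempts)

# Update one field; any concurrent change would force a re-read, not a clobber.
update_with_retry(lambda row: {"last": "Jones"})
```

Because each attempt starts from a fresh read and writes back only its own fields, a retried update cannot wipe out changes made by another writer in between.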


Summary

Verify the contents of the database have not been changed, before writing updates. This is called optimistic concurrency control.



Cheeso
Wow!! Thank you very much Cheeso! That is definitely the most thorough response I have ever received. I wish I could give you 500 points for the amount of time you put into this response. Unfortunately OCC will not resolve my issue. The problem is that even at the moment I am writing the data, the updated timestamp still has not been replicated to the read servers. So if I were to pull the data again and verify, the timestamp still has not changed since it was originally read. What I am facing is a time battle against replication. My code is faster than the replication process.
regex
I understand what you're saying, and I think you're incorrect. The updated timestamp is not replicated to the READ servers, I get that part. But when you do an `UPDATE Set Col=@col WHERE LastUpdated=@PriorLastUpdated`, you are doing the read check *directly from the server that is being updated*. I understand the code-versus-replication race; even so, the update with OCC checks will prevent data from being wiped out. It does not solve your problem completely though - you still have to figure out *what to do* in the case where the OCC check fails. That error handling is app-specific.
Cheeso
Alright, now I'm with you. I read through your post several times, but never read it that way. Sometimes I'm a little slow. The only problem is that I really don't know how well that will work with LDAP. I don't believe it supports WHERE clauses; as far as I know, I can only set or get an individual attribute in a directory. However, that is an entirely separate issue. You definitely answered the original question, and very well at that. Thanks again!!
regex