views:

104

answers:

1

Do you see database triggers / referential integrity rules being used in a way that changes actual data in the database (changing row w in table x causes a change in row y in table z)?

If yes, How does this tie-in with the increasing popularity of in-memory caching (memcache and friends)? After all, these actions occur inside the database but the caching system must be aware of them in order to reflect to correct state (or at least invalidate the possibly changed state). I find it hard to believe that callbacks are implemented for such cases.

Does anyone have real-world experience with such a setup / real-world experience with considering such a setup and abandoning it (which way did you go? if caching, how do you enforce integrity?)

+4  A: 

Simple Answer:

  • Referential integrity is a must have
  • Caching is a qualified must have
  • Triggers are a nice to have

Longer Answer

I've been developing applications on relational databases since 1993 (Dec RDB since you ask, and on flat file systems before then) and triggers have never been popular with many developers because they can 'delete stuff that you don't want deleting'. Referential integrity is also often frowned on by developers because a database in third normal form with proper referential integrity is difficult to bodge together in a few minutes.

Caching is also often considered quite 'hard' to do right, although I'm not sure why.

Whilst many systems can live without triggers, I'd say that no application database can comfortably survive without referential integrity. Look at the tags on this question, the database behind this site will have a table for the tags (probably called 'Tag') and questions (probably called 'Question'). 'Question' will have a foreign key to the primary key on the Tag table, but as questions can have many tags and tags can have many questions I'd guess that the relationship is like this:

   Question
   (TagId)         1 | Database triggers / referential integrity and in-memory caching
      |  
    -----
    | | |
  QuestionTag
 (QuestionId)       1 | 1  ... 1 | 2  ... 1 | 3 ...
    (TagId)
    | | |
    -----
      |
     Tag            1 | database ... 2 | referential-integrity ... 3 | triggers ...
   (TagId)

This kind of referential integrity is the bedrock of any reliable application and isn't negotiable. You can see how it adds credibility to the application design and confidence in its longevity.

The caching on SO may be turned on for such a thing as tags (although it isn't guaranteed) so assume that the tags are cached in memory and that you have enough reputation to be allowed to add a tag to SO. You add your tag and it may well be persisted to the database instantly - but is the cache then updated?

What you have is a trade-off. Can the site survive without knowing about your new tag? And if so for how long? Fundamentally what is the lifecycle of a tag, as it progresses from being added by a user to being in the database, available to other users, used by other users? The cache will be rebuilt according to the rules set down by the development team - and that rule will be a trade-off essentially so that any new tag is available quickly enough without slowing the application down.

Triggers can enforce referential integrity, say the tag you add is 'rubbish', but by the time the admins see it three questions are tagged 'rubbish'. The admins then decide to delete the 'rubbish' tag - but what about the questions that are tagged with it? If there is a trigger on the 'tag' table that is fired on the delete, it could then run round the 'question' table and remove all references to 'rubbish'. There are lots of alternatives to this approach - many of which are programmatic workrounds - but is there a cleaner alternative?

I've worked on lots of sites in the last 20 years, the good ones use referential integrity and increasingly caching. Triggers that change data anonymously (all they fundamentally are are event driven stored procedures) are not popular and increasingly misunderstood but still have a role.

Caching and referential integrity cannot be considered an either-or - development teams must design applications so that both can be incorporated.

amelvin
Wow, a well thought answer.However, I still can't understand how off-the-shelf caching solutions work with "behind the scenes" updates. Each change to the database referential integrity rules can potentially create a huge hole in the caching solution. How are these requirements (be fast - cache, be consistent - enforce ref. integrity) reconciled?
Ran Biron
Caching 'off-the-shelf' is a question of configuration, with database changes batched up and the cache itself knowing the referential integrity constraints in the database. This means that even cached data will reject attempts to break referential integrity.
amelvin
Really? off-the-shelf caching solutions know about referential integrity? I couldn't find any reference to this in the popular caching systems - can you direct me to one that does that?
Ran Biron
Which off-the-shelf caching system were you hoping to use?
amelvin
Actually, I'm hoping to create something that depends on existing data and is "live" (updates on data changes), so I'm exploring how cache manage to do it.
Ran Biron