ansaurus

Question

Design debate: what are good ways to store and manipulate versioned objects?

Answer 1

+1 A:

Hmm, sounds kind of like this site...

As far as database design goes, a versioning system somewhat like SVN, where you never actually do any updates, just inserts (with a version number) when things change, might be what you need. This is the idea behind MVCC, Multi-Version Concurrency Control. A wiki is another good example of this.
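A minimal sketch of the insert-only idea, using Python's built-in sqlite3 module. The table name problem_versions, its columns, and the helper names are assumptions made for illustration, not part of the original question.

```python
import sqlite3


def open_versions_db():
    """Create an in-memory database with a hypothetical insert-only table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE problem_versions (
            problem_id  INTEGER NOT NULL,
            version     INTEGER NOT NULL,
            name        TEXT NOT NULL,
            description TEXT,
            PRIMARY KEY (problem_id, version)
        )
        """
    )
    return conn


def save_version(conn, problem_id, name, description):
    """Never UPDATE: insert a new row carrying the next version number."""
    with conn:  # commits on success, rolls back on error
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM problem_versions "
            "WHERE problem_id = ?",
            (problem_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO problem_versions VALUES (?, ?, ?, ?)",
            (problem_id, latest + 1, name, description),
        )
    return latest + 1


def current_version(conn, problem_id):
    """Reading the current state means picking the highest version."""
    return conn.execute(
        "SELECT version, name, description FROM problem_versions "
        "WHERE problem_id = ? ORDER BY version DESC LIMIT 1",
        (problem_id,),
    ).fetchone()
```

Every read of the current state has to filter for the highest version, which is the cost of this design. The composite primary key also rejects two writers that try to insert the same version number.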

Eric Z Beard 2008-08-14 20:57:36

Answer 2

+1 A:

@Gaius

foreign key (thing_id, thing_type) -> problems.id or solutions.id

Be careful with these kinds of "multidirectional" foreign keys. My experience has shown that query performance suffers dramatically when your join condition has to check the type before figuring out which table to join on. It doesn't seem as elegant, but nullable problem_id and solution_id columns will work much better.
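A rough sketch of the nullable-column alternative, again with sqlite3. The notes table and its body column are hypothetical stand-ins for whatever table needs to point at either a problem or a solution, and the CHECK constraint is an added assumption that keeps exactly one of the two columns filled.

```python
import sqlite3


def open_notes_db():
    """Two nullable, real foreign keys instead of (thing_id, thing_type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(
        """
        CREATE TABLE problems  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE solutions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE notes (
            id          INTEGER PRIMARY KEY,
            body        TEXT NOT NULL,
            problem_id  INTEGER REFERENCES problems(id),
            solution_id INTEGER REFERENCES solutions(id),
            -- exactly one of the two parent columns must be set
            CHECK ((problem_id IS NULL) <> (solution_id IS NULL))
        );
        """
    )
    return conn


def notes_for_problem(conn, problem_id):
    """A plain join: no type column to inspect before choosing a table."""
    return conn.execute(
        "SELECT n.body FROM notes n "
        "JOIN problems p ON p.id = n.problem_id "
        "WHERE p.id = ? ORDER BY n.id",
        (problem_id,),
    ).fetchall()
```

Because both columns are ordinary foreign keys, the database itself can enforce referential integrity, which a (thing_id, thing_type) pair cannot offer.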

Of course, query performance will also suffer with an MVCC design when you have to add the check to get the latest version of a record. The tradeoff is that you never have to worry about contention with updates.

Eric Z Beard 2008-08-15 13:22:01

Voted up, as foreign keys that point to various tables confuse the optimizer. They also cause trouble when you delete a parent record, and for referential integrity (RI) in general.

WW 2008-10-31 10:04:37

Answer 3

A:

I suppose there's

Option 4: the hybrid

Move the common Thing attributes into a single-inheritance table, then add a custom_attributes table. This makes foreign keys simpler, reduces duplication, and allows flexibility. It doesn't solve the problem of type safety for the additional attributes. It also adds a little complexity, since there are now two ways for a Thing to have an attribute.

If description and other large fields stay in the Things table, though, it also doesn't solve the duplication-space problem.

table things
  int id | int type | string name | text description | datetime created_at | other common fields...
  foreign key type -> thing_types.id

table custom_attributes
  int id | int thing_id | string name | string value
  foreign key thing_id -> things.id
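
The schema above could be sketched in sqlite3 roughly as follows. The UNIQUE (thing_id, name) constraint and the load_thing helper are assumptions added for illustration.

```python
import sqlite3


def open_things_db():
    """Common columns live in things; everything else is a name/value row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE thing_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE things (
            id          INTEGER PRIMARY KEY,
            type        INTEGER NOT NULL REFERENCES thing_types(id),
            name        TEXT NOT NULL,
            description TEXT,
            created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE custom_attributes (
            id       INTEGER PRIMARY KEY,
            thing_id INTEGER NOT NULL REFERENCES things(id),
            name     TEXT NOT NULL,
            value    TEXT,
            UNIQUE (thing_id, name)  -- assumption: one value per attribute name
        );
        """
    )
    return conn


def load_thing(conn, thing_id):
    """Return the common columns plus a dict of custom attributes."""
    row = conn.execute(
        "SELECT id, type, name, description FROM things WHERE id = ?",
        (thing_id,),
    ).fetchone()
    if row is None:
        return None
    thing = dict(zip(("id", "type", "name", "description"), row))
    thing["custom"] = dict(
        conn.execute(
            "SELECT name, value FROM custom_attributes WHERE thing_id = ?",
            (thing_id,),
        ).fetchall()
    )
    return thing
```

Note that every custom value comes back as a string, which is the type-safety gap mentioned above: a numeric attribute such as a severity level has to be parsed by the caller.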

James A. Rosen 2008-08-15 14:19:17

Answer 4

+1 A:

What do you think about this:

table problems
int id | string name | text description | datetime created_at

table problems_revisions
int revision | int id | string name | text description | datetime created_at
foreign key id -> problems.id

Before each update you have to perform an additional insert into the revision table. This additional insert is fast, and it is the price you pay for:

efficient access to the current version: select from problems as usual
a schema that is intuitive and close to the reality you want to model
joins between the tables in your schema stay efficient
with one revision number per business transaction, you can version table records the way SVN versions files.
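
A minimal sketch of the copy-then-update step with sqlite3. It assumes the caller supplies the revision number for the current business transaction, and that a revision row stores the values that this transaction replaced; the composite primary key and the helper name are also assumptions.

```python
import sqlite3


def open_revisions_db():
    """problems holds the current rows; problems_revisions holds old copies."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE problems (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            description TEXT,
            created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE problems_revisions (
            revision    INTEGER NOT NULL,
            id          INTEGER NOT NULL REFERENCES problems(id),
            name        TEXT NOT NULL,
            description TEXT,
            created_at  TEXT NOT NULL,
            PRIMARY KEY (revision, id)
        );
        """
    )
    return conn


def update_problem(conn, revision, problem_id, name, description):
    """Archive the current row under `revision`, then overwrite it."""
    with conn:  # both statements commit or roll back together
        conn.execute(
            "INSERT INTO problems_revisions "
            "(revision, id, name, description, created_at) "
            "SELECT ?, id, name, description, created_at "
            "FROM problems WHERE id = ?",
            (revision, problem_id),
        )
        conn.execute(
            "UPDATE problems SET name = ?, description = ? WHERE id = ?",
            (name, description, problem_id),
        )
```

Reads of the current state never touch problems_revisions, so ordinary queries and joins against problems are unaffected by the amount of history.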

2008-10-18 08:55:54

Answer 5

A:

It's a good idea to choose a data structure that makes common questions that you ask of the model easy to answer. It's most likely that you're interested in the current position most of the time. On occasion, you will want to drill into the history for particular problems and solutions.

I would have tables for problem, solution, and relationship that represent the current position. There would also be problem_history, solution_history, etc. tables. Each would be a child table of its current table (problem_history of problem, and so on) and would contain extra columns for VersionNumber and EffectiveDate. The key would be (ProblemId, VersionNumber).

When you update a problem, you would write the old values into the problem_history table. Point-in-time queries are therefore possible, as you can pick out the problem_history record that was valid as at a particular date.

Where I've done this before, I have also created a view to UNION problem and problem_history as this is sometimes useful in various queries.
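
A sketch of the history table, the combined view, and a point-in-time query, using sqlite3. It assumes the current problem table also carries version_number and effective_date so the two tables line up in the view, and that dates are stored as ISO 8601 strings so they compare correctly as text. The view and helper names are made up, and UNION ALL is used in place of UNION because the two tables never hold the same version.

```python
import sqlite3


def open_history_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE problem (
            problem_id     INTEGER PRIMARY KEY,
            name           TEXT NOT NULL,
            version_number INTEGER NOT NULL,
            effective_date TEXT NOT NULL
        );
        CREATE TABLE problem_history (
            problem_id     INTEGER NOT NULL REFERENCES problem(problem_id),
            version_number INTEGER NOT NULL,
            name           TEXT NOT NULL,
            effective_date TEXT NOT NULL,
            PRIMARY KEY (problem_id, version_number)
        );
        CREATE VIEW problem_all_versions AS
            SELECT problem_id, version_number, name, effective_date
              FROM problem
            UNION ALL
            SELECT problem_id, version_number, name, effective_date
              FROM problem_history;
        """
    )
    return conn


def update_with_history(conn, problem_id, name, effective_date):
    """Write the old values to problem_history, then update the current row."""
    with conn:
        conn.execute(
            "INSERT INTO problem_history "
            "(problem_id, version_number, name, effective_date) "
            "SELECT problem_id, version_number, name, effective_date "
            "FROM problem WHERE problem_id = ?",
            (problem_id,),
        )
        conn.execute(
            "UPDATE problem SET name = ?, "
            "version_number = version_number + 1, effective_date = ? "
            "WHERE problem_id = ?",
            (name, effective_date, problem_id),
        )


def as_at(conn, problem_id, date):
    """Return (version_number, name) of the version in effect on `date`."""
    return conn.execute(
        "SELECT version_number, name FROM problem_all_versions "
        "WHERE problem_id = ? AND effective_date <= ? "
        "ORDER BY effective_date DESC, version_number DESC LIMIT 1",
        (problem_id, date),
    ).fetchone()
```

Queries about the current position read only the problem table; the view is needed only when history matters.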

Option 1 makes it difficult to query the current situation, as all your historic data is mixed in with your current data.

Option 3 is going to be bad for query performance and nasty to code against as you'll be accessing lots of rows for what should just be a simple query.

WW 2008-10-31 09:50:35

Design debate: what are good ways to store and manipulate versioned objects?

1: Problems (and separately, Solutions) are self-referential in versioning.

2: Create a new Relationship type: Version.

3: Use a more Subversion-like structure; move all Problem and Solution attributes into a separate table and version them.
