In my application I have a strong requirement to log every event for each entity, and I was considering the event sourcing pattern, i.e. all domain changes have explicit classes, and any change to the domain objects can be made only through these event classes. You can then roll back and reapply these changes as you want, just like in a source control system.
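A minimal Python sketch of the pattern described above, assuming events are immutable objects and an entity's current state is rebuilt by applying them in order. The invoice event classes and the `apply`/`replay` helpers are hypothetical, chosen only for illustration; "rolling back" here simply means replaying a prefix of the event list.

```python
from dataclasses import dataclass

# Hypothetical event classes: each one records a single domain change.
@dataclass(frozen=True)
class InvoiceCreated:
    invoice_number: int
    salesperson: str

@dataclass(frozen=True)
class InvoiceLineAdded:
    invoice_number: int
    amount: int  # in cents

@dataclass
class Invoice:
    invoice_number: int = 0
    salesperson: str = ""
    total: int = 0

def apply(state: Invoice, event) -> Invoice:
    """Return the new state after applying one event."""
    if isinstance(event, InvoiceCreated):
        return Invoice(event.invoice_number, event.salesperson, 0)
    if isinstance(event, InvoiceLineAdded):
        return Invoice(state.invoice_number, state.salesperson, state.total + event.amount)
    raise TypeError(f"unknown event type: {type(event).__name__}")

def replay(events, upto=None) -> Invoice:
    """Rebuild state from scratch; upto=None replays the whole log."""
    state = Invoice()
    for event in events[:upto]:
        state = apply(state, event)
    return state

events = [
    InvoiceCreated(12345, "Jill Gaines"),
    InvoiceLineAdded(12345, 1500),
    InvoiceLineAdded(12345, 250),
]
current = replay(events)           # all three events applied
rolled_back = replay(events, 2)    # state as of the second event
print(current)
print(rolled_back)
```

Because state is derived rather than stored, the persistence question reduces to storing the event list itself.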

This would solve many issues for me, but I do not know how to persist event objects to the DB. I will probably have hundreds of event types, so I have limited options:

  • build a table for each event type (hundreds of tables? what about references to entities?)
  • build a huge table for all events (with thousands of columns?)
  • somehow store a binary representation of event in db (??)
  • store it in some separate file (??)

Do you have any ideas how this can be done?

A:

There's an excellent real-life analog for this: an accounting system. Every professional accounting system is fundamentally based on journals of transactions that record the context for every change in financial condition, which is equivalent to a change of state in your entity.

I have used this pattern quite a bit, and it's usually a set of (not too many) tables with minimally a primary key for the table, a timestamp, and a username.
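As an illustration only, here is what one such journal table might look like in SQLite, using Python's built-in `sqlite3` module. The `shipment_journal` table and its `shipment_id`, `location`, and `action` columns are hypothetical; only the primary key, timestamp, and username columns come from the answer.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical journal for one kind of real-life transaction.
conn.execute("""
    CREATE TABLE shipment_journal (
        id          INTEGER PRIMARY KEY,   -- primary key
        recorded_at TEXT NOT NULL,         -- timestamp (ISO 8601, UTC)
        username    TEXT NOT NULL,         -- who recorded it
        shipment_id INTEGER NOT NULL,
        location    TEXT NOT NULL,
        action      TEXT NOT NULL
    )
""")

def record(username, shipment_id, location, action):
    # Journal rows are only ever inserted, never updated.
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO shipment_journal "
            "(recorded_at, username, shipment_id, location, action) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), username,
             shipment_id, location, action),
        )

record("jill", 42, "A", "dispatched")
record("bob", 42, "B", "received")

rows = conn.execute(
    "SELECT location, action FROM shipment_journal "
    "WHERE shipment_id = ? ORDER BY id",
    (42,),
).fetchall()
print(rows)
```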

If you want to share your entity model a bit, we could discuss specific cases. But usually the structure of the tables drops out of the use cases associated with the real-life events being recorded.

A couple of benefits:

  1. This is a good user-relations hook for your design, because it's one of the few tables in your database that is self-explanatory. ("Yes, that's what people do and what needs to be recorded when they do it.")

  2. It builds in some real-life flexibility for dealing with transactions from several sources that might not be integrated in real-time, but you need to reconstruct the chronology. (E.g. shipping from Point A through B and C to D.)
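To sketch the second benefit: assuming each source delivers its journal entries already sorted by timestamp, and all sources use the same ISO 8601 string format (so string order matches time order), Python's `heapq.merge` can rebuild one chronology from the separate feeds. The sample data below is hypothetical.

```python
import heapq
from operator import itemgetter

# Each source is sorted by time on its own, but the sources were
# never integrated in real time. Tuples are (timestamp, place, action).
point_a = [("2009-05-01T08:00", "A", "dispatched")]
point_b = [("2009-05-02T09:30", "B", "received"),
           ("2009-05-02T17:00", "B", "dispatched")]
point_c = [("2009-05-03T11:15", "C", "received"),
           ("2009-05-03T16:40", "C", "dispatched")]
point_d = [("2009-05-04T10:00", "D", "delivered")]

# merge() lazily interleaves the already-sorted inputs by timestamp.
timeline = list(heapq.merge(point_a, point_b, point_c, point_d,
                            key=itemgetter(0)))
for ts, place, action in timeline:
    print(ts, place, action)
```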

le dorfier
A:

Your basic issue here is that you've got an entirely unrelational model that you're trying to match up to a relational database. That's just not going to work very well. So move up from the details for a second and think about your two basic choices for direction:

  1. You can attempt to build a more relational model. If you go this route, it's probably best to think about it in terms of the database itself, and more or less ignore the programming side for the moment: what schema best expresses your business domain?

  2. You can stick with the OO model, and use whatever storage suits it.

For this second option, you've got a variety of storage options available, and you should feel free to choose among them.

An RDBMS is one option, though your schema will be rather non-relational, and you won't be able to take advantage of some of the powerful relational tools that the RDBMS offers you. I wouldn't feel too bad about this: if you've considered both models and decided to go OO, you've made that choice consciously. Most likely your schema will end up looking like a table of events, each with a type name, and a table of properties for each event in key-value form.
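A minimal sketch of that events-plus-properties schema, assuming SQLite through Python's `sqlite3` module. The table names, columns, and the `save_event`/`load_event` helpers are hypothetical. Note that every property value comes back as text, which is exactly the kind of relational tooling (typed columns, constraints) you give up with this design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        id          INTEGER PRIMARY KEY,
        type_name   TEXT NOT NULL,
        recorded_at TEXT NOT NULL
    );
    CREATE TABLE event_property (
        event_id INTEGER NOT NULL REFERENCES event(id),
        name     TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (event_id, name)
    );
""")

def save_event(type_name, recorded_at, props):
    # One transaction per event: the header row and its properties.
    with conn:
        cur = conn.execute(
            "INSERT INTO event (type_name, recorded_at) VALUES (?, ?)",
            (type_name, recorded_at),
        )
        conn.executemany(
            "INSERT INTO event_property (event_id, name, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, k, str(v)) for k, v in props.items()],
        )
        return cur.lastrowid

def load_event(event_id):
    (type_name,) = conn.execute(
        "SELECT type_name FROM event WHERE id = ?", (event_id,)
    ).fetchone()
    props = dict(conn.execute(
        "SELECT name, value FROM event_property WHERE event_id = ?", (event_id,)
    ))
    return type_name, props

eid = save_event("InvoiceCreate", "2009-05-21T10:00",
                 {"invoice_number": 12345, "salesperson": "Jill Gaines"})
print(load_event(eid))
```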

An object database will let you persist your stuff in more or less the same form you use it internally, which may be convenient. Note, though, that if you have performance issues, this option is probably the hardest to analyze and speed up.

A flat file is an interesting option: just serialize the objects into some reasonable form and write them out. This can often be the fastest approach, because a plain file (especially a gzipped one) is one of the fastest formats for a full scan of the data. If you frequently ran particular queries that selected only a very small subset of records spread across the whole data set, a DBMS that could use an index would help; but if you're scanning most of the data most of the time anyway, a DBMS will only slow you down. Note that you can (and probably will) partition the data along one or even two dimensions by putting it in different files. If you've got a dozen basic business areas in which you record events that you often query separately, you might use a separate file for each, and you might also roll these files over every month or year.
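A rough sketch of the one-file-per-area-per-month layout, using Python's `gzip` module. The file naming scheme and the `append_event`/`scan` helpers are hypothetical. Each call to `append_event` opens the file in append mode, which adds a new gzip member; Python's `gzip` reader handles multi-member files transparently.

```python
import gzip
import os
import tempfile

LOG_DIR = tempfile.mkdtemp()  # stand-in for a real log directory

def log_path(area, month):
    # Hypothetical naming: one file per business area per month,
    # e.g. "sales-2009-05.log.gz".
    return os.path.join(LOG_DIR, f"{area}-{month}.log.gz")

def append_event(area, month, line):
    # "at" = append, text mode; each open adds a new gzip member.
    with gzip.open(log_path(area, month), "at", encoding="utf-8") as f:
        f.write(line + "\n")

def scan(area, month, predicate):
    # Full sequential scan: decompress and test every line.
    with gzip.open(log_path(area, month), "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if predicate(line)]

append_event("sales", "2009-05",
             'InvoiceCreate invoice_number=12345 salesperson="Jill Gaines"')
append_event("sales", "2009-05",
             'InvoiceCreate invoice_number=12346 salesperson="Bob Smith"')
hits = scan("sales", "2009-05", lambda line: "Jill Gaines" in line)
print(hits)
```

Opening the file once per event produces many tiny gzip members, which compress poorly. A real setup would more likely buffer events and write them in batches.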

I often get a lot of pushback on this sort of thing, but as proven by the success of Unix's pipes, text processing tools and scripting languages, this stuff really works. The standard for a web server log is still a text format after fifteen years, and look at all the analytical tools it's spawned.

Once you're serializing anyway, you've got plenty of other options for storage. You could store your serialized chunk in Berkeley DB or a column in an RDBMS, too.

As to the serialization itself, you've got various options here, too, that you should contemplate. Most languages have some sort of standard binary serialization format that serializes everything and brings back the full object. These are usually complex, have versioning issues, and so on. I generally find a simple custom format much easier. It could be as simple as an ASCII line with a name and a key-value list:

InvoiceCreate invoice_number=12345 date=2009-05-21 salesperson="Jill Gaines" ...

This has many advantages:

  • It's human-readable, and thus easy to debug.
  • Files like this can be processed with grep, sed, awk, etc. for ad-hoc queries.
  • It's human-writable, which is good for debugging, testing, and fixing broken data.
  • It's easy to parse.
  • It's not linked to any particular object structure, format, or even language, so you can change your application easily without worrying about compatibility with your serialized data.
  • The data themselves can be easily updated (often with a simple sed script!) when you need to change the data format.
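To show how little code such a format needs, here is a sketch of a writer and parser for the one-line `Name key=value ...` format, assuming values contain no newlines. It relies on Python's `shlex.split` (POSIX mode), which strips the double quotes around values such as `"Jill Gaines"`. The helper names, including `rename_field` as a stand-in for the sed-style migration, are hypothetical.

```python
import shlex

SAFE = set("-_.:/")

def quote(value):
    # Leave simple tokens bare; wrap anything else in double quotes,
    # escaping backslashes and quotes so shlex can read it back.
    s = str(value)
    if s and all(c.isalnum() or c in SAFE for c in s):
        return s
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def format_event(name, fields):
    return " ".join([name] + [f"{k}={quote(v)}" for k, v in fields.items()])

def parse_event(line):
    name, *pairs = shlex.split(line)
    fields = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        fields[key] = value  # all values come back as strings
    return name, fields

def rename_field(line, old, new):
    # A format migration, the same job a one-line sed script would do.
    name, fields = parse_event(line)
    return format_event(name, {(new if k == old else k): v
                               for k, v in fields.items()})

line = format_event("InvoiceCreate", {"invoice_number": 12345,
                                      "date": "2009-05-21",
                                      "salesperson": "Jill Gaines"})
print(line)
print(parse_event(line))
print(rename_field(line, "salesperson", "sales_rep"))
```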

(And just to emphasize again, you would be surprised how fast grep can operate on files like this: unless you have gigabytes of data, or you need to do dozens of queries per second, this will probably provide all the performance you need.)

One last thing about this approach: it's a good, flexible way to experiment with what sort of entities and properties you need. While right now it seems you have hundreds of different things in the domain, you may find that after a few months of working on it your understanding has developed enough that you can model it in a much simpler fashion. If you reach that point, you can consider switching over to an RDBMS if that now suits your model better.

That can be used as a selling point for this, too: if someone objects to your not using an RDBMS (whether you need one or not), just sell this as the "experimentation phase" while you work out the model, and tell them you'll move once the model has settled down. Even if you never move, once you've got a system that works well, they're not likely to put a lot of pressure on you.

Curt Sampson
I like storing my binary objects in the database, then I pull out specific values that I will index and query on. Works great.
JoshBerke
That's an excellent idea, Josh.
Curt Sampson
@Curt: Thanks for the answer, you provided some good ideas I hadn't thought about, especially your ideas on serialization. @Josh: Nice approach, especially if you combine it with Curt's.
bbmud
A: 

Another option is to store the relevant keys and common search values in regular columns, and put the rest in an XML column (or an equivalent formatted-text column, e.g. JSON or whatever works for your app).

This assumes that most of the time all you need to do is reconstitute the original event from the database (like serialization/deserialization), not search efficiently on every possible property.
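A minimal sketch of this hybrid layout, assuming SQLite via Python's `sqlite3` module and a JSON payload stored in a plain TEXT column (a database with native XML or JSON column types would look different). The column names and the `save`/`history` helpers are hypothetical: `entity_id` and `recorded_at` are the indexed search columns, and everything else rides along in `payload`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event (
        id          INTEGER PRIMARY KEY,
        type_name   TEXT NOT NULL,
        entity_id   INTEGER NOT NULL,   -- searchable key
        recorded_at TEXT NOT NULL,      -- searchable key
        payload     TEXT NOT NULL       -- the rest, as JSON
    )
""")
conn.execute("CREATE INDEX event_by_entity ON event (entity_id, recorded_at)")

def save(type_name, entity_id, recorded_at, payload):
    with conn:
        conn.execute(
            "INSERT INTO event (type_name, entity_id, recorded_at, payload) "
            "VALUES (?, ?, ?, ?)",
            (type_name, entity_id, recorded_at, json.dumps(payload)),
        )

def history(entity_id):
    # Query only on the indexed columns, then reconstitute each payload.
    rows = conn.execute(
        "SELECT type_name, payload FROM event "
        "WHERE entity_id = ? ORDER BY recorded_at, id",
        (entity_id,),
    )
    return [(type_name, json.loads(payload)) for type_name, payload in rows]

save("InvoiceCreate", 12345, "2009-05-21T10:00", {"salesperson": "Jill Gaines"})
save("InvoiceLineAdd", 12345, "2009-05-21T10:05", {"sku": "X-1", "amount": 1500})
save("InvoiceCreate", 12346, "2009-05-21T11:00", {"salesperson": "Bob Smith"})
print(history(12345))
```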

Steven A. Lowe