views:

104

answers:

4

For a personal project, I want to record a stream of unrelated events, to gather simple statistics on later. Each event can have [wildly] different data associated with it.

I'm looking for some advice on how to structure this. As this is a small project, my main goals are:

  • Ease of setup
  • Ease of use (i.e. no strange joins)

Performance isn't critical, and I will be the only user.

To give you an idea of what I'm aiming for, here is my current plan:

Table: Event_Definitions

Columns:

  • ID
  • Name
  • Type
  • Options

Table: Events

Columns:

  • ID
  • Definition_ID
  • Option_Values
  • Notes

So say we have two event definitions, like this:

ID: 0; Name: Pigeon_Released; Type: Time; Options: null

ID: 1; Name: Fed_Pigeon; Type: Fixed_List; Options: bread,crackers,tofu

Then we log some events:

ID: 0; Definition_ID: 1; Option_Values: bread; Notes: null

ID: 1; Definition_ID: 1; Option_Values: tofu; Notes: "he cooed"

ID: 2; Definition_ID: 0; Option_Values: 12:34:56; Notes: "I cooed too"

The option values will be enforced through the program.


After a number of events have been collected, I will be collecting together events of the same type for comparison. I don't expect to be retrieving events with specific values, just events of the same type.


So the question is, again, any advice or comments on this strategy or alternatives? I appreciate how simple and straightforward this approach is, but it bothers me that even though the 'values' for an even could be strings, times, index numbers, etc. etc., they all get stored in the same column.

+1  A: 

Effectively you've got a "blob" of data that could in principle hold anything, and then a schema definition to say how the blob should be interpreted

it's a date

or

it's one of these values

This is quite reasonable for a personal project. A downside of the "blob" approach is that there can be something of an impedence mismatch when doing queries. If you need to work with the blob contents it's going to be a little cumbersome.

Another comment is that your defintion table might become restrictive, you want to put some more sophisticated, structured data in your blob.

I wonder if using XML as your blob helps. The XML schemas then define the blob contents, maybe you don't need your event definition table at all. Does your database (as some do) have any XML capabilities you could exploit?

djna
I like the idea of using XML schema to define the options, but I don't foresee any of the options having much complexity. All the ones I have planned so far are either single-value, or multiple values of the same type, or key/value pairs.
Ipsquiggle
I think you're right. But if your DB "gets" XML you might find it useful anyway.
djna
I'm using SQLite via Python, which doesn't seem to have this functionality. I'm picking this answer as the suggestions seem to most closely match the scope of the project.
Ipsquiggle
A: 

An RDBMS is not the best technology solution to use as an event stream, and your requirements for variable attributes does not fit the relational paradigm.

Try using a Message Queue instead.


Okay, if you have the need to store and compare events, that's different. When I hear "event" I assume it's just for real-time notification. So my suggestion of a Message Queue may not be appropriate in this case.

Still, relational databases don't support variable attributes in a table easily. You can try using design patterns like Concrete Table Inheritance or Class Table Inheritance to solve your problem.

Bill Karwin
Message queue makes sense to me for storage, but is it more effective for retrieval? I will be collecting together all the events of the same type for comparison.
Ipsquiggle
I had considered putting each type of event into it's own table (seems to be the simplification of what Table Inheritance offers), but I expect a fairly large number of event definitions, and possibly creating new definitions during normal use. In any case, I'd never heard of Table Inheritance, I'll be filing that one away for later use. :)
Ipsquiggle
A: 

Most event-steam apps use XML to define event-objects. Though many do have SQL-like language, they do not use underlying RDBMS. You may want to check out Marco of ruleCore who went through the whole exercise of designing their own CEP app and blogged about it.

Even stream engines like StreamBase and Coral8 require schema for the event-streams at design time, so even they may not fit your requirement for "wildly different data".

So, at the end you may need to use something like:

<event>
    <head>
        <id>12784536</id>
        <type_id>51</type_id>
        <time_stamp>2008-12-11T13:25:57.014Z</time_stamp>
        <source_id>862</source_id>
    </head>
    <body>
        <!-- Event specific data here -->
    </body>
</event>
Damir Sudarevic
I've updated the question to clarify (hopefully): The goal of the DB is not for sending events, but for recording them for stats-gathering later on. Event stream probably wasn't the right term to use.In any case, those 'stream engines' look like they'd be overkill for this project.
Ipsquiggle
A: 

I have tried a small project with rulecore to develop a system that is a bit like yours. I used a mysql db to store a stream of events and then sent them in batch to rulecore where I had created about 20 rules. The rulecore event format is very simple with named properties which can contain just about anything. I did this too as my first try with sql queries resulted in a complex schema and very long and hard to understand queries. The rulecore rules was much simpler.

Stjoan
There seem to be a few things called 'rulecore'. Could you provide a link?
Ipsquiggle
you can find it at rulecore.com
Stjoan