views:

131

answers:

4

Hello guys, hope you all had a happy new year.

So, my question is, what's the best way to make a log of actions. Let me explain it with a example, suppose we have these entities:

User

Friend (User is a friend of another User, many to many relationship)

Message (An user can message another user)

Group (An user can be in various groups)

Game (A game can be played with various players, has some info like date of the game. this results in two tales, games and games_users, the latter stores a relationship between user and a game)

Now, I wanted to make a log, for example:

  1. User A (link to user) made a new friend, User B (link to user)

  2. User A (link to user), B (link to user) and C (link to user) played a game (link to game)

  3. User C (link to user) joined a group D (link to group)

So, I wanted to make a flexible log, that could store as many references as I wanted and references to different entities (user and game for example).

I know two ways of doing this, but they all have one or more problems:

  1. When logging an action I directly store the pure text I want (i.e: only 1 char field, which would store 'User C joined a group'). But, there is a problem this way, this text needs to be translated to other languages and I can not have a field for each language.

  2. Having a main table log, which each rows represent a log action and a code so I know which action is that, i.e: an user joined a group, x users played a game. I then have another table for each of the foreign key types needed, so I'd have log_user, log_group and log_game For example, log_user with a field referencing log and another referencing user. This way I can have multiple users for a same log action. Problems: rather complex and could result in substantial overhead as depending of the log action I'd have to query to multiple tables. Is this correct, would it be too cpu-intensive?

So, I'm open to new ideas and brainstorming. What's the best approach for this kind of problem? Thanks in advance, I hope I have explained it in a clear way. If there is any question, please ask.

Edit: I decided to start a bounty as I'm not really happy with the answers I have received. Will make any clarifications if needed. Thanks

I want something very similar to facebook/orkut/social networks "friend updates". This will be displayed to users.

+1  A: 
  1. Keep it simple and extendible
  2. Don't let translation overhead affect performance, translation only needs to be done for output purposes.

Suggestion:

 LogId       DateTime   Action   Role  Entity

 e.g.

 30303     1/1/10    43        Sender   John
 30303     1/1/10    43        Receiver Sam
 30304     1/1/10    44        Game      game43
 30304     1/1/10    44        Player    Sue
 30304     1/1/10    44        Player    Mike

(In the above table, "Message", ,"Sender", "John", "game43" etc. would not be text but would be foreign keys in either the action, role, or entity table. I've written the keys for "Action" but not for "Role" or "Entity" but they would be keys as well.

Now, instead of text action, Role, Entity you might have keys in there, and store them in a separate table. This can be used for output, e.g.

 Action Table

 Id    ActionKey   Text      Language
 1     43          JoinGame  English
 2     43          Jeu       French
 3     44          Message   English
 ...
 ...

 Role Table
 Id   RoleKey  Text  Language
 1    1        Sender   English
 2    1        Sendeur  French        (I don't know french :)
 ....

 Entity Table
 EntityKey   Text
 1           Sam
 2           game43
 3           Sam

Note that in the entity table, 2 or more entries might have the same text representation, as there could be more than 1 user named Sam. If you want to represent different, orthogonal information about each entity, then you can include the EntityKey in the correspondig table, e.g.

Person Table
Id    EntityKey   FirstName  LastName ....
1      1          Sam         Johnson

Game Instance Table
Id    EntityKey   GameType  
1     2           444

Game Table
Id   Name   MaxPlayers ...
444  Quake    10

Basically we are mapping a predicate (in the first-order predicate logic sense, a set of tuples) to binary relation form (by creating an artificial entity, the action key). Thus, the Entity table basically contains various columns/arguments of the relation, and so it can be practically anything which could be an argument of the relation. The advantage of this representation is it is infinitely extendible to new relations that you may wish to log without changing the schema. Everything in the ActionTable should be a key or foreign key, I just haven't put that in there because it may be harder to read.

Larry Watanabe
Hi Larry, thank you for your answer! On your example, how do I know the message was sent from John to Sam? There will be other rows with the role 'Sender' and 'Receiver'. How can entity reference an user and in some cases a game?Thanks
Clash
Good Point. The table needs a LogId column to identify all entries for a particular log entry, since an entry can span several lines. I've modified the original post to include this column.
Larry Watanabe
Larry, so you suggest I don't use foreign keys for entities? Because it seems like your entity field can have game and users
Clash
+1  A: 

Do you need this for logging/tracking purposes, or for display to users and admin? If your use for logging/tracking (i.e. computer readable), you should probably separate your logging into multiple tables like you specified.

However, if you want this for your users or display on screen, why not just store it in basic html? This way you can easily display it on screen and view.

For example, "User A (link to user), B (link to user) and C (link to user) played a game (link to game)" would be

<a href="/users/showuser.php?id=2341">User A</a>
, <a href="/users/showuser.php?id=311">User B</a>
, and <a href="/users/showuser.php?id=89">User C</a>
played a game of <a href="/games/gameoverview.php?id=3">Chess</a>.
Jess
Hi, thanks for the answer Lucky. The problem with storing direct HTML is that it needs to be translated
Clash
Translated into what? If you are displaying it to users or in your own log, it won't need to be translated at all. It only needs to be translated if you are using it for other features than logging.
Jess
What? The guy is french and doesn't know english, he will never understand 'played a game'
Clash
+2  A: 

My answer to Whats a better strategy for storing log data in a database?:

It depends on the purpose of logging. For debugging and general monitoring purpose, a single log table with dynamic log level would be helpful so you can chronologically look at what the system is going through.

On the other hand, for audit trail purpose, there's nothing like having duplicate table for all tables with every CRUD action. This way, every information captured in the payment table or whatever would be captured in your audit table.

So, the answer is both.

Edit: To implement it cleanly with referential integrity and have all the flexibility, I suggest having duplicate audit trail table for all CRUDs for each table even if it's "heavy." The business rules are more volatile compared to data structures anyway, so by keeping log logic in code/query you retain the flexibility. For example, suppose you decided not to track when the users left a group. Later day, the clients asked that it's very important to track the information. All you have to do now is change the query so deletion of user_group record is part of the result.

eed3si9n
Thank you for your reply! I don't see how a single table can fulfill all my needs, it needs to be translated and it needs to contain links.Having a table for each log action type would be too heavy for querying to display for website visitors? I mean, I'd need to query all of the tables to construct the log of the latest actions of a certain user.
Clash
+1  A: 

The following is how I would do it. I have some more comments at the bottom after you have seen the schema.

Log

LogID - unique log ID

Time - date/time of event

LogType - String or ID

(side comment, I would go with an id here so you can use a message table shown below, but if you want quick n dirty you can just just a unique string for each log time (eg "Game Started", "Message Sent", etc)

LogActor

LogID - external key

LogActorType - String or ID (as above, if ID you will need a lookup table)

LogActorID - This is a unique id to the table for the type eg User, Group, Game

Sequence - this is an ordering of the actors.

LogMessage

LogType - exernal key

Message - long string (varchar(max)?)

Language - string(5) so you can key off different language eg "US-en"

Example Data (using your 3 examples)

Log

ID  Time   LogType 
1   1/1/10 1
2   1/1/10 2
3   1/1/10 3

LogActor

LogID LogActorType LogActorID Sequence
1     User         1          1
1     User         2          2
2     User         1          1
2     User         2          2
2     User         2          3
2     Game         1          4
3     User         3          1
3     Group        1          2

LogMessage

LogType Message 
1       {0} Made a new friend {1}
2       {0}, {1}, {2} played a game ({3})
3       {0} joined a group ({1})

User

ID Name
1  User A
2  User B
3  User C

Game

ID Name
1  Name of game

Group

ID Name
1  Name of group

So here are the nice things about this design.

  • It is very easy to extend

  • It handles multi-language issues independent of the actors

  • It is self documenting, the LogMessage table explains exactly what the data you are storing should say.

Some bad things about it.

  • You have to do some complicated processing to read the messages.

  • You can't just look at the DB and see what has happened.

In my experience the good parts of this kind of a design outweigh the bad bits. What I have done to allow me to do a quick n dirty look at the log is make a view (which I don't use for the application code) that I can look at when I need to see what is going on via the back end.

Let me know if you have questions.

Update - Some example queries

All of my examples are in sqlserver 2005+, let me know if there is a different version you want me to target.

View the LogActor table (There are a number of ways to do this, the best depends on many things including data distribution, use cases, etc) Here are two:

a)

SELECT 
  LogId,
  COLLESCE(U.Name,Ga.Name,Go.Name) AS Name,
  Sequence
FROM LogActor A
LEFT JOIN User U ON A.LogActorID = U.[ID] AND LogActorType = "User"
LEFT JOIN Game Ga ON A.LogActorID = Ga.[ID] AND LogActorType = "Game"
LEFT JOIN Group Go ON A.LogActorID = Go.[ID] AND LogActorType = "Group"
ORDER BY LogID, Sequence

b)

SELECT 
  LogId,
  U.Name AS Name,
  Sequence
FROM LogActor A
INNER JOIN User U ON A.LogActorID = U.[ID] AND LogActorType = "User"
UNION ALL
SELECT 
  LogId,
  Ga.Name AS Name,
  Sequence
FROM LogActor A
INNER JOIN Game Ga ON A.LogActorID = Ga.[ID] AND LogActorType = "Game"
UNION ALL
SELECT 
  LogId,
  Go.Name AS Name,
  Sequence
FROM LogActor A
INNER JOIN Group Go ON A.LogActorID = Go.[ID] AND LogActorType = "Group"
ORDER BY LogID, Sequence

In general I think a) is better than b) For example if you are missing an actor type a) will include it (with a null name). However b) is easier to maintain (because the UNION ALL statements make it more modular.) There are other ways to do this (eg CTE, views, etc). I'm inclined to doing it like b) and from what I've seen that seems to be at least standard practice if not best practice.

So, the last 10 items in the log would looks something like this:

SELECT 
  LogId,
  M.Message,
  COLLESCE(U.Name,Ga.Name,Go.Name) AS Name,
  Time,
  A.Sequence
FROM Log
LEFT JOIN LogActor A ON Log.LogID = A.LogID
LEFT JOIN User U ON A.LogActorID = U.[ID] AND LogActorType = "User"
LEFT JOIN Game Ga ON A.LogActorID = Ga.[ID] AND LogActorType = "Game"
LEFT JOIN Group Go ON A.LogActorID = Go.[ID] AND LogActorType = "Group"
LEFT JOIN LogMessage M ON Log.LogType = M.LogMessage
WHERE LogID IN (SELECT Top 10 LogID FROM Log ORDER BY Date DESC)
ORDER BY Date, LogID, A.Sequence

NB - As you can see, it is easier to select all log items from a date than the last X, because we need a (probably very fast) sub-query for this.

Hogan
Hogan, thank you for reply. +1A couple dobts, is there anyway I can still have referential integrity?If I want to select the last 10 logs, I'd need how much queries? Because from what I see, I don't know before selecting `LogActor` which table I will need to gather info from. For example, like you said`LogID LogActorType LogActorID Sequence``1 User 1 1`I will only find out that I need to query the user table after selecting those rows. How do you suggest I deal with this? A lot of queries?Thanks
Clash
Clash: see edit above. I think this addressed most of your questions.
Hogan
Wow! I had no idea I could do `AND LogActorType = "User"` on a JOIN clause. Amazing! Thanks a lot for the help. Will try it later
Clash
Yes, the JOIN clause is mighty. Not knowing how to use it is the reason for most of the mistakes I see on SO (often using a sub-query instead of a join).
Hogan