Hi folks,

Firstly, this question could be fairly DB agnostic, but I'm using SQL Server 2008 in case it has a specialised solution to this problem. Please keep reading even if you're not an MS SQL Server person .. please :)


OK, I read in a log file that contains data for a game, e.g. when a player connects, disconnects, does stuff, etc. Nothing too hard. Works great.

Now, two of the log file entry types are

  • NewConnection
  • LostConnection

What I'm trying to keep track of is the players currently connected to the game. What I originally thought of was to create a second table containing the most recent NewConnection, per player. When a player disconnects/loses connection, I then remove that entry from this second table.

Eg.

Table 1: LogEntries
LogEntryId INT PK NOT NULL
EntryTypeId TINYINT NOT NULL
PlayerId INT NOT NULL
....

Table 2: ConnectedPlayers
LogEntryId INT FK (back to LogEntries table) NOT NULL

Then I thought I could use a trigger to insert this cache data into the ConnectedPlayers table. Don't forget, if it's a trigger, it needs to handle multiple records, updates and deletes.
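For what it's worth, here's a rough, untested sketch of the sort of set-based trigger I had in mind (using the EntryTypeId values 2 = NewConnection and 4 = LostConnection from my query further down; it doesn't handle a reconnect without an intervening LostConnection):

```sql
CREATE TRIGGER trgLogEntries_MaintainConnectedPlayers
ON LogEntries
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- A LostConnection removes the player's cached entry ...
    DELETE cp
    FROM ConnectedPlayers cp
    INNER JOIN LogEntries le ON le.LogEntryId = cp.LogEntryId
    INNER JOIN inserted i   ON i.PlayerId = le.PlayerId
    WHERE i.EntryTypeId = 4; -- LostConnection

    -- ... and a NewConnection caches the player's latest log entry.
    INSERT INTO ConnectedPlayers (LogEntryId)
    SELECT i.LogEntryId
    FROM inserted i
    WHERE i.EntryTypeId = 2; -- NewConnection
END
```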

But I'm not sure if this is the best way. Like, could I have an Indexed View?

I would love to know people's thoughts on this.

Oh, one more thing: for simplicity, let's just assume that when a player drops connection/lags out/modem dies/etc., the application is smart enough to know this and DOES record it as a LostConnection entry. There will be no phantom users reported as connected when they have really been disconnected accidentally, etc.


UPDATE:

I was thinking that maybe I could use a view instead? (And I can index this view if I want to, also :) ) By partitioning my results, I can get the most recent event type per player, where the event is a NewConnection or a LostConnection. Then I only grab those whose most recent event is a NewConnection, which means they are connected. No second table/triggers/extra .NET insert code/whatever needed ...

e.g.

SELECT LogEntryId, EntryTypeId, PlayerId
FROM
    (SELECT LogEntryId, EntryTypeId, PlayerId,
            RANK() OVER (PARTITION BY PlayerId ORDER BY LogEntryId DESC) AS MostRecentRank
     FROM LogEntries
     WHERE EntryTypeId = 2 -- NewConnection
        OR EntryTypeId = 4 -- LostConnection
    ) SubQuery
WHERE MostRecentRank = 1
    AND EntryTypeId = 2 -- most recent event is a NewConnection => connected
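I'm guessing I'd also want an index to support that, maybe a filtered index along these lines (SQL Server 2008 supports these; this is just my guess, not something I've tested):

```sql
CREATE NONCLUSTERED INDEX IX_LogEntries_Player_Recent
ON LogEntries (PlayerId, LogEntryId DESC)
INCLUDE (EntryTypeId)
WHERE EntryTypeId IN (2, 4); -- NewConnection / LostConnection only
```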

How does that sound/look?

A: 

Depending on the size of the original table LogEntries, this almost seems like overkill.

The triggers would have to run on every change to the original table, whereas with the correct indexing, a simple query could give you these results whenever you require the data.

I would thus go against the option of a secondary table.

astander
That's my worry too. This table IS large ... which is why I've asked the question :P If no to a secondary table .. then what other options do I have? Can this be handled in a single stored proc instead?
Pure.Krome
What do you mean by **large**? This should be easily handled by a query if the table is correctly indexed.
astander
Correct re: large. To keep this simple, let's say 10 million rows.
Pure.Krome
A: 

I'd add an is_connected flag and a login_time column, and that's about it. You can run a simple query every 10/60 seconds and cache the data in a file.

Where is_connected=1, order by login_time limit 10/20/100 ...

A second table seems way too much and pretty useless. Extra caching (if needed, in a huge database) could be done in files.

Yossi
Files? Er... this scares me, mate. Sorry, I wish to keep this in the DB.
Pure.Krome
A: 

I personally would use a view that gets the most recent NewConnection or LostConnection (whichever is more recent) for each player, which implies you need some sort of date-time stamp in the log or an ID that is ever increasing, and then filter that further, throwing out all the LostConnection entries. This will leave you with all players that have a NewConnection without a more recent LostConnection, hence they are connected.

The problem you might have with this approach is that the log table might be huge. I would probably try performance testing with an index on the timestamp column, or whatever column you use to determine which entry is the "most recent".

AaronLS
If you were going to do a view, can you show me some SQL code please? I'm not sure how to get the most recent NewConnection provided there is no LostConnection after it.
Pure.Krome
A: 

As I understand it, what you really need is:

Player (ID)
Game (ID)
Connection (PlayerID, GameID, LastActivity DATETIME)

where @interestingTime is some time before the current time:

select PlayerID, GameID
from Connection
where LastActivity > @interestingTime

gives you all currently connected players.

select PlayerID, GameID
from Connection
where LastActivity <= @interestingTime

gives you lost connections.

George Polevoy
This is a bit similar to my suggestion of having a second table. So in this case, how do you get data into your `Connection` table? Trigger? Also, does your code consider multiple connections/reconnections/disconnections in that time period?
Pure.Krome
+1  A: 

You don't need a second table, but you do need a date column, which I assume is part of your log data. I would normalize the data and avoid the temptation to optimize prematurely. Make sure you index the key columns, mainly the LogEntryDate and PlayerId columns in the case of your query.

Then, use a standard aggregate query to determine the newest log entry for each user, and then filter out the ones that are not connected. You could further optimize this by only selecting from log entries from the last 24 hours (or last week or whatever makes sense for your app).

select l.* 
from ( 
    select PlayerId, max(LogEntryDate) as MaxLogEntryDate
    from LogEntries
    where EntryTypeId in (2,4) 
        and LogEntryDate > GetDate() - 7 --only look at the last week, as connections older than that have timed out
    group by PlayerId
) lm
inner join LogEntries l on lm.PlayerId = l.PlayerId and lm.MaxLogEntryDate = l.LogEntryDate
where l.EntryTypeId = 2 --new connections only

If you find that you are still not getting the speed you want out of the query, then look at strategies for optimizing. You seem reluctant to cache in the application layer, so your proposal of indexed views would work. You could use the query above as a basis for this to create a Player view that includes a boolean IsConnected column.
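As a rough sketch only, that view might look something like the following (note that an indexed view cannot contain derived tables or subqueries, so this exact form could not be indexed as-is; it would need reformulating first):

```sql
CREATE VIEW PlayerConnectionStatus
AS
SELECT l.PlayerId,
       -- Most recent connection event is a NewConnection => connected
       CAST(CASE WHEN l.EntryTypeId = 2 THEN 1 ELSE 0 END AS BIT) AS IsConnected
FROM LogEntries l
INNER JOIN (
    -- Latest connection-related log entry per player
    SELECT PlayerId, MAX(LogEntryDate) AS MaxLogEntryDate
    FROM LogEntries
    WHERE EntryTypeId IN (2, 4)
    GROUP BY PlayerId
) lm ON lm.PlayerId = l.PlayerId
    AND lm.MaxLogEntryDate = l.LogEntryDate
WHERE l.EntryTypeId IN (2, 4);
```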

Note: if you do not receive a date with each log entry but the LogEntryId is generated by the game, that should work as a substitute for the date. If you are generating the LogEntryId on insert though, I would caution against relying on that as it would only take one out of order import to throw off all of your data.

RedFilter
But even if I did have this aggregate query, wouldn't I still only have a snapshot view of the data for a period of time, not the exact/accurate data? If I'm not mistaken, isn't this query just showing the most recent connection, per player, for the most recent week? What if they disconnected AFTER this new connection? Does my `RANK() OVER (PARTITION ... )` seem more accurate?
Pure.Krome
It is showing data for the recent week, but that is only because I assumed connections may time out. Remove that optimization if that is not the case. Otherwise, the query shows up-to-the-minute, accurate results. Not sure what you mean by snapshot...
RedFilter
To clarify, the query gets the very latest log entry for each user (within the last week, but you can remove that where clause). Then, it filters out the ones that are not new connections, leaving you with the log entries for currently connected users.
RedFilter
Edited query to add EntryTypeId to where clause, otherwise it would miss some users...
RedFilter
Ahh, so the MAX(..) is basically saying: grab me the most recent log entry (i.e. 1 record) which is either a 2 or a 4, for the time period ... correct?
Pure.Krome
Exactly, the latest connection log entry type for each user.
RedFilter
sweet. that does make sense, now :) kewl.
Pure.Krome