Simplified case:

Storage

  1. Users click on a link
  2. link_clicks +1 for each click
  3. Super user sets a multiplier parameter for each click
  4. link_reward (+1 * param) for each click
  5. User ID is also recorded for each click

Retrieval

  • Queries must be done on a specific date range (e.g., "How many clicks between Oct 10 and Oct 23 for User ID = 4?")
  • Most queries will be done, however, on the sum of all dates for a given user

Assuming the table gets massive, both kinds of queries will get very slow, right?

How does one handle this? Simultaneously store in one detailed table (a row per click per user per link) and in one summary table (a row per user per link)? I've heard of "rolling up" the data but I don't know what that means.
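To make the two-table idea concrete, here is a minimal sketch of a detailed table plus a "rolled up" summary table. All table and column names here are made up, and SQLite (via Python) stands in for MySQL so the snippet is self-contained; MySQL syntax differs only slightly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detailed table: one row per click per user per link.
cur.execute("""CREATE TABLE click_detail (
    user_id INTEGER,
    link_id INTEGER,
    clicked_at TEXT,   -- DATETIME in MySQL
    reward REAL        -- 1 * multiplier, captured at click time
)""")

# Summary table: one row per user per link.
cur.execute("""CREATE TABLE click_summary (
    user_id INTEGER,
    link_id INTEGER,
    total_clicks INTEGER,
    total_reward REAL
)""")

# Record some sample clicks.
cur.executemany(
    "INSERT INTO click_detail VALUES (?, ?, ?, ?)",
    [(4, 1, "2010-10-11 09:00:00", 2.0),
     (4, 1, "2010-10-20 10:30:00", 2.0),
     (4, 2, "2010-11-01 08:00:00", 3.0)])

# "Rolling up": aggregate the detail rows into the summary table.
cur.execute("""INSERT INTO click_summary
               SELECT user_id, link_id, COUNT(*), SUM(reward)
               FROM click_detail
               GROUP BY user_id, link_id""")

# A date-range query still goes against the detail table.
n = cur.execute("""SELECT COUNT(*) FROM click_detail
                   WHERE user_id = 4
                     AND clicked_at BETWEEN '2010-10-10' AND '2010-10-23'
                """).fetchone()[0]
print(n)  # 2
```

The per-user totals then come from the small summary table instead of scanning every click row.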

Technologies used: MySQL, PHP (and Javascript)

+1  A: 

How does one handle this? Simultaneously store in one detailed table (a row per click per user per link) and in one summary table (a row per user per link)?

Yes, but add a DATETIME column so you can do the date-range checking you mentioned. Populate the DATETIME column using the NOW() function to get the current date and time. One thing to keep in mind is that the date-range criteria will limit how much data gets summarized, so performance shouldn't be too big a concern. Also, the details table probably shouldn't be indexed, because indexes only speed up getting data out and slow down putting data into a table.
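A sketch of that timestamp column, again using SQLite via Python for a runnable illustration (SQLite's CURRENT_TIMESTAMP default plays the role of MySQL's NOW(); the names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail table whose timestamp is filled in automatically at insert time.
cur.execute("""CREATE TABLE link_click (
    user_id INTEGER,
    link_id INTEGER,
    clicked_at TEXT DEFAULT CURRENT_TIMESTAMP
)""")

# The INSERT omits clicked_at; the database stamps it.
cur.execute("INSERT INTO link_click (user_id, link_id) VALUES (4, 7)")

ts = cur.execute(
    "SELECT clicked_at FROM link_click WHERE user_id = 4").fetchone()[0]
print(ts)  # e.g. '2010-10-23 12:34:56'
```

In MySQL you would write `clicked_at DATETIME` and supply `NOW()` in the INSERT (or use a default) to get the same effect.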

The super-user reward multiplier should probably be a separate table, but this means your details table needs to relate to it either by userid or by URL; userid would be the better choice.

I've heard of "rolling up" the data but I don't know what that means.

A database principle is to store only what you need: summary data can be calculated using aggregate functions like SUM and COUNT. You can also create a view, which can be queried like a table but doesn't store any data itself.

OMG Ponies
Thanks @OMG Ponies! This is an excellent answer! One follow-up question to the summary data. I have to call a SUM for approx. 30 different links to populate a data table. If I do it the way you say, about how many records before I have to start worrying about "unreasonable" delays? Thousands? Millions? Bazillions?
Emile
@Emile: Databases easily handle millions of records. It depends more on the data type of what you are performing operations on, and what you're trying to get out of that information.
OMG Ponies
Thanks @OMG. The 30 different SUMs would be performed on the DATETIME and an INT column. But what you're saying basically is "no worries." Thanks for all the help!
Emile
+1  A: 

Easy. :-)

One table for users, I'd call it User.

One table for clicks, I'd call it ClickEvent.

One table for each distinct link, I'd call it HyperLink (avoiding the word "Link" in the DB)

The User table, given what we know (not much), isn't very material to the question or answer.

The HyperLink table will be the place to store all info on each link, columns being:

  • HyperLinkID
  • URL
  • ClickValue
  • RewardMultiplier

(I think what you stated assigns the value and the multiplier to the link entity, not to every single click event, right?)

The ClickEvent table is central to your question/answer. I'd give it columns as follows:

  • ClickEventID, int (PK)
  • UserID, int (FK)
  • HyperLinkID, int (FK)
  • ClickDateTime, datetime
  • ComputedEventValue (decimal or smallmoney)

Your concerns about speed should be eased; this isn't very intensive, even with a lot of activity. Every transaction (a click) is logged in the ClickEvent table: each click event inserts a new record, and at insert time the ComputedEventValue gets written.
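A minimal sketch of the HyperLink/ClickEvent design and the at-insert computation, using SQLite via Python as a stand-in for MySQL (the sample URL and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE HyperLink (
    HyperLinkID INTEGER PRIMARY KEY,
    URL TEXT,
    ClickValue REAL,
    RewardMultiplier REAL);
CREATE TABLE ClickEvent (
    ClickEventID INTEGER PRIMARY KEY,
    UserID INTEGER,
    HyperLinkID INTEGER REFERENCES HyperLink(HyperLinkID),
    ClickDateTime TEXT,
    ComputedEventValue REAL);
INSERT INTO HyperLink VALUES (1, 'http://example.com', 1.0, 2.5);
""")

# On each click, look up the link's value and multiplier and store the
# computed value alongside the event row itself.
cur.execute("""INSERT INTO ClickEvent
                 (UserID, HyperLinkID, ClickDateTime, ComputedEventValue)
               SELECT 4, HyperLinkID, datetime('now'),
                      ClickValue * RewardMultiplier
               FROM HyperLink WHERE HyperLinkID = 1""")

value = cur.execute(
    "SELECT ComputedEventValue FROM ClickEvent").fetchone()[0]
print(value)  # 2.5
```

Storing the computed value at click time means later reward reports don't change if the super user adjusts the multiplier afterward.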

That seems to cover the main idea as I see it.

Chris Adragna