I've created an affiliate system that tracks leads and conversions. The lead and conversion records will run into the millions, so I need a good way to store them. Users will need to track the stats hourly, daily, weekly and monthly.

What's the best way to store the leads and conversions?

A: 

For this type of system, you need to keep all of the detail records, because at some point someone is going to contest an invoice.

However, you should also have some roll-up tables. Each hour, compute the current hour's totals and store the results. Do the same for daily, weekly, and monthly.

If some skew is acceptable, you can compute the daily amounts from the 24 hourly records, and the weekly amounts from the last 7 daily records. For monthly, you might want to compute directly from the hourly records, because a month doesn't add up to exactly 4 full weeks. Doing so also helps reduce noise from any averaging you might be doing.
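As a rough illustration of the hourly roll-up, here is a minimal Python sketch. The event shape `(affiliate_id, kind, timestamp)` and the function names are assumptions made for the example, not part of any real schema; in practice the counts would be written to a roll-up table rather than kept in memory.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup_hourly(events):
    """Count events per (affiliate_id, kind, hour).

    `events` is an iterable of (affiliate_id, kind, timestamp) tuples,
    where kind is "lead" or "conversion" (a hypothetical layout).
    """
    counts = Counter()
    for affiliate_id, kind, ts in events:
        counts[(affiliate_id, kind, hour_bucket(ts))] += 1
    return counts


# Made-up sample data for one affiliate.
events = [
    (1, "lead", datetime(2024, 1, 1, 10, 5)),
    (1, "lead", datetime(2024, 1, 1, 10, 45)),
    (1, "conversion", datetime(2024, 1, 1, 10, 50)),
    (1, "lead", datetime(2024, 1, 1, 11, 2)),
]
hourly = rollup_hourly(events)
```

The detail records stay untouched; the roll-up only adds a small summary row per affiliate, event type and hour.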
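To make the "compute coarser periods from the hourly records" idea concrete, here is a small Python sketch. It assumes the hourly totals are already available as a mapping from hour-start timestamp to count; the names `rollup`, `by_day` and `by_month` are illustrative only.

```python
from collections import defaultdict
from datetime import datetime


def rollup(hourly_counts, bucket):
    """Sum hourly counts into coarser periods.

    `hourly_counts` maps an hour-start datetime to a count;
    `bucket` maps a datetime to the key of the coarser period.
    """
    totals = defaultdict(int)
    for hour, count in hourly_counts.items():
        totals[bucket(hour)] += count
    return dict(totals)


def by_day(ts):
    return ts.date()


def by_month(ts):
    return (ts.year, ts.month)


# Made-up hourly totals that straddle a month boundary.
hourly_counts = {
    datetime(2024, 1, 31, 22): 4,
    datetime(2024, 1, 31, 23): 6,
    datetime(2024, 2, 1, 0): 5,
}
daily = rollup(hourly_counts, by_day)
monthly = rollup(hourly_counts, by_month)
```

Because the monthly totals are built from hours rather than weeks, a week that spans two months is split correctly between them.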

I'd recommend a two-step archival process. The first step should run once a day and move the records into a separate "hot" database. Try to keep 3 months hot for any type of research queries you need to do.

The second archive process is up to you. You could simply move any records older than 3 months into some type of CSV file and back it up. After some period of time (a year?), delete them, depending on your data retention agreements.
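A minimal sketch of that second step might look like the following. It uses Python's built-in `sqlite3` as a stand-in for the real database, and the `leads` table, its columns and the function name are assumptions made for the example. Timestamps are stored as ISO-8601 strings so that string comparison matches chronological order.

```python
import csv
import io
import sqlite3


def archive_older_than(conn, cutoff_iso, out):
    """Write leads created before `cutoff_iso` to `out` as CSV, then delete them.

    Returns the number of rows archived.
    """
    rows = conn.execute(
        "SELECT id, affiliate_id, created_at FROM leads "
        "WHERE created_at < ? ORDER BY id",
        (cutoff_iso,),
    ).fetchall()
    writer = csv.writer(out)
    writer.writerow(["id", "affiliate_id", "created_at"])
    writer.writerows(rows)
    with conn:  # commits on success, rolls back if the delete raises
        conn.execute("DELETE FROM leads WHERE created_at < ?", (cutoff_iso,))
    return len(rows)


# Made-up data: one old record and two recent ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads (id INTEGER PRIMARY KEY, affiliate_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?)",
    [
        (1, 7, "2023-09-15T08:00:00"),
        (2, 7, "2023-12-20T09:30:00"),
        (3, 9, "2024-01-02T12:00:00"),
    ],
)
buf = io.StringIO()  # a real job would write to a file and back it up
archived = archive_older_than(conn, "2023-10-03T00:00:00", buf)
remaining = conn.execute("SELECT COUNT(*) FROM leads").fetchone()[0]
```

In a real job you would probably confirm the CSV has been written and backed up before running the delete.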

Chris Lively
Thanks Chris. I thought about storing the leads in one table and conversions in another and computing the stats from the timestamps, but if I kept doing that over 12 months the database load times would get slower. I like your idea. All the affiliate systems I have worked with have "maintenance" schedules, which I'm guessing are the same thing. Some do it weekly, some do it daily.
James Jeffery
I've got experience working with huge (300+ GB) high transaction databases doing 100 inserts a second. The ONLY way I've found that works is to do the rollups as a separate process. Also, as table size expands it becomes critical that you move all of the records you don't need elsewhere. Otherwise backups / restores take FOREVER.
Chris Lively
BTW, those databases aren't MySQL, so I can't say how it is going to behave. However, you should be fine.
Chris Lively
I can see the sense in that. Thanks Chris. I'll do some research on this and implement it similar to the way you have described. You don't happen to have any resources on this topic do you?
James Jeffery
Sorry, I don't have any links to anything for this; but you should be able to figure it out.
Chris Lively
A: 

Depending on the load, you may need to have multiple web servers handling the lead and conversion pixels firing. One option is to store the raw data records on each web/mysql server, and then run an archival process every 5-10 minutes that stores them in a highly normalized table structure, and which performs any required roll-ups to achieve the performance you are looking for.

Make sure you keep the row size as small as possible: store IPs as unsigned ints, store referrers as INTs that reference lookup tables, etc.
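To illustrate the row-size advice, here is a small Python sketch of both tricks. It covers IPv4 only, and the `Lookup` class is a hypothetical in-memory stand-in for a lookup table; in MySQL itself the `INET_ATON()` and `INET_NTOA()` functions do the same IPv4 conversion.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address to a 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(n: int) -> str:
    """Convert a 32-bit unsigned integer back to a dotted IPv4 address."""
    return str(ipaddress.IPv4Address(n))


class Lookup:
    """In-memory stand-in for a lookup table: string -> small integer id."""

    def __init__(self):
        self._ids = {}

    def id_for(self, value: str) -> int:
        # Reuse the existing id, or hand out the next one.
        return self._ids.setdefault(value, len(self._ids) + 1)


referrers = Lookup()
row = (
    ip_to_int("192.168.1.10"),                      # 4 bytes instead of up to 15 chars
    referrers.id_for("http://example.com/page-a"),  # small int instead of a long URL
)
same_id = referrers.id_for("http://example.com/page-a")
other_id = referrers.id_for("http://example.com/page-b")
```

Each detail row then carries two integers, and the long strings are stored once in the lookup table.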

Gary