views: 59
answers: 2

What is the best database model to store user visits and count unique users by IP in a big database, with 1,000,000 rows for example?

SELECT COUNT(DISTINCT ip) FROM visits

But with 1,000,000 different IPs it can be a slow query, and caching will not return the real number.

How do big stats systems count unique visits?

A: 

Don't use a relational database for that. It's not designed to store that type of information.

You can try a NoSQL database such as Mongo (I know a lot of places use that for their logging since it has so little overhead).

If you must stick with MySQL, you can add an index to the ip column which should speed things up significantly...
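
For example, assuming the `visits` table from the question, adding that index could look like this (the index name is just illustrative):

ALTER TABLE visits ADD INDEX idx_ip (ip);

-- the count can then be answered from the index instead of scanning the full table:
SELECT COUNT(DISTINCT ip) FROM visits;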

ircmaxell
That's what I would suggest. Also, think about the concept of calculating unique users: calculate it just once and then reuse it. The number of yesterday's unique visitors will not change, and the number of unique visitors last week will not change either.
dwich
Based upon that, you could shard per day/week/month/whatever, and create a new table for each new period. That way you still retain the information (if you **really** need it), and get the performance gain of dealing with relatively small tables. But I must ask, why do you need to retain that much data? Why not just summarize once per day and then delete after a month or two?
ircmaxell
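
As a rough sketch of that "summarize once per day" idea (the `daily_uniques` table and the `visited_at` column are hypothetical, not from the question):

-- run once per day (e.g. from cron) to snapshot yesterday's unique-visitor count
INSERT INTO daily_uniques (visit_date, unique_visitors)
SELECT DATE(visited_at), COUNT(DISTINCT ip)
FROM visits
WHERE visited_at >= CURDATE() - INTERVAL 1 DAY
  AND visited_at < CURDATE()
GROUP BY DATE(visited_at);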
@ircmaxell: I know how to use indexes... I'm only asking about DB models for highly populated databases. I need to save all the data because my framework needs all the information about all clients on different servers, for statistics and other things. Thanks. (What's faster: an index on IP, or the other solution, a table with unique IPs?)
Wiliam
Well, it all depends on the server. Don't forget that this table will be extremely write-heavy (assuming it's being written to live). So the `on duplicate key update` may have a performance hit, since it needs to read the index and then seek to the position, where a plain insert would just need the seek (sounds like a tiny bit extra, and it is for one query; for thousands per second it's significant). Plus plain inserts enable writes to be streamed back to back rather than requiring seeks all over the place. Bottom line: test it. Make a test db and a script to write to it, and see...
ircmaxell
+1  A: 

Have another MyISAM table with only an IP column and a UNIQUE index on it. You'll get the proper count in no time (MyISAM caches the number of rows in a table).
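
A minimal sketch of such a table, assuming IPs stored as integers and the `visitCounter` name used in the snippet below:

CREATE TABLE visitCounter (
  ip INT UNSIGNED NOT NULL,  -- IPv4 address as returned by INET_ATON()
  UNIQUE KEY (ip)
) ENGINE=MyISAM;

-- MyISAM stores the exact row count in its table metadata, so this is effectively free:
SELECT COUNT(*) FROM visitCounter;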

[added after comments]

If you also need to count visits from each IP, add one more column visitCount and use

INSERT INTO
  visitCounter (ip, visitCount)
VALUES
  (INET_ATON($ip), 1)
ON DUPLICATE KEY UPDATE
  visitCount = visitCount + 1;
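
If the table originally had only the IP column (as in the sketch above), the extra column can be added beforehand with something like:

ALTER TABLE visitCounter
  ADD COLUMN visitCount INT UNSIGNED NOT NULL DEFAULT 1;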
Mchl
@Mchl if the IP column is UNIQUE, won't that table always return COUNT = 1 per IP?
Frankie
It will, but I understood that William wanted to count the number of all distinct IPs. This can still be modified by adding a `count` field and using `INSERT ... ON DUPLICATE KEY UPDATE ... ` syntax to increment it.
Mchl
For unique visits it's a good solution. Save the unique IP and the actual timestamp.
Wiliam
@Mchl: Storing the IP as an integer is better than storing it as a string? Faster?
Wiliam
Absolutely. It's storing a 4-byte integer instead of up to 15 bytes... Plus it doesn't require the charset and collation routines for searching and retrieving...
ircmaxell
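
To illustrate the size difference (the values here are just the textbook example):

SELECT INET_ATON('127.0.0.1');  -- 2130706433, fits in a 4-byte INT UNSIGNED
SELECT INET_NTOA(2130706433);   -- '127.0.0.1', up to 15 characters as a string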
Yeah. Just don't do `WHERE INET_NTOA(ip) = '127.0.0.1'` but do `WHERE ip = INET_ATON('127.0.0.1')`. The difference is that the second one uses the index.
Mchl