views:

467

answers:

4

Hi, I'm developing a chat application. I want to log everything to a table (i.e. "who said what and when"). I expect to have thousands of rows in the near future. I was wondering: what is the best way to optimize the table, knowing that I'll insert rows often and occasionally read groups of them (i.e. showing a user's entire conversation: find when he/she logged in/started chatting, find when he/she quit, then show the whole conversation)?

This table should be able to handle (I hope!) many, many rows (15,000/day => 4.5 M per month => 54 M rows by the end of the year).

Conversations older than 15 days could be archived (but I don't know how to do that properly).

Any ideas?

+3  A: 

I have two pieces of advice for you:

  1. If you are expecting lots of writes and only a few low-priority reads, you are better off with as few indexes as possible. Indexes make inserts slower; only add the ones you really need.
  2. If the log table is going to keep growing over time, you should consider log rotation. Otherwise you might end up with one gigantic, corruption-prone table.
Nadia Alramli
+2  A: 

MySQL handles very large data sets surprisingly well with little more than standard database tuning and indexes. I ran a site that had millions of rows in a database and was able to run it just fine on MySQL.

MySQL does have an ARCHIVE storage engine for handling many rows, but its lack of index support makes it a poor option for you, except perhaps for historical data.

You will need to create indexes, but you have to balance them, not just create them because you can. They allow faster queries (and are required for usable queries on a table that large), but the more indexes you have, the more each insert costs.

If you are only querying on your "user" id column, an index there will not be a problem. But if you want to do full-text queries on the messages, you may want to consider indexing only the user column in MySQL and using something like Sphinx or Lucene for the full-text searches, as full-text search in MySQL is not the fastest and significantly slows down inserts.
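A minimal schema sketch along those lines (table and column names are hypothetical, not from the question): only the user column is indexed, and the message text is left unindexed so an external engine such as Sphinx can handle full-text search.

```sql
-- Hypothetical chat log table: index only the user column (plus timestamp,
-- to support "show this user's conversation"), not the message text.
CREATE TABLE chat_log (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id  INT UNSIGNED    NOT NULL,
    said_at  DATETIME        NOT NULL,
    message  TEXT            NOT NULL,
    PRIMARY KEY (id),
    KEY idx_user (user_id, said_at)
) ENGINE=MyISAM;
```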

Nathan Voxland
A: 

You could handle this with two tables: one for the current chat history and one archive table. At the end of a period (week, month, or day, depending on your traffic) you can archive current chat messages, remove them from the small table, and add them to the archive.
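The rotation step could look something like this (a sketch with hypothetical table and column names, assuming a `said_at` timestamp and a 15-day retention window as mentioned in the question):

```sql
-- Copy messages older than 15 days into the archive table,
-- then remove them from the hot table.
INSERT INTO chat_archive
SELECT * FROM chat_current
 WHERE said_at < NOW() - INTERVAL 15 DAY;

DELETE FROM chat_current
 WHERE said_at < NOW() - INTERVAL 15 DAY;
```

If the tables are InnoDB, wrapping both statements in a transaction keeps the move atomic; with MyISAM you would run this during a low-traffic window instead.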

This way your application handles the most common case well: querying the current chat state will be really fast.

For queries like "what did X say last month" you will query the archive table, and that will take a little longer. But this is OK, since there won't be many of these queries, and whoever runs such a search will probably be willing to wait a couple of seconds more.

Depending on your use cases you could extend this principle: if there will be a lot of queries for chat messages from the last 6 months, store those in a separate table too.

A similar principle (in a completely different area) is used by the .NET garbage collector, which has different storage for short-lived objects, long-lived objects, large objects, etc.

devdimi
A: 

54 million rows is not that many, especially over a year.

If you are going to be rotating out lots of data periodically, I would recommend using MyISAM and MERGE tables. Since you won't be deleting or editing records, you won't have any locking issues as long as concurrent_insert is set to 1. Inserts will then always be appended to the end of the table, so SELECTs and INSERTs can happen simultaneously. You also don't need InnoDB tables, since MERGE tables only work over MyISAM tables anyway.

You could have one table per month, named something like data200905, data200904, etc. Your MERGE table would then include all the underlying tables you need to search on. Inserts are done on the MERGE table, so you don't have to worry about changing names. When it's time to rotate out data and create a new table, just redeclare the MERGE table.

You could even create multiple MERGE tables, based on quarters, years, etc. A single table can be part of multiple MERGE tables.
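The monthly-table scheme above might be sketched like this (table and column names are illustrative, not from the answer):

```sql
-- One MyISAM table per month, all with an identical structure.
CREATE TABLE data200904 (
    id      BIGINT UNSIGNED NOT NULL,
    user_id INT UNSIGNED    NOT NULL,
    said_at DATETIME        NOT NULL,
    message TEXT            NOT NULL,
    KEY idx_user (user_id)
) ENGINE=MyISAM;

CREATE TABLE data200905 LIKE data200904;

-- The MERGE table spans the monthly tables; INSERT_METHOD=LAST routes
-- inserts to the last table in the UNION list (the current month).
CREATE TABLE data_all (
    id      BIGINT UNSIGNED NOT NULL,
    user_id INT UNSIGNED    NOT NULL,
    said_at DATETIME        NOT NULL,
    message TEXT            NOT NULL,
    KEY idx_user (user_id)
) ENGINE=MERGE UNION=(data200904, data200905) INSERT_METHOD=LAST;

-- At month rollover, create the new month's table and redeclare the union:
CREATE TABLE data200906 LIKE data200904;
ALTER TABLE data_all UNION=(data200904, data200905, data200906);
```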

I've done this setup on databases that added 30 million records per month.

Brent Baisley
Thank you, that's exactly what I was looking for. I'm planning to do this: each new month, I create a new table called CHAT_YYYY_MM; this will be the table where I'll do all the INSERTs. And I'll create a "global" table which will be a MySQL MERGE table (destroying it first if it exists). Thank you again!
Olivier Pons
Don't destroy the MERGE table, just ALTER it: ALTER TABLE merge_table UNION=(new_table, table_1, ...);
Brent Baisley
Ok, thank you twice for the other hint :)
Olivier Pons