Hi, I have a question about how to maintain chat data in a database. I have a PHP-MySQL application that stores users' chat data in a database.

My question is: if the chat data grows to, say, several million records, how should I store it? Can MySQL handle that, or does it have limitations?

Take Gmail chat as an example. I can chat without limit and still retrieve all my previous chat data. How is that possible?

Can anyone answer this for me?

A: 

Google has vast amounts of custom storage that it designed for its own requirements. I suggest you pin down your requirements more concretely and then decide which platform you need.

Preet Sangha
My requirement is that a user must be able to retrieve his chat data for at least 1 year. My platform is PHP-MySQL-Apache-Windows.
dskanth
How many users? What will the average amount of data be, and how will it grow?
Preet Sangha
The user count would be at most 1 lakh (100,000), and it would not reach that level quickly; it would take almost a year to get there. The amount of chat data for each user could be 40 KB per day.
dskanth
A: 

MySQL will happily store millions, even billions of records, but some of the numeric types won't be large enough: see the MySQL documentation for the maxima of the integer types. As it shows, it is better to use BIGINT UNSIGNED for fields such as auto-increment keys.
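To make the point about numeric ranges concrete, here is a small sketch that estimates how long an auto-increment key would last at a steady insert rate. The integer limits are MySQL's documented ranges; the load figure (100,000 users sending 500 messages a day each) and the helper name `years_until_exhausted` are assumptions made up for illustration.

```python
# Upper bounds of MySQL integer column types (documented ranges).
INT_SIGNED_MAX = 2**31 - 1        # 2,147,483,647
INT_UNSIGNED_MAX = 2**32 - 1      # 4,294,967,295
BIGINT_UNSIGNED_MAX = 2**64 - 1   # 18,446,744,073,709,551,615


def years_until_exhausted(max_id: int, rows_per_day: int) -> float:
    """Years before an auto-increment key reaches max_id at a steady insert rate."""
    return max_id / rows_per_day / 365


# Hypothetical load: 100,000 users, each sending 500 messages per day.
ROWS_PER_DAY = 100_000 * 500

for name, limit in [
    ("INT", INT_SIGNED_MAX),
    ("INT UNSIGNED", INT_UNSIGNED_MAX),
    ("BIGINT UNSIGNED", BIGINT_UNSIGNED_MAX),
]:
    years = years_until_exhausted(limit, ROWS_PER_DAY)
    print(f"{name:16s} lasts about {years:,.2f} years")
```

Under that assumed load, a plain INT key would run out within months, while BIGINT UNSIGNED is effectively inexhaustible.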

Performance may become a problem for large tables, but that can mostly be solved with indexes (speaking from experience: I've seen performance decrease somewhere around the 100 GB mark in a similar situation).

Piskvor
Thanks for your answers. Initially I thought that I could store an enormous amount of data in MySQL, but that retrieval would become slow as the data and records increased. I am planning to start with 20 GB of storage and, as the users and chat data grow in future, I would consider increasing the disk space. Also, I think it would be better to periodically delete chat data older than 1 or 2 years, using a cron job.
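A periodic cleanup of that kind could look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table name `chat_log`, its columns, and the function `purge_old_messages` are hypothetical.

```python
import sqlite3

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def purge_old_messages(conn, now, max_age_years=1):
    """Delete messages older than max_age_years; return how many were removed."""
    cutoff = now - max_age_years * SECONDS_PER_YEAR
    cur = conn.execute("DELETE FROM chat_log WHERE sent_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Hypothetical minimal table; sent_at holds a Unix timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_log (user_id INTEGER, sent_at INTEGER, message TEXT)")
# An index on the timestamp lets the DELETE find old rows without a full scan.
conn.execute("CREATE INDEX idx_sent_at ON chat_log (sent_at)")

now = 1_700_000_000  # fixed "current time" so the example is reproducible
conn.executemany(
    "INSERT INTO chat_log VALUES (?, ?, ?)",
    [
        (1, now - 3 * SECONDS_PER_YEAR, "ancient"),
        (1, now - 2 * SECONDS_PER_YEAR - 1, "just too old"),
        (2, now - 60, "recent"),
    ],
)

removed = purge_old_messages(conn, now, max_age_years=2)
remaining = [row[0] for row in conn.execute("SELECT message FROM chat_log")]
print(removed, remaining)
```

In a real deployment the script would be run on a schedule with the actual current time; on Windows, the Task Scheduler plays the role of cron.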
dskanth
As Piskvor already said, as long as you keep an eye on that documentation and maintain indices in a way that actually helps the database find records, you'll be more than fine. I'm maintaining a MySQL database with around 1 million records at the moment, and I can do fulltext searches on it in fractions of a second. I'm sure finding sequential data like chat logs will be even faster.
Michael
Hi Michael, how do I maintain indices in a way that actually helps the database find records? I did not understand that well. Do you mean indexes on a table? If so, how should I define indices on a chat table?
dskanth
@dskanth: The indexes you need depend on your specific situation; see e.g. this tutorial: http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm
Piskvor
Off the top of my head I would propose the following table, only for the logs: `SendingUserID(PK, Index1), ReceivingUserID(PK), Timestamp(PK, Index1), Message`. This, of course, requires the timestamp to be unique, which I think is a given in chat situations (per user pair). In this case, you probably wouldn't even need the extra indices if you're only running queries per user pair and timestamp. But I would keep them, since your users may want to look at all their conversations in a given time span. I'll leave the rest up to you; DBMS is a huge topic and I have no more characters left.
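One possible reading of that table layout, sketched with Python's built-in `sqlite3` as a stand-in for MySQL so it can be run directly. The snake_case names, the integer Unix timestamp, and the helper `query_plan` are assumptions for illustration; in MySQL you would pick the column types (for example BIGINT UNSIGNED and DATETIME) to suit your data.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chat_log (
    sending_user_id   INTEGER NOT NULL,
    receiving_user_id INTEGER NOT NULL,
    sent_at           INTEGER NOT NULL,  -- Unix time; must be unique per user pair
    message           TEXT    NOT NULL,
    PRIMARY KEY (sending_user_id, receiving_user_id, sent_at)
);
-- "Index1" from the proposal: sender plus timestamp.
CREATE INDEX idx_sender_time ON chat_log (sending_user_id, sent_at);
""")

conn.executemany(
    "INSERT INTO chat_log VALUES (?, ?, ?, ?)",
    [
        (1, 2, 1000, "hi"),
        (2, 1, 1005, "hello"),
        (1, 2, 1010, "how are you?"),
        (1, 3, 1020, "ping"),
    ],
)

# Messages for one user pair in a time span: served by the primary key.
PAIR_QUERY = """
SELECT sent_at, message FROM chat_log
WHERE sending_user_id = ? AND receiving_user_id = ?
  AND sent_at BETWEEN ? AND ?
ORDER BY sent_at
"""

# Everything one user sent in a time span: served by the extra index.
SENDER_QUERY = """
SELECT receiving_user_id, sent_at, message FROM chat_log
WHERE sending_user_id = ? AND sent_at BETWEEN ? AND ?
ORDER BY sent_at
"""


def query_plan(sql, params):
    """Return SQLite's plan text for a query, to confirm an index is used."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)


pair_rows = conn.execute(PAIR_QUERY, (1, 2, 0, 2000)).fetchall()
sender_rows = conn.execute(SENDER_QUERY, (1, 0, 2000)).fetchall()
print(pair_rows)
print(sender_rows)
print(query_plan(PAIR_QUERY, (1, 2, 0, 2000)))
print(query_plan(SENDER_QUERY, (1, 0, 2000)))
```

Note that with a timestamp in whole seconds, two messages from the same sender to the same receiver within one second would violate the primary key, so a finer-grained timestamp or an extra sequence column may be needed.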
Michael
+3  A: 

Chat history isn't really that heavyweight. If I assume around 100 bytes per message, 6 messages per minute, and 5 hours per day (that is a very talkative chatter, though), sustained all year as a worst case, that gives about 66 MB per user per year (!). That means that with 1 million talkative chatters (very improbable) you would need around 66 TB of data storage.
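The estimate above can be checked with a few lines of arithmetic. The sketch below uses the same worst-case assumptions and also plugs in the figures given in the comments (100,000 users at 40 KB per user per day, taking 1 KB as 1,000 bytes). The result shifts by a few percent depending on whether decimal (MB, TB) or binary (MiB, TiB) units are used, but the order of magnitude stays the same.

```python
# Worst-case assumptions taken from the estimate above.
BYTES_PER_MESSAGE = 100
MESSAGES_PER_MINUTE = 6
HOURS_PER_DAY = 5
DAYS_PER_YEAR = 365

MB, MiB = 10**6, 2**20
TB, TiB = 10**12, 2**40

bytes_per_user_day = BYTES_PER_MESSAGE * MESSAGES_PER_MINUTE * 60 * HOURS_PER_DAY
bytes_per_user_year = bytes_per_user_day * DAYS_PER_YEAR


def total_bytes(users, per_user_year=bytes_per_user_year):
    """Storage needed for one year of chat history for the given user count."""
    return users * per_user_year


print(f"per user per year: {bytes_per_user_year / MB:.1f} MB "
      f"({bytes_per_user_year / MiB:.1f} MiB)")
print(f"1,000,000 users:   {total_bytes(1_000_000) / TB:.1f} TB "
      f"({total_bytes(1_000_000) / TiB:.1f} TiB)")

# Figures from the comments: 100,000 users at 40 KB per user per day.
asker_bytes_per_year = 100_000 * 40 * 1000 * DAYS_PER_YEAR
print(f"asker's scenario:  {asker_bytes_per_year / TB:.2f} TB per year")
```

With the numbers from the comments, a full user base would generate about 4 GB per day, so the storage plan is worth revisiting as the user count grows.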

Given that this is a worst-case calculation, I would start off with a maximum of 1 TB of storage, set up the database, and see how things go. It is highly improbable that a very young service will grow that fast.

Also, I would personally not recommend using a Windows system for something like this unless you know very well what you're doing. MySQL on a Debian distribution will store billions of records, and will probably do so faster because of fewer OS-level limitations (see the MySQL documentation for details; there should be a section about the limitations on Windows).

Michael