Say there is a website with 100,000 users, each of whom has up to 1,000 unique strings attached to them, so there are at most 100,000,000 strings in total. Would it be better to have one table with each string being one record along with its owner's ID, so that you end up with one table of 100,000,000 records with two fields (text and user ID)?

Or have 100,000 tables, one table for each user, where the table's name is the user's ID, with up to 1,000 records in each table and just one field (the text)?

Or, instead of storing the strings in a database (there would be a character limit around the length of an SMS message), just store links to text files, where there are 100,000,000 text files in a directory and each file has a unique name (random numbers and/or letters) and contains one of the strings? (Or where each user has a directory and their strings are files in that directory?)

Which would be the most efficient option, the directory approach or the database, and which sub-option of each would be the most efficient?

(This question is obviously theoretical in my case, but what does a site like Twitter do?)

(By efficiency I mean using the least resources and time.)

+2  A: 

Option #1

It would be easier to store one table with a user id and the text. It would not be more efficient to create a table for every user.
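
As a rough sketch of what that single table might look like (the table name, column sizes, and MySQL dialect here are illustrative assumptions, not anything from the question):

    -- Hypothetical schema: one row per string, keyed by its owner.
    CREATE TABLE user_strings (
        id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key
        user_id INT UNSIGNED    NOT NULL,                 -- the owner's ID
        text    VARCHAR(160)    NOT NULL,                 -- roughly SMS-length, per the question
        PRIMARY KEY (id)
    );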

Though in practice you would want something like a Mongo sharded cluster instead of a lone server running MySQL.

Josh K
+2  A: 

You'd have one table, with an index on USER_ID.

For speed, you can partition the table, replicate it, use caching, the cloud, sharding, ...
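
For instance, a minimal sketch of the index idea, reusing the hypothetical user_strings table from above (the index name and the user ID 42 are made up for illustration):

    -- With an index on user_id, fetching one user's strings is a cheap
    -- range scan instead of a scan over all 100,000,000 rows.
    CREATE INDEX idx_user_id ON user_strings (user_id);

    SELECT text
    FROM user_strings
    WHERE user_id = 42;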

Konerak
+10  A: 

Or have 100,000 tables

For the love of $DEITY, no! This will lead to horrible code - it's not what databases are designed for.

You should have one table with 100,000,000 records. Database servers are built to handle large tables, and you can use indexes, partitioning, etc. to improve performance if necessary.

Blorgbeard
Good, because this is also what I thought; I just wanted to make sure.
Jonathan
Actually, depending on how the data is structured and what it's used for, a few dozen tables (certainly not 100,000) would be reasonable. It's called horizontal partitioning, and MySQL 5.1 supports it natively. It will handle all the table references for you, so you can treat it as one table. But there are some strict conditions on using it.
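
A minimal sketch of that native partitioning, again using the hypothetical user_strings table (one of the strict conditions is that the partitioning column must be part of every unique key, which is why user_id is folded into the primary key here; the partition count of 32 is an arbitrary "few dozen"):

    -- Still one logical table to the application, but MySQL stores it
    -- as 32 physical pieces, hashed by user_id.
    CREATE TABLE user_strings (
        id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id INT UNSIGNED    NOT NULL,
        text    VARCHAR(160)    NOT NULL,
        PRIMARY KEY (id, user_id)   -- user_id must appear in every unique key
    )
    PARTITION BY HASH (user_id)
    PARTITIONS 32;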
Tesserex
A: 

Please consider NoSQL databases: http://nosql-database.org/

Pierre 303
A: 

Definitely one table, with records stored and fetched by key. The OS will slow to a crawl with a directory structure of 100,000 file names to sort through; the directory management alone will KILL your performance at the OS level.

DRapp
A: 

It depends on how much activity the server has to handle.

A few months ago we built a system that indexed ~20 million Medline article abstracts, each of which is longer than your Twitter-length message. We put everything into a single Lucene index that was ~40 GB. Even though we had bad hardware (2 GB of RAM and no SSD drives; poor interns), we were able to run searches for ~3 million terms against the index in a few days.

A single table (or Lucene index) should be the way to go.

Christian