+2  Q: 

Large MySQL tables

For a web application I'm developing, I need to store a large number of records. Each record will consist of a primary key and a single (short-ish) string value. I expect to have about 100 GB of storage available and would like to be able to use it all.

The records will be inserted, deleted and read frequently and I must use a MySQL database. Data integrity is not crucial, but performance is. What issues and pitfalls am I likely to encounter and which storage engine would be best suited to the task?

Many thanks, J

+1  A: 

If you are using indexing (and even if you're not) you may encounter scaling issues. You can try partitioning to reduce those effects.
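
For illustration, a minimal sketch of what that could look like, assuming a hypothetical table named records keyed by a BIGINT id (the names are placeholders, not from the question):

    -- Spread rows and index maintenance across 16 smaller physical
    -- partitions; HASH partitioning distributes rows by key value.
    ALTER TABLE records
        PARTITION BY HASH (id)
        PARTITIONS 16;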

In my own project, integrity is not crucial either, but performance is. What we did was relax all the transactional requirements, relax the disk-synchronization requirements, and batch our inserts into larger commits, and that really improved our write speeds.
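
As a hedged sketch of the kind of relaxation meant here (the variable value and batch size are assumptions to tune, not recommendations):

    -- Flush the InnoDB log about once per second instead of at every
    -- commit; an OS crash can then lose up to ~1 s of transactions.
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;

    -- Batch many rows into one transaction instead of autocommitting
    -- each insert individually.
    START TRANSACTION;
    INSERT INTO records (id, value) VALUES
        (1, 'first'),
        (2, 'second'),
        (3, 'third');
    COMMIT;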

Also, make sure you do your own testing to tune your memory sizes. I believe MySQL has a few different caches whose sizes you can configure.
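
For example, the MyISAM key buffer is one such cache; it can be resized at runtime while you test (the size below is only a placeholder):

    -- The key buffer caches MyISAM index blocks; data blocks rely on
    -- the OS file cache instead.
    SET GLOBAL key_buffer_size = 2147483648;  -- 2 GB, a placeholder

    -- If Key_reads stays small relative to Key_read_requests, most
    -- index lookups are being served from memory.
    SHOW GLOBAL STATUS LIKE 'Key_read%';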

jbu
+1  A: 

You definitely want to use MyISAM for the storage engine. And since you say you expect 100 GB of short-ish string values, you definitely want to use a 64-bit int for your identity/primary key.
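
For instance, a minimal sketch of such a table (the shape is an assumption, since the question doesn't give one):

    -- BIGINT UNSIGNED allows up to 2^64 - 1 keys; an unsigned INT
    -- would top out at about 4.3 billion rows.
    CREATE TABLE records (
        id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        value VARCHAR(255) NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE = MyISAM;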

But my real question is: are you using this to store session information from the web site? If so, you want to use memcache instead of MySQL.

Nick Berardi
It's not for session information. The values are derived from URLs scraped off webpages. Sorry - can't reveal more, but thanks all the same!
Snukker
MyISAM locks the whole table on updates, inserts or deletes and thus is a very poor choice for write-heavy use. The OP says he is doing lots of updates, inserts or deletes. If you have to use MySQL, InnoDB is better for read/write because it doesn't lock the whole damn table for writes.
Cory R. King
A: 

Large MySQL queries make my quad-core/8 GB RAM DB server crash...

The solution is to use PostgreSQL (or SQL Server if you can afford it).

mson
MySQL was intermittently crashing on queries (sometimes the business logic really is that complex). Moving the system to SQL Server on a comparable machine with the exact same queries resulted in no more crashes... and even faster execution. Go google the current bad state of MySQL...
mson
+2  A: 

It all depends on the read/write pattern your application generates and the level of accuracy you want. For example, if you don't really care whether the most recently inserted rows are immediately visible, using INSERT LOW_PRIORITY can help SELECTs. If the text is relatively small, you can use a fixed-width CHAR type, which helps indexing a lot and reduces SELECT times. If your application generates a lot of updates, you'll prefer the InnoDB storage engine, which locks only the affected row when updating (versus the whole table with MyISAM). On the other hand, it is more CPU-intensive, so if you don't use transactions and your update pattern is relatively light, consider MyISAM.
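
A short sketch of those two suggestions, reusing the hypothetical records table from the answers above:

    -- LOW_PRIORITY delays the write until no clients are reading the
    -- table (relevant to table-locking engines such as MyISAM).
    INSERT LOW_PRIORITY INTO records (id, value) VALUES (42, 'example');

    -- A fixed-width CHAR column keeps MyISAM rows fixed-length; note
    -- that CHAR pads values with trailing spaces and strips them on read.
    ALTER TABLE records MODIFY value CHAR(64) NOT NULL;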

MatthieuP
+1  A: 

Whatever solution you use, since you say your database will be write-heavy you need to make sure the whole table doesn't get locked on writes. This rules out MyISAM, which some have suggested: MyISAM locks the table on an update, delete, or insert, so any client who wants to read from the table has to wait for the write to finish. Dunno exactly what INSERT LOW_PRIORITY does, though - probably some hack around table locking :-)

If you simply must use MySQL, you'll want InnoDB, which doesn't lock the table on write. I dunno how MySQL handles the equivalent of a VACUUM for InnoDB tables (InnoDB is MVCC like PostgreSQL and so needs to clean up old row versions)... but you'll have to take that into consideration if you are doing a lot of updates or deletes.
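
One way to keep an eye on that cleanup backlog, if it helps: the "History list length" figure in the TRANSACTIONS section of InnoDB's status output.

    -- A steadily growing history list length means InnoDB's purge of
    -- old row versions is falling behind your update/delete rate.
    SHOW ENGINE INNODB STATUS;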

Cory R. King
A: 

You are much better off if the "shortish string" is in a fixed-length column so that the table has fixed-length rows. MySQL with MyISAM will then operate quite efficiently for you. Allocate as much memory as you can to the key buffer so that as much of the index as possible sits in memory. Your goal should be a single random disk access to retrieve one row -- you can't do better than that given 100 GB of data and 8 GB of memory. You should not expect to achieve more than a few hundred such queries per second, because that's all the random accesses a disk can do.
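
A back-of-the-envelope version of that arithmetic, with every number assumed rather than taken from the question:

    -- ~100 GB / ~110 bytes per fixed-length row  =>  roughly 1e9 rows.
    -- Each key-buffer entry is about key + pointer (8 + ~6 = ~14 bytes),
    -- so the primary-key index alone is ~14 GB and cannot fit entirely
    -- in 8 GB of RAM. Also verify the row format actually came out
    -- 'Fixed' (any VARCHAR/TEXT column makes it 'Dynamic'):
    SHOW TABLE STATUS LIKE 'records';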

You might be interested in my MySQL custom storage engine (described here). It manages memory differently from MyISAM, although the profile of your application isn't exactly what my engine was optimized for.

Dave Pullin