views: 168
answers: 3

This is my first time building a database with a table containing 10 million records. The table is a members table that will contain all the details of a member.

What do I need to pay attention to when I build the database?

Do I need a special version of MySQL? Should I use MyISAM or InnoDB?

+4  A: 

For a start, you may need to step back and re-examine your schema. How did you end up with 10 million rows in the member table? Do you actually have 10 million members (it seems like a lot)?

I suspect (although I'm not sure) that you have fewer than 10 million members, in which case your table will not be correctly structured. Please post the schema; that's the first step to us helping you out.

If you do have 10 million members, my advice is to make your application vendor-agnostic to begin with (i.e., standard SQL). Then, if you start running into problems, just toss out your current DBMS and replace it with a more powerful one.

Once you've established that you have one that's suitable, then, and only then, would I advise using vendor-specific stuff. Otherwise it will be a painful process to change.

BTW, 10 million rows is not really considered a big database table, at least not where I come from.

Beyond that, the following is important (not necessarily an exhaustive list but a good start).

  • Design your tables for 3NF always. Once you identify performance problems, you can violate that rule provided you understand the consequences.
  • Don't bother with performance tuning during development; your queries are in a state of flux. Just accept the fact that they may not run fast.
  • Once the majority of queries are locked down, then start tuning your tables. Add whatever indexes speed up the selects, de-normalize and so forth.
  • Tuning is not a set-and-forget operation (which is why we pay our DBAs so much). Continuously monitor performance and tune to suit.
  • I prefer to keep my SQL standard to retain the ability to switch vendors at any time. But I'm pragmatic. Use vendor-specific stuff if it really gives you a boost. Just be aware of what you're losing and try to isolate the vendor-specific stuff as much as possible.
  • People who use "select * from ..." when they don't need every column should be beaten into submission.
  • Likewise those who select every row just to filter it on the client side. The people who write our DBMSs aren't sitting around all day playing Solitaire; they know how to make queries run fast. Let the database do what it's best at: filtering and aggregation are best done on the server side, so only send what is needed across the wire (see the sketch after this list).
  • Generate your queries to be useful. Other than the DoD who require reports detailing every component of their aircraft carriers down to the nuts-and-bolts level, no-one's interested in reading your 1200-page report no matter how useful you think it may be. In fact, I don't think the DoD reads theirs either, but I wouldn't want some general chewing me out because I didn't deliver - those guys can be loud and they have a fair bit of sophisticated weaponry under their control.
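As a rough illustration of the two points about selecting only what you need, here is a sketch against a hypothetical members table with id, name, email, country and created_at columns (not the asker's actual schema):

    -- Bad: drags every column of every row across the wire and
    -- leaves the client to do the filtering and aggregation.
    SELECT * FROM members;

    -- Better: ask only for the columns the report needs and let
    -- the server do the filtering.
    SELECT id, name, email
    FROM members
    WHERE country = 'AU'
      AND created_at >= '2009-01-01';

    -- Aggregation belongs on the server too.
    SELECT country, COUNT(*) AS member_count
    FROM members
    GROUP BY country;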
paxdiablo
I have 10 million rows of different users
Haim Evgi
Wish my website had 10 million users....
Luke Lowrey
They are actually residents of a state
Haim Evgi
A: 

At least use InnoDB. You will feel the pain when you realize MyISAM has just lost your data...

Apart from this, you should give more information about what you want to do.
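A minimal sketch of choosing the engine, assuming a hypothetical members table (not the asker's real schema):

    -- Pick the engine explicitly when creating the table...
    CREATE TABLE members (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;

    -- ...or convert an existing MyISAM table. This rebuilds the
    -- table, so expect it to take a while on 10 million rows.
    ALTER TABLE members ENGINE=InnoDB;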

peufeu
A: 

You don't need to use InnoDB if you don't have data-integrity or atomicity requirements. You want to use InnoDB if you have foreign keys between tables and are required to enforce the constraints, or if you need to update multiple tables in an atomic operation. Otherwise, if you just need the table for analysis, MyISAM is fine.
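For example, a sketch with hypothetical members and payments tables (the members table here is assumed to have an id primary key and a last_payment column; none of this comes from the question):

    -- InnoDB enforces the foreign key and lets the two statements
    -- commit or roll back together.
    CREATE TABLE payments (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        member_id INT UNSIGNED NOT NULL,
        amount    DECIMAL(10,2) NOT NULL,
        FOREIGN KEY (member_id) REFERENCES members (id)
    ) ENGINE=InnoDB;

    START TRANSACTION;
    INSERT INTO payments (member_id, amount) VALUES (42, 19.99);
    UPDATE members SET last_payment = NOW() WHERE id = 42;
    COMMIT;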

For queries, make sure you build smart indexes to suit your query. For example, if you want to sort by column c and select based on columns a and b, make sure you have an index that covers columns a, b, and c, in that order, and that the index includes the full length of each column rather than a prefix. If you don't get your indexes right, sorting over a large amount of data will kill you. See http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
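A sketch of the kind of index described above, using the placeholder columns a, b and c on a hypothetical members table:

    -- Composite index: filter columns first, sort column last,
    -- with no prefix lengths.
    CREATE INDEX idx_a_b_c ON members (a, b, c);

    -- With equality predicates on a and b, this query can use the
    -- index for both the filter and the sort, avoiding a filesort
    -- over millions of rows.
    SELECT a, b, c
    FROM members
    WHERE a = 1 AND b = 2
    ORDER BY c;

Running EXPLAIN on the query should confirm whether the index is being used for the sort or whether a filesort is still happening.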

Just a note about setting up and testing a large table with InnoDB: if you start injecting your data, it will take hours. Make sure you issue commits periodically; otherwise, if you want to stop and redo it for whatever reason, you end up having to either 1) wait hours for transaction recovery, or 2) kill mysqld, set the InnoDB recovery flag to skip recovery, and restart. Also, if you want to re-inject the data from scratch, dropping the table and recreating it is almost instantaneous, but it will take hours to actually "DELETE FROM table".
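A rough sketch of the batching idea, assuming the rows are inserted by a script against the simple hypothetical members table used above:

    -- Commit in moderate batches rather than one enormous
    -- transaction, so an aborted load doesn't mean hours of
    -- rollback or recovery.
    SET autocommit = 0;

    INSERT INTO members (name) VALUES ('a'), ('b'), ('c');   -- first batch
    COMMIT;
    INSERT INTO members (name) VALUES ('d'), ('e'), ('f');   -- next batch
    COMMIT;

    -- Starting over: dropping and recreating is near instantaneous,
    -- whereas DELETE FROM members removes rows one at a time.
    DROP TABLE members;
    CREATE TABLE members (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;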

OverClocked
"don't have ... intergrity and [atomicity]"? They are two of the four ACID properties of a DBMS. Without them, you may as well just store your stuff in a flat text file :-)
paxdiablo
Pax, a MyISAM table is not ACID, but it's definitely still useful for just analyzing results.
OverClocked