views:

69

answers:

1

I'm designing a set of web apps to track scientific laboratory data. Each laboratory has several members, each of whom will access both their own data and that of their laboratory as a whole. Many typical queries will thus be expected to return records of multiple members (e.g. my mouse, joe's mouse and sally's mouse).

I think I have the database fairly well normalized. I'm now wondering how to ensure that users can efficiently access both their own data and their lab's data set when it is mixed among (hopefully) a whole ton of records from other labs.

What I've come up with so far is that most tables will end with two fields: user_id and labgroup_id. The WHERE clause of any SELECT statement will include the appropriate reference to one of the id fields ("...WHERE 'labroup_id=n..." or "...WHERE user_id=n...").

My questions are:

  1. Is this an approach that will scale to 10^6 or more records?

  2. If so, what's the best way to use these fields in a query so that it most efficiently searches the relevant subset of the database? e.g. Should the first step in querying be to create a temporary table containing just the labgroup's data? Or will indexing using some combination of the id, user_id, and labroup_id fields be sufficient at that scale?

I thank any responders very much in advance.

+2  A: 

You should be more than fine using this approach with 10^6 rows. We currently use something very similar with mixed customer data differentiated by an account ID with 10^8 rows and have no performance issues at all on modest hardware.

Ensure that you have indices defined that cover user_id and labgroup_id.

Bear in mind that MySQL can only use one key per query. Look at your typical query pattern. If people will use several columns in where clauses, build compound keys that include heavily used columns that also provide good differentiation (meaning help narrow down the rows... a yes/no column is a poor key but a column with many distinct values that is frequently used in the where clause may be a good candidate).

Enable the MySQL slow query log (or get the commercial Query Analyzer or it's 30 day trial) and see which queries are taking a long time. Use the EXPLAIN command to figure out what index is being used and how. If a particular query shows up in the slow query log frequently and/or with very long execution times, consider modifying your indices or adding a new one.

Ensure that you have your my.cnf properly tuned for your environment. The out-of-the-box configuration is almost always very poor. Here's a good guide to that.

Eric J.
Awesome--many thanks indeed!
Rob Campbell
@Rob: One more thing... MySQL can use parts of a compound index from left to right, so e.g. if people will usually query with labgroup_id but sometimes also query with another column such as (making one up here) experiment_id, you should create a compound index labgroup_id+experiment_id. When they just have labgroup_id in the WHERE clause, MySQL can use the compound index. On the other hand, MySQL could NOT use that index if the where clause only used experiment_id without the labgroup_id (it can read compound keys left to right).
Eric J.