views:

55

answers:

2

Hi, building a site using PHP and MySQL that needs to store a lot of properties about users (for example their DOB, height, weight etc) which is fairly simple (single table, lots of properties (almost all are required)).

However, the system also needs to store other information, such as their spoken languages, instrumental abilities, etc. All in all their are over a dozen such characteristics. By default I assumed creating a separate table (called maybe languages) and then a link table with a composite id (user_id, language_id).

The problem I foresee though is when visitors attempt to search for users using these criteria. The dataset we're looking to use will have over 15,000 users at time of launch and the primary function will be searching and refining users. That means hundreds of queries daily and the prospect of using queries with up a dozen or more JOINs in them is not appealing.

So my question is, is there an alternative that's going to be more efficient? One way I was thinking is storing the M2M values as a CSV of IDs in the user table and then running a LIKE query against it. I know LIKE isn't the best, but is it better than a join?

Any possible solutions will be much appreciated.

+1  A: 

Do it with joins. Then, if your performance goals are not met, try something else.

erikkallen
That's the thing, running it on a sample install on a faster machine with just 1,500 records it's already taking 400+ms, even at a linear rate that's just going to be too slow.
Dachande663
I think there's something very wrong with your setup there: 1500 rows is tiny; even the worst case of doing a full table search without indexes shouldn't be taking 400ms. What table types are you using, do you have indexes, what's your exact query?
bobince
@bobince: I think he says 15,000, but I agree fully, even with 15k it's still tiny and the problem has to be elsewhere.
erikkallen
A: 

Start with a normalized database (e.g. a languages table, linked to the users table by a mapping table) to make sure you data is represented cleanly and logically.

If you have performance problems, examine your queries and make sure you have suitable indexes.

If you dislike repeatedly coding up queries with many joins, define some views.

If views are very slow to query, consider materialized views.

If you have several thousand records and a few hundred queries per day (really, that's pretty small and low-usage), these techniques will allow your site to run at full speed, with no compromise on data integrity. If you need to scale to many millions of records and millions of queries per day, even these techniques may not be enough; in which case, investigate cacheing and denormalization.

vincebowdren