views: 51
answers: 3

I have a user table, and a number of dependent tables with a one-to-many relationship, e.g. an email table, an address table and a groups table (i.e. one user can have multiple email addresses and physical addresses, and can be a member of many groups).

Is it better to:

  1. Join all these tables, and process the heap of data in code,

  2. Use something like GROUP_CONCAT to return one row per user, then split the fields apart in code (roughly as sketched below this list),

  3. Or query each table independently?
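
Roughly what I mean by option 2, just for illustration (made-up table and column names, PHP only as an example):

    <?php
    // Option 2 sketch: one row per user, children collapsed into delimited
    // strings by GROUP_CONCAT, then split apart in code. Names are made up.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $sql = "SELECT u.id, u.name,
                   GROUP_CONCAT(DISTINCT e.email)   AS emails,
                   GROUP_CONCAT(DISTINCT a.address) AS addresses
            FROM users u
            LEFT JOIN emails    e ON e.user_id = u.id
            LEFT JOIN addresses a ON a.user_id = u.id
            GROUP BY u.id, u.name";

    foreach ($pdo->query($sql, PDO::FETCH_ASSOC) as $row) {
        $emails    = $row['emails']    !== null ? explode(',', $row['emails'])    : [];
        $addresses = $row['addresses'] !== null ? explode(',', $row['addresses']) : [];
        // ...build the user object from $emails and $addresses...
    }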

Thanks.

A: 

Personally, assuming my table indexes were up to scratch, I'd go with a table join, get all the data out in one go, and then process that into a nested data structure. That way you're playing to each system's strengths.
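
Roughly something like this (PHP/PDO just as an example; table and column names are assumptions):

    <?php
    // Rough sketch only: one big join, then fold the flat rows back into a
    // nested structure keyed by user id.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $sql = "SELECT u.id, u.name,
                   e.id AS email_id,   e.email,
                   a.id AS address_id, a.address,
                   g.id AS group_id,   g.name AS group_name
            FROM users u
            LEFT JOIN emails    e ON e.user_id = u.id
            LEFT JOIN addresses a ON a.user_id = u.id
            LEFT JOIN `groups`  g ON g.user_id = u.id";

    $users = [];
    foreach ($pdo->query($sql, PDO::FETCH_ASSOC) as $row) {
        $uid = $row['id'];
        $users[$uid]['name'] = $row['name'];
        // Key the child arrays on the child ids so the duplicate rows produced
        // by joining several one-to-many tables at once collapse back down.
        if ($row['email_id'])   { $users[$uid]['emails'][$row['email_id']]      = $row['email']; }
        if ($row['address_id']) { $users[$uid]['addresses'][$row['address_id']] = $row['address']; }
        if ($row['group_id'])   { $users[$uid]['groups'][$row['group_id']]      = $row['group_name']; }
    }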

Scimon
Remember, though, that this approach multiplies the number of rows returned (it effectively denormalises everything). On a large data set your application ends up using a lot of memory, which is usually undesirable. If the dataset is small enough to cope with that, then you might as well use option 3, which is faster to code for.
Cfreak
+3  A: 

It really depends on how much data you have in the related tables and on how many users you're querying at a time.

Option 1 tends to be messy to deal with in code.

Option 2 also tends to be messy to deal with, and grouping tends to be slow, especially on large datasets.

Option 3 is the easiest to deal with but generates more queries overall. If your data set is small and you're not planning to scale much beyond your current needs, it's probably the best option. It's definitely the best option if you're only trying to display one record.

There is a fourth option, however, a middle-of-the-road approach that I use in my job, where we deal with a very similar situation. Instead of getting the related records for each row one at a time, use IN() to get all of the related records for your result set, then loop in your code to match them to the appropriate record for display. If you cache search queries, you can cache that second query as well. It's only two queries and only one loop in the code (no parsing; use hashes keyed on the parent id to relate things).
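
In PHP with PDO it might look roughly like this (table and column names are made up, and only the emails table is shown; repeat the second query for each child table):

    <?php
    // Sketch of the IN() approach: one query for the page of users, one query
    // per child table, then hash the children by user_id.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $users = $pdo->query("SELECT id, name FROM users LIMIT 50")->fetchAll(PDO::FETCH_ASSOC);
    $ids   = array_column($users, 'id');   // assumes at least one user came back

    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT user_id, email FROM emails WHERE user_id IN ($placeholders)");
    $stmt->execute($ids);

    // Hash the child rows by user_id so attaching them is a key lookup, not a scan.
    $emailsByUser = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $emailsByUser[$row['user_id']][] = $row['email'];
    }

    foreach ($users as &$user) {
        $user['emails'] = $emailsByUser[$user['id']] ?? [];
    }
    unset($user);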

Cfreak
After partially implementing option 1, I've changed my mind and gone for option 3. With option one I was pulling out maybe 20 rows per person, but I could see that as the system expands and requirements change, this could easily spiral up into the hundreds or even thousands, like you pointed out below. It just doesn't scale. I like your IN idea, thanks.
aidan
A: 

Generally speaking, write the most efficient query for the situation you're in. Don't create a mega-query that you use in all cases; create case-specific queries that return just the information you need.

In terms of processing the results, if you use GROUP_CONCAT you have to split all the resulting values during processing, which can be problematic if the values themselves contain the delimiter character. My preferred method is to store the GROUPed BY field in a $holder variable during the output loop: compare that field to the $holder on each pass and change your output accordingly.
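
As a rough sketch (PHP/PDO, invented table and column names):

    <?php
    // $holder (control-break) pattern: order the join by the grouped field and
    // only emit a new heading when that field changes.
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $rows = $pdo->query(
        "SELECT u.id, u.name, e.email
         FROM users u
         LEFT JOIN emails e ON e.user_id = u.id
         ORDER BY u.id"
    );

    $holder = null;
    foreach ($rows as $row) {
        if ($row['id'] !== $holder) {
            echo "User: {$row['name']}\n";   // the grouped field changed: new user
            $holder = $row['id'];
        }
        if ($row['email'] !== null) {
            echo "  email: {$row['email']}\n";
        }
    }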

dnagirl