tags:

views:

78

answers:

7

Hi, Suppose I have two tables:

Group
(
    id integer primary key,
    someData1 text,
    someData2 text
)

GroupMember
(
    id integer primary key,
    group_id   foreign key to Group.id,
    someData   text
)

I'm aware that my SQL syntax is not correct :) Hopefully is clear enough. My problem is this: I want to load a group record and all the GroupMember records associated with that group. As I see it, there are two options.

A single query:

SELECT Group.id, Group.someData1, Group.someData2 GroupMember.id, GroupMember.someData
FROM Group INNER JOIN GroupMember ...
WHERE Group.id = 4;

Two queries:

SELECT id, someData2, someData2
FROM Group
WHERE id = 4;

SELECT id, someData
FROM GroupMember
WHERE group_id = 4;

The first solution has the advantage of only being one database round trip, but has the disadvantage of returning redundant data (All group data is duplicated for every group member)

The second solution returns no duplicate data but involves two round trips to the database.

What is preferable here? I suppose there's some threshold such that if the group sizes become sufficiently large, the cost of returning all the redundant data is going to be greater than the overhead involved with an additional database call. What other things should I be thinking about here?

Thanks, Jordan

+3  A: 

If you actually want the results joined, I believe it is always more efficient to do the joining at the server level. The SQL processor is designed to match sets of data. If you really want the results of 2 sql statements, you can always send two statements in one batch separated by a semicolon, and get two resultsets back with one round trip to the DB.

Jeremy
+1  A: 

If you are only ever going to be retreiving a single group record with each request to the database then i would go with the second option. If you are retrieving multiple group records and associated group member records, go with the join as it will be much quicker.

Ben Robinson
+2  A: 

How the data is finally used is an important and unknown factor.

I suggest the single query method for most applications. Proper indexing will keep the query more efficient than the two query method.

The single query method also has the benefit of remaining valid if you need to select more than one group.

K Richard
A: 

On a simple query like this I would to try to perform it in one query. The overhead of two database calls will probably exceed the additional SQL processing time from the query.

A UNION clause will do this for you:

SELECT id, someData1, someData2 
FROM Group 
WHERE id = 4
UNION 
SELECT id, someData, null 
FROM GroupMember 
WHERE group_id = 4; 
Lost in Alabama
How would you differentiate the group record from the rest? And why would you think the columns would be compatible?
Jeff O
Question didn't ask to differentiate the group records and based on the information given, the someData fields are text without length specified.
Lost in Alabama
A: 

In general, it depends on what type of data you are trying to display.

If you are showing a single group and all its members, performance differences between the two options would be negligible.

If you are showing many groups and all of their members, the overhead of having to make a roundtrip to the database for each successive group will quickly outweigh any benefit you got from receiving a little less data.

Some other things you might want to consider in you reasoning

  • Result Set Size - For many groups and members, your result set size may become a limiting factor as the size to retrieve and keep it in memory increases. This is likely to occur with the second option. You may want to consider paging the data, so that you are only retrieving a certain subset at a time.

  • Lazy Loading - If you are only getting the members of some groups, or a user is requesting the members one group at a time, consider Lazy Loading. This means only making the additional query to get the group's members when needed. This makes sense only in certain use cases, but it can be much more effective than retrieving all data up front.

tschaible
A: 

Depending on the type of database and your frontend application, you can return the results of two SQL statements on one trip (A stored procedure in SQL Server 2005 for example).

If you are creating a report that requires many fields from the Group table, you may not want the increased amount of data with the first query.

If this is some type of data entry app, you've probably already presented the Group data to the user, so they could fill in the group id on the where clause (or preferably via some parameter) and now they need the member results.

Jeff O
A: 

It really, really, really depends on what use you will make of the data.

For insatnce, if you were assembling a list of group members for a mail shot, and you need the group name for each letter you're going to send to a member, and you have no use for the Group level then the single joined query makes a lot of sense.

But if, say, you're coding a master-detail screen or report, with a page for each group and displaying information at both the Group and the Member levels then the two separate queries is probably most useful.

Unless you are retrieving quite large amounts of data (tens of thousands of groups with hundreds of memebers per group, or similar orders of magnitude) it is unlikely you are going to see much difference between performances of the two approaches.

APC