ansaurus

Question

How do I avoid redundant data fields in the result set when using JOINs ?

Answer 1

+1 A:

In the straight sql query there is not if you keep them as a single query. If you are programmatically printing this out, then you would order by user id and only reprint that information if the user id changes.

REW 2010-07-05 04:48:57

Answer 2

+1 A:

In the SQL standard, you would use NATURAL JOIN; this joins on common column names and only preserves one copy of those common names.

In practice, you carefully list the columns you want, rather than resorting to the '*' shorthand notation.

Jonathan Leffler 2010-07-05 04:58:41

Answer 3

+1 A:

There are several things you should know.

The first is that the default SQL JOIN construct is essentially a set cross product, restricted by the WHERE clause. This means it's multiplicative - you get duplicate results out which you then prune down. You also have to be careful in the presence of NULL fields.

The second is that there is a 'DISTINCT' keyword. When you prefix a column in the selection with this, you'll get at most one instance of a certain value for that column in the results. So, as per your query, 'SELECT DISTINCT user.id FROM' will eliminate the redundancies on the server side.

The third is that the correct way to solve this problem is likely not using the '*' operator. I suggest:

SELECT user.id,username,email,subject FROM message m,user WHERE m.user_id=user.id AND user.status=1

This uses the simple, easy-to-understand implicit-join syntax and should be valid SQL on whatever server. I can vouch for it working with MySQL, at least. It also aliases the 'message' table to 'm' as shorthand.

As you surmise, this will reduce the traffic from the SQL server to your database.

edit: if you want to eliminate the "redundant" email information, you can't - you must make two distinct queries. SQL results are tables and must be rectangular, with all known values filled. There's no 'ditto' entry.

edit 2: You only have to make two queries. For instance:

SELECT subject FROM message WHERE message.id IN (SELECT user.id FROM user WHERE status=1)

This is one query that contains a nested query, so it's really making two database hits. But it doesn't have any programmatic loops.

Borealid 2010-07-05 05:34:09

Answer 4

A:

Assuming you can use stored procedure, you could write one to run the above query and then use a cursor to store nulls for 'redundant information' to get something like

user.id  username email            message.id  subject 
1        jane     [email protected]   120         Notification  
null     null     null             122         Re:Hello  
null     null     null             125         Quotation 
2        john     [email protected]   127         Hi jane 
null     null     null             128         Fix thiss  
null     null     null             129         Ok 
3        jim      [email protected]      140         Re:Re:Quotation

and then return this resultset in a temporary table. but while this may reduce network traffic, it will add a processing overhead

Another way is to run 2 queries, one to get the user information, and the other to get the message information with just the linked user id and then do the "join" using application server side code. something like

SELECT DISTINCT user.* FROM user, message WHERE message.user_id=user.id AND user.status=1

and

SELECT user.id, message.* FROM user, message WHERE message.user_id=user.id AND user.status=1

which will result in 2 trips to the database, instead of 1, which might eventually be slower, even if the network traffic is reduced.

And another way is to bunch these 2 into a single resultset with something like

SELECT user.* FROM user, message WHERE message.user_id=user.id AND user.status=1
UNION ALL
SELECT user.id, message.* FROM user, message WHERE message.user_id=user.id AND user.status=1

to get something like

   user.id  username/message.id    email/subject 
    1        jane                   [email protected]   
    2        john                   [email protected]   
    3        jim                    [email protected]      
    1        120                    Notification           
    1        122                    Re:Hello           
    1        125                    Quotation          
    2        127                    Hi jane          
    2        128                    Fix thiss           
    2        129                    Ok          
    3        140                    Re:Re:Quotation

and then use application server logic to separate it out. reduced network traffic but more application server load / marginally more database server load.

But the saved network traffic is rarely worth the added complexity.

potatopeelings 2010-07-05 05:59:47

ansaurus

tags:

views:

answers:

How do I avoid redundant data fields in the result set when using JOINs ?

related questions