I have the following tables:

Users {id, name}
Messages {id, user_id, cache_user_name}

What I want is to do a JOIN only when cache_user_name is NULL for performance reasons.

For example:

SELECT Messages.*, Users.name FROM Messages INNER JOIN Users ON (Messages.user_id = Users.id) 
-- ON (ISNULL(Messages.cache_user_name) AND ...

Is the best way to do it with two queries: one for the rows without a cached name (with the join) and another for the cached rows (without a join)?

[EDIT]

The result I need is:

Users

ID: 1, NAME: Wiliam

Messages

ID: 1, USER_ID: 1, CACHE_USER_NAME: Wiliam
ID: 2, USER_ID: 1, CACHE_USER_NAME: null

Result

ID: 1,  USER_ID: 1,  CACHE_USER_NAME: Wiliam,  USERS.NAME: null   // No join, faster
ID: 2,  USER_ID: 1,  CACHE_USER_NAME: null,    USERS.NAME: Wiliam // Join
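The result above can be produced in a single query by moving the IS NULL test into the ON clause of a LEFT JOIN, so a Users row can only match messages that have no cached name. A minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for the real server (the question does not name the DBMS); whether the engine actually skips the Users lookup for cached rows is up to its optimizer, so this shows the semantics rather than a guaranteed speed-up.

```python
import sqlite3

# In-memory SQLite database with the tables and sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                           cache_user_name TEXT);
    INSERT INTO Users VALUES (1, 'Wiliam');
    INSERT INTO Messages VALUES (1, 1, 'Wiliam');
    INSERT INTO Messages VALUES (2, 1, NULL);
""")

# The extra ON condition lets Users match only uncached messages;
# cached messages are still returned (LEFT JOIN) with Users.name = NULL.
rows = conn.execute("""
    SELECT Messages.id, Messages.user_id, Messages.cache_user_name, Users.name
    FROM Messages
    LEFT JOIN Users
           ON Messages.user_id = Users.id
          AND Messages.cache_user_name IS NULL
    ORDER BY Messages.id
""").fetchall()

for row in rows:
    print(row)
```

If a single name column is preferred, selecting COALESCE(Messages.cache_user_name, Users.name) instead of the last two columns merges them.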

A:

You can add a WHERE ... IS NULL clause.

The optimizer will (try to) use the best performing plan.

SELECT   Messages.*
         , Users.name 
FROM     Messages 
         INNER JOIN Users ON (Messages.user_id = Users.id)
WHERE    Messages.cache_user_name IS NULL

Edit

Given following data, what would you expect as output?

DECLARE @Users TABLE (ID INTEGER, Name VARCHAR(32))
DECLARE @Messages TABLE (ID INTEGER, User_ID INTEGER, Cache_User_Name VARCHAR(32))

INSERT INTO @Users VALUES (1, 'Wiliam')
INSERT INTO @Users VALUES (2, 'Lieven')
INSERT INTO @Users VALUES (3, 'Alexander')

INSERT INTO @Messages VALUES (1, 1, NULL)
INSERT INTO @Messages VALUES (2, 1, 'Cached Wiliam')
INSERT INTO @Messages VALUES (3, 2, NULL)
INSERT INTO @Messages VALUES (4, 3, 'Cached Alexander')

SELECT  *
FROM    @Users u
        INNER JOIN @Messages m ON m.User_ID = u.ID
WHERE   m.Cache_User_name IS NULL        
Lieven
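Ported to SQLite through Python's sqlite3 module (ordinary tables instead of T-SQL table variables, an assumption made only to get something runnable), the query above returns only messages 1 and 3; the cached messages 2 and 4 are removed by the WHERE clause.

```python
import sqlite3

# Same sample data as the T-SQL script, as regular SQLite tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER, Name TEXT);
    CREATE TABLE Messages (ID INTEGER, User_ID INTEGER, Cache_User_Name TEXT);
    INSERT INTO Users VALUES (1, 'Wiliam'), (2, 'Lieven'), (3, 'Alexander');
    INSERT INTO Messages VALUES
        (1, 1, NULL), (2, 1, 'Cached Wiliam'),
        (3, 2, NULL), (4, 3, 'Cached Alexander');
""")

# INNER JOIN plus WHERE ... IS NULL: cached messages are filtered out.
rows = conn.execute("""
    SELECT m.ID, u.Name
    FROM Users u
    INNER JOIN Messages m ON m.User_ID = u.ID
    WHERE m.Cache_User_Name IS NULL
    ORDER BY m.ID
""").fetchall()
print(rows)
```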
That only returns the rows where CACHE_USER_NAME is NULL.
Wiliam
You forgot FROM Messages, Users
Alexander.Plutov
Nope, I didn't forget that; that query only returns the rows without a cached name.
Wiliam
@Wiliam, what then should be returned?
Lieven
What I need is to perform the JOIN for a row only when the message doesn't have the user name cached, for performance reasons.
Wiliam
I edited my post so you can see what I'm expecting. The thing is that the JOIN can slow the query down, so I only want to join the user name from the original table (Users) when the message doesn't already have it cached in a field of Messages.
Wiliam
@Wiliam, <slap forehead> I got it, but that brings me to a new question: if both tables are properly indexed, joining them shouldn't pose a performance problem for anything but Google-sized databases. How many records are we talking about?
Lieven
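Whether the join can use indexes is something the engine's plan output shows. A sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 (other engines have their own EXPLAIN; the index name idx_messages_user_id is hypothetical): with Users.id as the primary key and an index on Messages.user_id, one side of the join should be an indexed SEARCH rather than a full scan. The exact plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                           cache_user_name TEXT);
    -- Hypothetical index on the column used in the join condition.
    CREATE INDEX idx_messages_user_id ON Messages (user_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT Messages.*, Users.name
    FROM Messages
    INNER JOIN Users ON Messages.user_id = Users.id
""").fetchall()

# The human-readable step description is the last column of each plan row.
details = [row[-1] for row in plan]
for d in details:
    print(d)
```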
A: 
SELECT m.id, m.user_id, m.cache_user_name AS user_name
FROM messages m
WHERE m.cache_user_name IS NOT NULL
UNION ALL
SELECT m.id, m.user_id, u.name AS user_name
FROM (SELECT * FROM messages WHERE cache_user_name IS NULL) m
JOIN users u ON (u.id = m.user_id)
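The two-branch UNION ALL can be checked against the sample data from the question. A sketch in SQLite via Python's sqlite3 (an assumption; the filter in the second branch is written as a plain WHERE instead of a derived table, which returns the same rows): each message appears exactly once, and only the uncached one goes through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                           cache_user_name TEXT);
    INSERT INTO users VALUES (1, 'Wiliam');
    INSERT INTO messages VALUES (1, 1, 'Wiliam'), (2, 1, NULL);
""")

# Branch 1: cached rows straight from messages (no join).
# Branch 2: only uncached rows are joined to users.
rows = conn.execute("""
    SELECT m.id, m.user_id, m.cache_user_name AS user_name
    FROM messages m
    WHERE m.cache_user_name IS NOT NULL
    UNION ALL
    SELECT m.id, m.user_id, u.name AS user_name
    FROM messages m
    JOIN users u ON u.id = m.user_id
    WHERE m.cache_user_name IS NULL
    ORDER BY 1
""").fetchall()
print(rows)
```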

Anyway, the best approach is to store cache_user_name in the messages table when the message is created. Then you won't need a join at all.
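Filling the cache at creation time can be done in the INSERT itself by selecting the author's name from Users. A sketch in SQLite via Python's sqlite3; create_message is a hypothetical helper, not part of any library. The cached copy goes stale if a user is later renamed, so some refresh or expiry policy is still needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                           cache_user_name TEXT);
    INSERT INTO Users VALUES (1, 'Wiliam');
""")

def create_message(conn, user_id):
    """Hypothetical helper: insert a message and copy the author's current
    name into cache_user_name in the same statement. If the user does not
    exist, no row is inserted."""
    cur = conn.execute(
        """INSERT INTO Messages (user_id, cache_user_name)
           SELECT id, name FROM Users WHERE id = ?""",
        (user_id,),
    )
    return cur.lastrowid

msg_id = create_message(conn, 1)

# Reading messages no longer needs a join.
rows = conn.execute(
    "SELECT id, user_id, cache_user_name FROM Messages").fetchall()
print(msg_id, rows)
```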

Michael Pakhantsov
That only returns rows where cache_user_name is NULL.
Wiliam
@Wiliam, I have updated the query.
Michael Pakhantsov
That's making two queries, right? That's the last option I have, thanks. Ah, and running EXPLAIN shows that the query is very slow :(
Wiliam
@Wiliam, of course it will be slow (two queries with conditions). To avoid the join, you need to always populate cache_user_name when you add/create a message. All other approaches will be slower.
Michael Pakhantsov
@Michael Pakhantsov: I'm going to write an UPDATE query that populates it only when it is NULL. I'm also thinking about expiring the cached values. Thanks
Wiliam
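An UPDATE that fills in only the missing names can use a correlated subquery, after which every message can be read without a join. A sketch in SQLite via Python's sqlite3 (an assumption about the engine); MySQL also accepts a multi-table UPDATE Messages JOIN Users ... SET form for the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                           cache_user_name TEXT);
    INSERT INTO Users VALUES (1, 'Wiliam');
    INSERT INTO Messages VALUES (1, 1, 'Wiliam'), (2, 1, NULL);
""")

# Touch only rows whose cache is empty; the subquery looks up each author.
updated = conn.execute("""
    UPDATE Messages
    SET cache_user_name = (SELECT name FROM Users
                           WHERE Users.id = Messages.user_id)
    WHERE cache_user_name IS NULL
""").rowcount

# Afterwards a plain SELECT returns every name without joining Users.
rows = conn.execute(
    "SELECT id, user_id, cache_user_name FROM Messages ORDER BY id").fetchall()
print(updated, rows)
```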
A: 

I think the joins in the previous answers with a NOT NULL where clause should work fine, but maybe we're not following your inefficiency problem. As long as users.id and messages.user_id are indexed and of the same type, that join shouldn't be slow unless you have a huge user database and lots of messages. If you do, throw more hardware at it; you are likely running a lot of traffic and can afford it. :)

Alternatively, you could handle it like this: query Messages for the rows where cache_user_name is null, loop through the results and collect each message's user ID in an array, then query the Users table for just those IDs. Then, as you loop over the Messages results, you can display the proper name from the array you saved. You'll have two queries, but they'll be fast.

// Note: the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0;
// on current PHP versions use mysqli or PDO instead.
$users = $messages = $user_ids = array();

// Query 1: messages that have no cached user name.
$r = mysql_query('select * from Messages where cache_user_name is null');
while ( $rs = mysql_fetch_array($r, MYSQL_ASSOC) )
{
    $user_ids[] = (int) $rs['user_id'];   // cast to int before building the IN list
    $messages[] = $rs;
}

// Query 2: only the users referenced by those messages.
// Skip it when there are none, because "IN ()" is a syntax error.
if ( $user_ids )
{
    $ids = implode( ',', array_unique($user_ids) );
    $u = mysql_query("select id, name from Users where id in ($ids)");
    while ( $rs = mysql_fetch_array($u, MYSQL_ASSOC) )
    {
        $users[$rs['id']] = $rs['name'];
    }
}

foreach ( $messages as $message )
{
    echo "message {$message['id']} authored by " . $users[$message['user_id']] . "<br />\n";
}
Hans
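The same two-query idea, sketched in Python with sqlite3 instead of the removed mysql_* extension (an assumption made only so the example runs). This variant reads all messages once, looks up names only for the uncached ones, and merges in application code, which matches the result the question asks for; the IN list uses bound "?" placeholders rather than string interpolation of values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                           cache_user_name TEXT);
    INSERT INTO Users VALUES (1, 'Wiliam'), (2, 'Lieven');
    INSERT INTO Messages VALUES (1, 1, 'Wiliam'), (2, 1, NULL), (3, 2, NULL);
""")

# Query 1: all messages, no join.
messages = conn.execute(
    "SELECT id, user_id, cache_user_name FROM Messages ORDER BY id").fetchall()

# User IDs that still need a lookup (messages without a cached name).
missing = sorted({uid for _, uid, cached in messages if cached is None})

# Query 2: fetch just those users, one "?" placeholder per ID.
names = {}
if missing:
    placeholders = ",".join("?" * len(missing))
    names = dict(conn.execute(
        f"SELECT id, name FROM Users WHERE id IN ({placeholders})", missing))

# Merge: cached name when present, otherwise the looked-up one.
result = [(mid, uid, cached if cached is not None else names.get(uid))
          for mid, uid, cached in messages]
for row in result:
    print(row)
```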
What I was thinking of is running an UPDATE query first to set the missing cached names, and then a SELECT over all rows.
Wiliam
How does messages.cache_user_name differ from users.name?
Hans