Imagine I have the following data in a table called "messages":

message_id | language_id | message
------------------------------------
1            en            Hello
1            de            Hallo
1            es            Hola
2            en            Goodbye
2            es            Adios

(Note that I don't have a German translation for "Goodbye.")

I want to select the messages for a user who speaks English and German, but prefers German.

Meaning, I want a result set that looks like:

message_id | language_id | message
------------------------------------
1            de            Hallo
2            en            Goodbye

But, um, it's proving tricky. Any ideas?

A: 

Here's one possible solution:

First I just set up your tables:

DROP TEMPORARY TABLE IF EXISTS messages;
CREATE TEMPORARY TABLE messages (
  message_id INT,
  language_id INT,
  message VARCHAR(64)
);

INSERT INTO messages VALUES
(1, 1, "Hello"),
(1, 2, "Hellode"),
(1, 3, "Hola"),
(2, 1, "Goodbye"),
(2, 3, "Adios");

Then I add a new table for language preferences:

DROP TEMPORARY TABLE IF EXISTS user_language_preference;
CREATE TEMPORARY TABLE user_language_preference (
  user_id INT,
  language_id INT,
  preference INT
);

INSERT INTO user_language_preference VALUES
(1, 1, 10), # knows English
(1, 2, 100); # but prefers 'de'

And the queries:

Hello:

SET @user_id = 1;
SET @message_id = 1;

# Returns 'Hellode', 'Hello'
SELECT
  m.language_id,
  message
FROM messages AS m, user_language_preference AS l
WHERE message_id=@message_id
  AND m.language_id=l.language_id
  AND user_id=@user_id
ORDER BY preference DESC;

Goodbye:

SET @message_id = 2;

# Returns only 'Goodbye', as there is no 'de' message for this ID
SELECT
  m.language_id,
  message
FROM messages AS m, user_language_preference AS l
WHERE message_id=@message_id
  AND m.language_id=l.language_id
  AND user_id=@user_id
ORDER BY preference DESC;

Edit: In response to the comment:

SELECT
  m.message_id,
  m.language_id,
  message
FROM messages AS m, user_language_preference AS l
WHERE m.language_id=l.language_id
  AND user_id=@user_id
ORDER BY m.message_id, preference DESC;
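A quick way to check what this returns is to replay it in a scratch database. The sketch below assumes Python's built-in sqlite3 module (SQLite rather than MySQL) and uses the question's textual language codes instead of the integer IDs above; the preferences mirror user 1.

```python
import sqlite3

# Sample data from the question, with text language codes (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
    CREATE TABLE user_language_preference (
        user_id INTEGER, language_id TEXT, preference INTEGER);
    INSERT INTO user_language_preference VALUES
        (1, 'en', 10),   -- knows English
        (1, 'de', 100);  -- but prefers German
""")

# Same shape as the edited query: join every message to the user's languages
# and sort each message's rows by preference, most preferred first.
rows = conn.execute("""
    SELECT m.message_id, m.language_id, m.message
    FROM messages AS m
    JOIN user_language_preference AS l ON m.language_id = l.language_id
    WHERE l.user_id = ?
    ORDER BY m.message_id, l.preference DESC
""", (1,)).fetchall()

print(rows)  # [(1, 'de', 'Hallo'), (1, 'en', 'Hello'), (2, 'en', 'Goodbye')]
```

Message 1 comes back once per language the user knows, which is the duplication raised in the comments.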
simendsjo
Thanks. Any way to get a result set like the one I mentioned above? I might actually be selecting 100 messages at a time, so I'd rather get them all in one select statement instead of 100.
Sean
See my edit. You can also use message_id IN (1,2,3,4,....100) if you only need specific messages.
simendsjo
Thanks, but it looks like messages will appear in those results more than once? (Message #1 still shows up in the results twice for user #1.) I'm hoping to get each message only once, in the language that the user would most prefer.
Sean
A simple workaround here is to wrap the whole thing in a `SELECT * FROM (...) t GROUP BY message_id`, though I guess that could be quite slow if you are selecting a lot of messages at once…
streetpc
+1  A: 
select message_id, language_id, message
from
(select if(language_id="de",0,1) as choice, m.*
 from messages m where m.language_id in ("de","en")
 order by choice) z
group by message_id

Set up your preferences via the "if" in the select list to force the preferred language to the top of the result set, so that GROUP BY will pick it.

You could also do this, but the response above is probably neater for what you want to use it for.

select *
from messages m where m.language_id = "de" or
 (language_id = "en" and not exists (select 1 from messages n
                                  where n.language_id = "de" and
                                    n.message_id = m.message_id))

Further to your comments: if you are uncomfortable relying on MySQL's particular GROUP BY behaviour with non-aggregated columns, you could use this more standard code:

select *
 from messages m where m.language_id in ("de","en")
  and if(m.language_id="de",0,1) <= (select min(if(n.language_id="de",0,1))
 from messages n where n.message_id = m.message_id)
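The correlated-subquery idea also answers the "third or fourth language" concern if the rank expression is generated from a list. A sketch, assuming SQLite through Python's sqlite3 (a CASE expression stands in for MySQL's IF()); rank_case and best_translations are hypothetical helper names, and the list order is most preferred first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
""")

def rank_case(alias, n):
    # CASE <alias>.language_id WHEN ? THEN 0 WHEN ? THEN 1 ... END
    # Unlisted languages fall through to NULL, which MIN() ignores.
    whens = " ".join(f"WHEN ? THEN {i}" for i in range(n))
    return f"CASE {alias}.language_id {whens} END"

def best_translations(conn, langs):
    """One (message_id, language_id, message) row per message, in the first
    language of `langs` that exists for it; messages with none are dropped."""
    sql = f"""
        SELECT m.message_id, m.language_id, m.message
        FROM messages m
        WHERE {rank_case('m', len(langs))} = (
            SELECT MIN({rank_case('n', len(langs))})
            FROM messages n
            WHERE n.message_id = m.message_id)
        ORDER BY m.message_id
    """
    # Each CASE has its own placeholders, so the language list is bound twice.
    return conn.execute(sql, [*langs, *langs]).fetchall()

print(best_translations(conn, ["de", "en"]))  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```

Adding a fallback language is then just a longer list, e.g. `["de", "en", "es"]`.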
xagyg
In the first one, you're using a "group by" clause but no aggregate function on the "message" column. Doesn't that mean you can't rely on the value MySQL will give you in the results? In the second one, it looks kind of unwieldy if you want to add a third or fourth preferred language.
Sean
"Doesn't that mean you can't rely on the value MySQL will give you in the results?" Theoretically maybe, but in practice the ORDER BY sorts the preferred language to the top of each group.
xagyg
I have edited my original answer with a third code block to provide a solution that does not rely on MySQL's particular handling of non-aggregate columns in a GROUP BY.
xagyg
A: 

Use the MAX-CONCAT trick to get this in a single query:

select message_id,
       substring(max(concat(if(language_id='de', 9, if(language_id='en',8,0)), message)),2) as message,
       substring(max(concat(if(language_id='de', 9, if(language_id='en',8,0)), language_id)),2) as language
from messages 
group by message_id;

Just add conditions and appropriate priorities to the IF clauses to add more fallback languages.
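A sketch of the same trick outside MySQL, assuming SQLite via Python's sqlite3: `||` replaces CONCAT(), substr() replaces SUBSTRING(), a CASE replaces the nested IFs, and the priority digits are generated from a list (most preferred first). best_by_max_concat is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
""")

def best_by_max_concat(conn, langs):
    """Prefix each value with a one-digit priority, take MAX() per message,
    then strip the digit off again with substr(..., 2)."""
    # 9 for the most preferred language, 8 for the next, ...; unlisted ones get 0.
    # The prefix must stay a single character, so at most 9 languages.
    if not 1 <= len(langs) <= 9:
        raise ValueError("expected between 1 and 9 languages")
    whens = " ".join(f"WHEN ? THEN {9 - i}" for i in range(len(langs)))
    prio = f"(CASE language_id {whens} ELSE 0 END)"
    # Caveat shared with the MySQL version: if a message only exists in unlisted
    # languages (all priority 0), the two MAX() calls may pick different rows.
    sql = f"""
        SELECT message_id,
               substr(MAX({prio} || message), 2) AS message,
               substr(MAX({prio} || language_id), 2) AS language
        FROM messages
        GROUP BY message_id
        ORDER BY message_id
    """
    return conn.execute(sql, [*langs, *langs]).fetchall()

print(best_by_max_concat(conn, ["de", "en"]))  # [(1, 'Hallo', 'de'), (2, 'Goodbye', 'en')]
```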

ggiroux
Seems to work, but that MySQL describes it as a "quite inefficient trick" is a little disconcerting.
Sean
Benchmark it in your environment, but I've successfully used that trick on tables containing millions of rows (admittedly while selecting a very small subset, but I still haven't seen a huge slowdown). Some of the other solutions involving subqueries would also work, but MySQL is even more notably bad at subqueries.
ggiroux
A: 
SELECT *
FROM messages
WHERE (message_id,CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 ELSE NULL END) IN (
    SELECT message_id, MIN(CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 ELSE NULL END) pref_language_id
    FROM `messages`
    GROUP BY message_id
)

You have to change CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 ELSE NULL END (in both places) to match the user's preferred language(s). If they have a third one, just add another WHEN, e.g. CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 WHEN 'es' THEN 3 ELSE NULL END.
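To try the row-constructor comparison without a MySQL server, here is a sketch assuming SQLite via Python's sqlite3 (row values inside IN need SQLite 3.15 or later), using the three-language CASE from the paragraph above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
""")

# The user's ladder: 'de' first, then 'en', then 'es'; anything else is NULL
# and therefore never matches in the IN comparison.
PREF = "CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 WHEN 'es' THEN 3 ELSE NULL END"

rows = conn.execute(f"""
    SELECT message_id, language_id, message
    FROM messages
    WHERE (message_id, {PREF}) IN (
        SELECT message_id, MIN({PREF})
        FROM messages
        GROUP BY message_id)
    ORDER BY message_id
""").fetchall()

print(rows)  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```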

andrem
I haven't had a chance to plug this into my code and see how it runs yet, but at a quick glance, would there be any negative side effects from using the FIELD() function instead of a CASE statement?
Sean
You could use the FIELD function too, didn't know about that one.
andrem
Well, you have to turn the order around (from least preferred to most preferred) and use MAX() instead of MIN(). And use NULLIF(FIELD(...),0) to handle languages the user does not have.
andrem
A: 

This is a good example of a group-wise maximum query: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Here is what I came up with, using the same data and schema as simendsjo.

SELECT prefered.message_id, p2.language_id, message FROM
  (SELECT message_id, MAX(preference) AS prefered FROM messages m
  JOIN user_language_preference p ON p.language_id = m.language_id AND p.user_id = 1
  GROUP BY m.message_id) AS prefered
  JOIN user_language_preference p2 ON prefered = p2.preference AND p2.user_id = 1
  JOIN messages m2 ON p2.language_id = m2.language_id AND m2.message_id = prefered.message_id

Here is how it works:

  1. The inner query (aliased prefered) selects all messages, joins them to the user's language preferences, and calculates the maximum preference for each message (GROUP BY m.message_id). If there is no translation in the most preferred language, the maximum will be that of the next preferred language, and so on...
  2. The outer query consists of two joins. The first join looks up the language_id that matches the maximum preference (MAX(preference) = prefered = p2.preference) for the given user.
  3. The last join (m2) just selects the translation for the now-known preferred language_id and message_id.
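The three steps can be replayed in a scratch database. This sketch assumes SQLite via Python's sqlite3 and textual language codes; user 2 and their preferences are hypothetical extra data. Binding one named parameter (`:uid`) in both places avoids having to change user_id twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
    CREATE TABLE user_language_preference (
        user_id INTEGER, language_id TEXT, preference INTEGER);
    INSERT INTO user_language_preference VALUES
        (1, 'en', 10), (1, 'de', 100),  -- user 1: knows English, prefers German
        (2, 'en', 10), (2, 'es', 50);   -- user 2: hypothetical, prefers Spanish
""")

def best_for_user(conn, user_id):
    # Assumes a user's preference values are distinct; a tie would make the
    # join in step 2 match several languages and return duplicate rows.
    return conn.execute("""
        SELECT best.message_id, p2.language_id, m2.message
        FROM (
            -- step 1: highest preference available for each message
            SELECT m.message_id, MAX(p.preference) AS best_pref
            FROM messages AS m
            JOIN user_language_preference AS p
              ON p.language_id = m.language_id AND p.user_id = :uid
            GROUP BY m.message_id
        ) AS best
        -- step 2: map that preference back to a language for this user
        JOIN user_language_preference AS p2
          ON p2.preference = best.best_pref AND p2.user_id = :uid
        -- step 3: fetch the translation in that language
        JOIN messages AS m2
          ON m2.language_id = p2.language_id AND m2.message_id = best.message_id
        ORDER BY best.message_id
    """, {"uid": user_id}).fetchall()

print(best_for_user(conn, 1))  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```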

PS. Don't forget to change both occurrences of user_id.

johno
A: 

Edited to add some alternative solutions corresponding to the nature of the question. :D
(FWIW: Second Choice was my first implementation)

First Choice

This one should be able to provide the best performance, albeit a little trickier to follow.
More importantly though, it scales better to include 4th, 5th, 6th etc. languages.
The solution requires a table defining the priority of the languages (use whatever technique works best in MySQL).
The meat of the solution is in the 'finder' subquery; once it has determined the best priority language available, it is a simple matter to join back to get the actual messages.

-- Priority table (lower prio_id = more preferred). A regular table is used
-- because MySQL cannot refer to a TEMPORARY table twice in the same query.
create table prio (prio_id int, lid varchar(5));
insert into prio values (1, 'de'), (2, 'en'), (3, 'es');

select  m.*
from    (
        -- finder: best available priority for each message
        select  m.message_id, MIN(p.prio_id) prio_id
        from    messages m
                inner join prio p on
                  p.lid = m.language_id
        group by m.message_id
        ) finder
        inner join prio p
          on p.prio_id = finder.prio_id
        inner join messages m
          on m.message_id = finder.message_id
         and m.language_id = p.lid;

Second Choice

The following query structure should be easy enough to follow.
Each union adds to the result set any message IDs not already in the result set.
UNION ALL suffices because each subsequent query guarantees no duplication.
An index on (language_id, message_id) should offer best performance (especially if it's clustered).

select  message_id, language_id, message
from    messages
where   language_id = 'de'
union all
select  message_id, language_id, message
from    messages
where   language_id = 'en' 
    and message_id not in (select message_id from messages where language_id in ('de'))
union all
select  message_id, language_id, message
from    messages
where   language_id = 'es' 
    and message_id not in (select message_id from messages where language_id in ('de', 'en'))
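Because each branch only differs in its language and the list of languages to exclude, the chain can be generated for any number of fallbacks. A sketch, assuming SQLite via Python's sqlite3; best_by_union is a hypothetical helper, and the final sort happens in Python rather than SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
""")

def best_by_union(conn, langs):
    """Build one UNION ALL branch per language, most preferred first."""
    parts, params = [], []
    for i, lang in enumerate(langs):
        part = "SELECT message_id, language_id, message FROM messages WHERE language_id = ?"
        params.append(lang)
        if i > 0:
            # Skip messages already covered by a more preferred language.
            placeholders = ", ".join(["?"] * i)
            part += (" AND message_id NOT IN (SELECT message_id FROM messages"
                     f" WHERE language_id IN ({placeholders}))")
            params.extend(langs[:i])
        parts.append(part)
    sql = " UNION ALL ".join(parts)
    return sorted(conn.execute(sql, params).fetchall())

print(best_by_union(conn, ["de", "en", "es"]))  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```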

Third Choice

This is an interesting one using the COALESCE function.
However, I don't expect that it will perform that well on large volumes of data.

select  m.message_id,
        COALESCE(
        (select language_id from messages where message_id = m.message_id and language_id = 'de'),
        (select language_id from messages where message_id = m.message_id and language_id = 'en'),
        (select language_id from messages where message_id = m.message_id and language_id = 'es')
        ) language_id,
        COALESCE(
        (select message from messages where message_id = m.message_id and language_id = 'de'),
        (select message from messages where message_id = m.message_id and language_id = 'en'),
        (select message from messages where message_id = m.message_id and language_id = 'es')
        ) message
from    (
        select  distinct message_id
        from    messages
        ) m
Craig Young
+1  A: 

This query will do exactly what you need:

SELECT * FROM (
    SELECT * FROM messages
    WHERE language_id IN ('en', 'de')
    ORDER BY FIELD(language_id, 'en', 'de') DESC
) m
GROUP BY message_id;

The languages in FIELD(language_id, 'en', 'de') should be listed in ascending order of priority: the last one ("de" in this case) has the highest priority, then "en", then all others.

The WHERE clause is optional here; it is required only if you don't want any results for messages that have a translation in neither "en" nor "de".

Edit: Sean pointed out that a GROUP BY clause with non-aggregated columns may produce unreliable results. This may be true; at least the MySQL Manual says so (though in practice, the first matching row always (?) seems to be used).

Anyway, there is another query with the same idea, but without the mentioned problem.

SELECT m1.*
FROM messages AS m1
INNER JOIN (
    SELECT message_id, MAX(FIELD(language_id, 'en', 'de')) AS weight
    FROM messages
    WHERE language_id IN ('en', 'de')
    GROUP BY message_id
) AS m2
USING(message_id)
WHERE FIELD(m1.language_id, 'en', 'de') = m2.weight;
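SQLite has no FIELD(), so this sketch (assuming Python's sqlite3) registers a small Python function under that name as a rough stand-in for MySQL's; the query itself is the one above with an ORDER BY added.

```python
import sqlite3

def field(value, *candidates):
    """Rough stand-in for MySQL's FIELD(): 1-based position of value, 0 if absent."""
    return candidates.index(value) + 1 if value in candidates else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIELD", -1, field)  # -1: accept any number of arguments
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
""")

# The highest FIELD() weight per message marks the preferred language;
# joining back on that weight picks exactly one row per message.
rows = conn.execute("""
    SELECT m1.message_id, m1.language_id, m1.message
    FROM messages AS m1
    INNER JOIN (
        SELECT message_id, MAX(FIELD(language_id, 'en', 'de')) AS weight
        FROM messages
        WHERE language_id IN ('en', 'de')
        GROUP BY message_id
    ) AS m2 USING (message_id)
    WHERE FIELD(m1.language_id, 'en', 'de') = m2.weight
    ORDER BY m1.message_id
""").fetchall()

print(rows)  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```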
Alexander Konstantinov
Is that using a GROUP BY clause on non-aggregate columns? MySQL warns that that produces unreliable results.
Sean
@Sean, you're right about unreliable results. I've updated my answer with another query, though it is more verbose than the first one.
Alexander Konstantinov
A: 

The fastest solution I've found which gives the resultset I'm after is described in this article:

http://onlamp.com/pub/a/mysql/2007/03/29/emulating-analytic-aka-ranking-functions-with-mysql.html
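For reference, MySQL 8.0 and later support ROW_NUMBER() natively, which makes the emulation unnecessary there. A sketch of the window-function version, assuming SQLite 3.25 or later via Python's sqlite3 and the user_language_preference table from simendsjo's answer (with textual language codes and a hypothetical user 2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language_id TEXT, message TEXT);
    INSERT INTO messages VALUES
        (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (1, 'es', 'Hola'),
        (2, 'en', 'Goodbye'), (2, 'es', 'Adios');
    CREATE TABLE user_language_preference (
        user_id INTEGER, language_id TEXT, preference INTEGER);
    INSERT INTO user_language_preference VALUES
        (1, 'en', 10), (1, 'de', 100),  -- user 1: knows English, prefers German
        (2, 'en', 10), (2, 'es', 50);   -- user 2: hypothetical, prefers Spanish
""")

def best_for_user(conn, user_id):
    # Number each message's translations by the user's preference (1 = best)
    # and keep only the first one per message.
    return conn.execute("""
        SELECT message_id, language_id, message
        FROM (
            SELECT m.message_id, m.language_id, m.message,
                   ROW_NUMBER() OVER (
                       PARTITION BY m.message_id
                       ORDER BY p.preference DESC) AS rn
            FROM messages AS m
            JOIN user_language_preference AS p
              ON p.language_id = m.language_id AND p.user_id = :uid
        )
        WHERE rn = 1
        ORDER BY message_id
    """, {"uid": user_id}).fetchall()

print(best_for_user(conn, 1))  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```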

Sean