tags:

views:

138

answers:

3

I'm currently running some SQL which uses an IN expression to match against multiple IDs. However, I would ideally like to be able to match up certain IDs with others so they must appear together to return a result. Here's an example:

Edit: The IDs I'm matching are a part of a many-to-many relationship. The structure is like this:

Articles ArticleKeywords Keywords

An article can have multiple Keywords linked to it by the ArticleKeywords table. I am currently using the IN expression to match any of the Keyword IDs against the records. However, I'd like to match certain records against small groups of keywords i.e. a keyword must appear with another keyword for a record to be matched.

Current: ... AND id IN ('25', '566', '156', '166', '7345')

More specific: ... AND ((id = '25' AND id = '566') OR (id = '156' AND id = '166') OR (id = '7345'))

Although the second option might work, I'm thinking it probably won't be very performance savvy. Is there another way this can be done, or should I be going about it another way?

Thanks for your help.

+2  A: 

Well your second option will never work...

((id = '25' AND id = '566') --For this to return the column `id` would have to = both 25 & 566 which it obviously can't
OR 
(id = '156' AND id = '166') --For this to return the column `id` would have to = both 156 & 166 which it obviously can't
OR 
(id = '7345'))

What exactly are you trying to achieve... What do you mean by "match up certain IDs with others so they must appear together" Do you mean on consecutive rows ?

Eoin Campbell
I think he refers to "id1" and "id2", that means different properties of the same record.
Tomalak
At no point does mention having 2 ID columns... based on his current example with the ID IN (...), it looks like he has a single ID column, and in the latter example, he just has a complete lack of understanding about conditional statements in SQL Where Clauses... :(
Eoin Campbell
Sorry, I should have explained a little better.The IDs I'm matching are a part of a many-to-many relationship. The structure is like this:ArticlesArticleKeywordsKeywordsAn article can have multiple Keywords linked to it by the ArticleKeywords table. I am currently using the IN expression to match any of the Keyword IDs against the records. However, I'd like to match certain records against small groups of keywords i.e. a keyword must appear with another keyword for a record to be matched.
MSR
+3  A: 

Edited based on comments. Say you're searching for articles that:

  • Have a keyword named 'a1'
  • Or, have both keywoards named 'b1' and 'b2'
  • Or, have either keyword 'c1', 'c2' or 'c3'

You can query like:

select a.id
from Articles a 
inner join ArticleKeywords ak on ak.articleid = a.id
inner join Keywords k on k.id = ak.keywordid
group by a.id
having 
    sum(case when k.name in ('a1') then 1 else 0 end) = 1
    or sum(case when k.name in ('b1','b2') then 1 else 0 end) = 2 
    or sum(case when k.name in ('c1','c2,'c3') then 1 else 0 end) > 0

Per SquareCog's comment, you can greatly increase performance with an early WHERE clause. The clause would limit the grouping to the relevant keywords only. In the above query, add the WHERE just before the HAVING:

...
inner join Keywords k on k.id = ak.keywordid
where k.name in ('a1','b1','b2','c1','c2','c3')
group by a.id
...

You can retrieve the other details of the article(s) like:

select *
from Articles
where id in (
    ...query from above here...
)

Say you have a table that contains groups to search for, defined like:

groupid - keywordid
1 - 1
1 - 2
2 - 3

Meaning the article has to match ((keyword 1 and keyword2) or keyword3). Then you can query like this:

select ak.articleid
from ArticleKeywords ak
inner join Search s on ak.keywordid = s.keywordid
group by s.searchgroup, ak.articleid
having count(*) = (
    select count(*) from #Search s2 where s2.Searchgroup = s.SearchGroup
)
Andomar
This is where it gets complicated, because the number of keywords is unknown and not limited. Maybe I'm expecting too much!
MSR
Thanks. The problem with that is there might be sets of words with 1, 2, 3 or more words that must appear together, so the having part is going to restrict those possibilities.
MSR
Nicely done, Andomar. For efficiency, one would still want to throw in a 'where k.keyword in....' so that you can filter out the irrelevant section before doing the group-by or the join.
SquareCog
Extending the SearchGroup idea: You would be able to define search groups on the fly (e.g. by user input), effectively enabling you to express your "disjunction of conjunctions" style query by including a "WHERE s.SearchGroup IN (1, 2, 3)"
Tomalak
+1  A: 

The answer given by Andomar using GROUP BY and HAVING is a common way to solve this kind of problem, but it doesn't perform very well. The GROUP BY often causes a temporary table.

Back to your example:

...AND ((id = '25' AND id = '566') OR (id = '156' AND id = '166') OR (id = '7345'))

This can never be true. The conditions in a WHERE clause apply to one row at a time. The id column can never have two values on a given row. I understand conceptually what you are intending to test for, but this isn't the way SQL works.

When you need to write a condition that involves values occurring on multiple rows, another solution is to use a self-join.

SELECT *
FROM ArticleKeywords k1
 LEFT OUTER JOIN ArticleKeywords k2 ON (k1.article_id = k2.article_id)
WHERE k1.keyword_id = '7345'
 OR (k1.keyword_id = '25' AND k2.keyword_id = '566')
 OR (k1.keyword_id = '156' AND k2.keyword_id = '166');

Most people would call k1 and k2 "table aliases." But if you think of these aliases as pointing to individual rows in the table, instead, then it becomes clearer how to write conditions using self-joins.

Of course if you need to test for a small group of three values instead of two values, you need to do another self-join.

This solution may not be able to use indexes efficiently, but it won't incur the temporary table that the GROUP BY solution does. Try both solutions, use EXPLAIN to analyze them, and then measure their performance to compare.

Bill Karwin
Bill, this gets cumbersome when he doesn't know the number of keywords (and hence the number of joins). You are right about the group-by cost, which is why I suggested adding a where clause filter with Andomar's solution. Assuming a decent optimizer, the where should get pushed down, returning very few keywords, and the join and group by can be performed cheaply.
SquareCog
@SquareCog: Fair enough. Just offering this as an alternative method to keep in your toolbox. :)
Bill Karwin
@Bill Karwin: I was thinking about a solution that involves a "Search" table (SearchId INT, KeywordId INT). Each search condition group would be entered there, e.g. ((1, 25), (1,566)). This could be joined to ArticleKeywords to find the ArticleIds that satisfy search #1. However, I'm struggling with writing a JOIN that does that - is this at all possible?
Tomalak
@Tomolak: I had actually done that searching for an answer. I'll add the query.
Andomar
@Andomar: Actually that's exactly what I had in mind. It seems I was thinking way too complicated, your solution should have been obvious to me. ;-)
Tomalak