views:

222

answers:

4

Hi All,

I want a search functionality in my application for the data like following

topic_id   tag
1          cricket
1          football
2          football
2          basketball
3          cricket
3          basketball
4          chess
4          basketball

Now when i search for term cricket AND football o/p should be

 topic_id
    1

and when i search for term cricket OR football o/p should be

 topic_id
    1
    2
    3

i try something like following

FOR AND

  select topic_id from table_name where tag like "%cricket%" and topic_id in (select topic_id from table_name where tag like "%football%")

FOR OR

 select topic_id from table_name where tag like "%cricket%" OR tag like "%football%"

My problem is when user search for the cricket AND football AND basketball AND chess my Query becomes very pathetic

is there any simple solution for this. I also tried for GROUP_CONCAT but in vain

+1  A: 

You need to do a self join

select distinct topic_id from 
table_name as t1
join
table_name as t2 
on 
t1.topic_id = t2.topic_id
and
t1.tag = "cricket"
and
t2.tag = "football"
bradgonesurfing
please answer for the search term like `cricket AND football AND basketball AND chess`
Salil
What is it you do not understand about the above solution. You can just extend it with as many joins as you wish for each tag.
bradgonesurfing
+2  A: 
 SELECT TopicId
 FROM Table
 WHERE Tag IN ('cricket', 'football', 'basketball', 'chess')
 GROUP By TopicId
 HAVING Count(*) = 4

  4 is magic number - its a length of your AND list.

 FOR cricket AND football

 it will be 2:

 SELECT TopicId
 FROM Table
 WHERE Tag IN ('cricket', 'football')
 GROUP By TopicId
 HAVING Count(*) = 2

 if you want use 'like' statement:

 SELECT TopicId
 FROM Table
 WHERE Tag IN (SELECT distinct Tag from Table Where Tag like '...'
                OR Tag like '...'
                OR Tag like '...'
                OR Tag like '...'
              )
 GROUP By TopicId
 HAVING Count(*) = (SELECT COUNT(distinct Tag) from Table 
                    Where Tag like '...'
                       OR Tag like '...' 
                       OR Tag like '...'
                       OR Tag like '...'
                   )

UPDATE:

This task can be easy solved with RDBMS which support all sets operations: UNION, INTERSECT and EXCEPT (or MINUS)

Then any conditions like:

  1. (Tag1 AND Tag2) OR Tag3 NOT Tag4
  2. Tag1 OR Tag2
  3. Tag1 AND Tag2 And Tag3
  4. (Tag1 AND Tag2) OR (Tag3 AND Tag4)

can be easily transformed into:

1. (Select * ... Where Tag = Tag1
    INTERSECT
    Select * ... Where Tag = Tag2
   )
   UNION
   (Select * ... Where Tag = Tag3)
   EXCEPT
   (Select * ... Where Tag = Tag4)

2. Select * ... Where Tag = Tag1
   UNION
   Select * ... Where Tag = Tag2

3. Select * ... Where Tag = Tag1
   INTERSECT
   Select * ... Where Tag = Tag2
   INTERSECT
   Select * ... Where Tag = Tag3

 4.(Select * ... Where Tag = Tag1
    INTERSECT
    Select * ... Where Tag = Tag2
   )
   UNION
   (Select * ... Where Tag = Tag1
    INTERSECT
    Select * ... Where Tag = Tag2
   )

The real problem that MYSQL does not support INTERSECT, which should be emulated as shown above. Second problem is respecting brackets and operator precedences.

So possible solution without using brackets in expressions:

  1. Collect all tags which joined by AND conditions and build query as first example in answer.

  2. Add all tags which joined OR condition (can be used IN or UNION) and by using UNION combine result.

Another approach possible only if you have tag quantity less 64. Then each tag will have own bit (You will need add bigint field 'tags' into topics table where will be represented tags in binary format) and by using mysql bit operations create query.

Big disadvantage that this solution limited only for 64 tags.

Michael Pakhantsov
ohh!!!thanks it will work but in practical i want `like` instead of `matching whole string` ex:- `tag like '%cricket%'`. `AND I think like is not working with in`
Salil
@Salil - Why not? :) Added query for like approach
Michael Pakhantsov
@ Michael Pakhantsov:- don't you think in your example `HAVING Count(*) = will return 8` whereas i want result `0` rows as there is no `topic_id` with all the above tags
Salil
@Salil. Do not understand which part of query will return 8? May you explain?
Michael Pakhantsov
@ Michael Pakhantsov:- Ok forget it just tell me what will be the o/p for the search string `cricket AND football AND basketball AND chess` using that query?
Salil
@Salil look to first query in my answer - where returned topics for search string: "cricket AND football AND basketball AND chess"
Michael Pakhantsov
This needs introducing a constraint which guarantees uniqueness of (topic_id, tag) pair, otherwise you could count your magic 4 just because the article is tagged with 'cricket' four times.
skalee
@skalee, HAVING COUNT(DISTINCT topicId, tag) = 4 instead of Having COUNT(*) helps a lot in this case.
Michael Pakhantsov
that's true, although having unique constraint may be useful for other reasons
skalee
A: 

a AND b AND c AND d:

SELECT t1.topic_id
FROM tags_table AS t1
INNER JOIN tags_table AS t2
ON t2.topic_id = t1.topic_id AND t2.tag = 'b'
INNER JOIN tags_table AS t3
ON t3.topic_id = t1.topic_id AND t3.tag = 'c'
INNER JOIN tags_table AS t4
ON t4.topic_id = t1.topic_id AND t4.tag = 'd'
WHERE t1.tag = 'a'

Unfortunately, OR condition is harder. Full outer join would be handy, but MySQL is missing that feature.

I suggest ensuring that you have no ORs inside parentheses (not (a OR b) AND c, rather (a AND c) OR (b AND c) and doing query like this:

a OR b OR c OR (some and clause like d AND e):

SELECT DISTINCT topic_id FROM (
  SELECT topic_id FROM tags_table where tag = 'a'
  UNION ALL
  SELECT topic_id FROM tags_table where tag = 'b'
  UNION ALL
  SELECT topic_id FROM tags_table where tag = 'c'
  UNION ALL
  query_like_the_previous_one_represinting_some_AND_clause
) as union_table

In db software other than MySQL you could utilize query probably (I have no means to test it right now) like this one:

SELECT COALESCE(t1.topic_id, t2.topic_id, t3.topic_id, ...)
FROM tags_table AS t1
INNER JOIN tags_table AS t2
ON t2.topic_id = t1.topic_id AND t2.tag = 'b'
FULL OUTER JOIN tags_table AS t3
ON t3.topic_id = t1.topic_id AND t3.tag = 'c'
INNER JOIN tags_table AS t4
ON t4.topic_id = t1.topic_id AND t4.tag = 'd'
WHERE t1.tag = 'a'

which I believe should represent (a AND b) OR (c AND d). Note COALESCE, because of full outer join t1.topic_id could be null.

skalee
A: 

This is a Rails solution that creates self-referencing joins for the AND case and a simple SQL include for the OR case. The solution assumes a Model called TopicTag and consequently a table called topic_tags.

The class method Search expects 2 arguments an array of Tags and a string containing either "and" or "or"

class TopicTag < ActiveRecord::Base

  def self.search(tags, andor)

    # Ensure tags are unique or you will get duplicate table names in the SQL
    tags.uniq!

    if andor.downcase == "and"
      first = true
      sql = ""

      tags.each do |tag|
        if first
          sql = "SELECT DISTINCT topic_tags.topic_id FROM topic_tags "
          first = false
        else
          sql += " JOIN topic_tags as tag_#{tag} ON tag_#{tag}.topic_id = \
                   topic_tags.topic_id AND tag_#{tag}.tag = '#{tag}'"
        end
      end
      sql += " WHERE topic_tags.tag = '#{tags[0]}'"
      TopicTag.find_by_sql(sql)

    else
      TopicTag.find(:all, :select => 'DISTINCT topic_id', 
          :conditions => { :tag => tags})
    end
  end

end

In order to get some more test coverage the data was extended to include an extra record for chess. The Database was seeded with the following code

[1,2].each   {|i| TopicTag.create(:topic_id => i, :tag => 'football')}
[1,3].each   {|i| TopicTag.create(:topic_id => i, :tag => 'cricket')}
[2,3,4].each {|i| TopicTag.create(:topic_id => i, :tag => 'basketball')}
[4,5].each   {|i| TopicTag.create(:topic_id => i, :tag => 'chess')}

The following test code produced the results shown

tests = [
  %w[football cricket],
  %w[chess],
  %w[chess cricket basketball]
]

tests.each do |test|
  %w[and or].each do |op|
    puts test.join(" #{op} ") + " = " + 
      (TopicTag.search(test, op).map(&:topic_id)).join(', ')
  end
end
football and cricket = 1
football or cricket = 1, 2, 3
chess = 4, 5
chess = 4, 5
chess and cricket and basketball = 
chess or cricket or basketball = 1, 2, 3, 4, 5

Tested on Rails 2.3.8 using SqlLite

EDIT

If you wish to use like then the OR case also gets slightly more complex. You should also be aware that using LIKE with a leading '%' could have significant impacts on performance if the table you are searching is of a non-trivial size.

The following version of the Model uses LIKE for both cases.

class TopicTag < ActiveRecord::Base

  def self.search(tags, andor)

    tags.uniq!

    if andor.downcase == "and"
      first = true
      first_name = ""
      sql = ""

      tags.each do |tag|
        if first
          sql = "SELECT DISTINCT topic_tags.topic_id FROM topic_tags "
          first = false
        else
          sql += " JOIN topic_tags as tag_#{tag} ON tag_#{tag}.topic_id = \    
                  topic_tags.topic_id AND tag_#{tag}.tag like '%#{tag}%'"
        end
      end
      sql += " WHERE topic_tags.tag like '%#{tags[0]}%'"
      TopicTag.find_by_sql(sql)

    else
      first = true
      tag_sql = ""
      tags.each do |tag| 
        if first
          tag_sql = " tag like '%#{tag}%'" 
          first = false
        else
          tag_sql += " OR tag like '%#{tag}%'" 
        end
      end
      TopicTag.find(:all, :select => 'DISTINCT topic_id', 
            :conditions => tag_sql)
    end
  end

end

tests = [
  %w[football cricket],
  %w[chess],
  %w[chess cricket basketball],
  %w[chess ll],
  %w[ll]
]

tests.each do |test|
  %w[and or].each do |op|
    result = TopicTag.search(test, op).map(&:topic_id)
    puts ( test.size == 1 ? "#{test}(#{op})" : test.join(" #{op} ") ) + 
         " = " + result.join(', ')
  end
end
football and cricket = 1
football or cricket = 1, 2, 3
chess(and) = 4, 5
chess(or) = 4, 5
chess and cricket and basketball = 
chess or cricket or basketball = 1, 2, 3, 4, 5
chess and ll = 4
chess or ll = 1, 2, 3, 4, 5
ll(and) = 1, 2, 3, 4
ll(or) = 1, 2, 3, 4
Steve Weet
I've just realised you want to use Like on your tags. I'll revisit this
Steve Weet
Edited to use LIKE
Steve Weet