tags:

views:

239

answers:

5

For quite a simple table structure, ie. Person, Criteria, and PersonCriteria (the combi-table), I have set up a query at the moment that selects all persons that possess all selected criteria.

The query itself looks like this at the moment:

SELECT 
  p.PersonID   
FROM 
  Person p,     
  ( SELECT DISTINCT PersonID, CriteriaID 
    FROM PersonCriteria 
    WHERE CriteriaID in (#list_of_ids#)     
  ) k     
WHERE 
  p.PersonID= k.PersonID     
GROUP BY 
  p.PersonID     
HAVING 
  Count(*) = #Listlength of list_of_ids#

So far no problem and everything works fine.

Now I want to offer the possibility for the user to add some AND and OR variables in their search, ie. someone could say:

I'm looking for a person that possesses: Criteria 1 AND 3 AND 4 (which would be covered by the query above) AND (5 OR 6 OR 7) AND (8 OR 9) and so on...

I'm not sure where to start with this additional level. I hope someone else does..:-)

+1  A: 

You can simplify this a lot, for instance by doing:

SELECT DISTINCT PersonID FROM PersonCriteria WHERE CriteriaID IN (1,2) OR CriteriaID IN (8,9)

Also consider using JOIN's instead of sub-selects (for performance)

thomask
I think you missed the point. Re-read and think it through carefully...
Vilx-
+2  A: 

I have to say - I'm stumped. I cannot think of any solution that would come even close. I would try looking for a solution in these directions:

  • User-defined aggregate functions. Maybe you can make a function that takes as an argument the desired expression (in a simplified syntax) and the rows for a single person. The function then parses the expression and matches it against the rows. Hmm... maybe MySQL includes some concatenating aggregate function and a regex matching function? This might be a solution then (though probably not a very fast one).
  • Analytic functions. I don't pretend that I understand them, but as much as I know about them, I think they are generally in this direction. Although I don't know if there will be a function that will suit this need.

Added: Ahh, I think I got it! Although I think the performance will be miserable. But this will work! For example, if you have the requirement to search for 1 AND 2 AND (3 OR 4) then you would write:

SELECT
    *
FROM
    Persons A
WHERE
    EXISTS (Select * from PersonCriteria B WHERE A.PersonID=B.PersonID AND CriteriaID=1)
    AND
    EXISTS (Select * from PersonCriteria B WHERE A.PersonID=B.PersonID AND CriteriaID=2)
    AND
    (
        EXISTS (Select * from PersonCriteria B WHERE A.PersonID=B.PersonID AND CriteriaID=3)
        OR
        EXISTS (Select * from PersonCriteria B WHERE A.PersonID=B.PersonID AND CriteriaID=4)
    )

Added 2: Here's another one, though the performance will likely be even worse:

SELECT p.* FROM Person p
    JOIN (select PersonID from PersonCriteria WHERE CriteriaID=1) c1 ON p.PersonID=c1.PersonID
    JOIN (select PersonID from PersonCriteria WHERE CriteriaID=2) c2 ON p.PersonID=c2.PersonID
    JOIN (select PersonID from PersonCriteria WHERE CriteriaID IN (3,4)) c3 ON p.PersonID=c3.PersonID

Added 3: This is a variation of No. 2, but this might actually have a chance of a decent performance!

SELECT p.* FROM
    Person p
    JOIN PersonCriteria c1 on (p.PersonID=c1.PersonID AND c1.CriteriaID=1)
    JOIN PersonCriteria c2 on (p.PersonID=c2.PersonID AND c2.CriteriaID=2)
    JOIN PersonCriteria c3 on (p.PersonID=c3.PersonID AND c3.CriteriaID IN (3,4))

If you add an index to PersonCriteria on columns (PersonID,CriteriaID) (exactly in this order!), then I think it's about as fast as you're going to get in any case.

Vilx-
Your number "3" does the trick nicely. I had to add Distinct to the select, and the perfomance doesn't seem to be too bad at all.Thanks!
Bart B
Yeah, I forgot the DISTINCT part. :)
Vilx-
+1  A: 

I I understand what you are asking this should work. I make no guarantees that I understand what you are asking for though as clearly several people have had differnt intepretations already.

SELECT   p.PersonID   
FROM   Person p
JOIN       
(SELECT DISTINCT PersonID    
FROM PersonCriteria     
WHERE CriteriaID in (1,2,3) and count(criteriaID) = 3) k  
       on p.PersonID =    k.PersonID  
JOIN
   (SELECT DISTINCT PersonID    
FROM PersonCriteria     
WHERE CriteriaID in (4,5) ) k2  on p.PersonID = k2.PersonID
JOIN
   (SELECT DISTINCT PersonID    
FROM PersonCriteria     
WHERE CriteriaID in (5,6,7) ) k3  on p.PersonID = k3.PersonID
JOIN
   (SELECT DISTINCT PersonID    
FROM PersonCriteria     
WHERE CriteriaID in (8,9) ) k4  on p.PersonID = k4.PersonID

The way I am interpreting this. the first join is to a derived table that gets anyone who has all three of the specified conditions. THen the succeeding derived tables find the people who meeting one of those conditions (basically the OR part) by joining to the rest of the derived tables, we accomplish the AND part of the query. I also know the syntax passes the syntax check for SQL Server, it might need tweaking for MYSQL.

HLGEM
A: 

If you need to offer a more "dynamic" approach to search your data, the SQL will get really ugly and long and will not really be completely dynamic and... did I mention "ugly"?

I use ORM frameworks for the task, and they handle the work just great.

But if your structure is only as you described (with OR wrapping many AND conditions), then assuming that your current AND-only implementation is in the a UDF called dbo.getPersonForAndCriteria(...), you could implement OR simply using UNION:

dbo.getPersonForAndCriteria(@myListOfIDs1) --// works for AND
UNION -- replaces OR
dbo.getPersonForAndCriteria(@myListOfIDs2) --// works for AND
UNION -- replaces OR
dbo.getPersonForAndCriteria(@myListOfIDs3) --// works for AND

Note: this is illustrative only, but I would wrap your procedure into a nice UDF that takes the list of parameters (IDs) as a table (using XML or comma-delimited string, which is then parsed inside the UDF), then just make a nice JOIN on this list/table instead of WHERE ... IN part, and the last part becomes COUNT(*) = COUNT(SELECT ID FROM myFilterTable).

van
A: 

If I understand correctly, you just need to have dynamic query. in the example below @Criteria1 and @Criteria2 are parameters pass from outside and you can build any criteria as you need. so just build all the sql string before execute

DECLARE @sql NVARCHAR(MAX)

DECLARE @Criteria NVARCHAR(MAX)

DECLARE @Criteria1 NVARCHAR(MAX)

DECLARE @Criteria2 NVARCHAR(max)

SET @Criteria1 = '1,2,3,4'

SET @Criteria2='5'

SET @Criteria = 'CriteriaID in (' + @Criteria1 +')'

SET @Criteria = @Criteria + ' and CriteriaID=' + @Criteria2

SET @sql ='SELECT p.PersonID FROM Person p,( SELECT DISTINCT PersonID, CriteriaID FROM PersonCriteria WHERE '

SET @sql = @sql + @Criteria

SET @sql = @sql + ' ) k WHERE p.PersonID= k.PersonID GROUP BY p.PersonID HAVING Count(*) = #Listlength of list_of_ids#'

EXEC sp_executesql @sql

pang