views:

86

answers:

5

So imagine that you have a table of Products (ID int, Name nvarchar(200)), and two other tables, ProductsCategories (ProductID int, CategoryID int) and InvoiceProducts (InvoiceID int, ProductID int).

I need to write a query to produce a set of products that match a given set of invoice ids and category ids such that the list of products match all the specified categories and all the specified invoices, without falling back to dynamic SQL. Imagine I need to find a list of products that are in both categories 1 and 2 and in invoices 3 and 4.

As a start, I've written a stored-procedure that accept the category ids and invoice ids as strings, and parse them into tables:

 CREATE PROCEDURE dbo.SearchProducts (@categories varchar(max), @invoices varchar(max))
 AS BEGIN
      with catids as (select cast([value] as int) from dbo.split(@categories, ' ')),
           invoiceids as (select cast([value] as int) from dbo.split(@invoices, ' '))
           select * from products --- insert awesomeness here
 END

The different solutions I've come up with look awful, and perform worse. The best thing I've found is to generate a view comprised of left joins of all the criteria, but that seems very expensive and doesn't solve the issue of matching all of the different keys specified.


Update: This is an example query I wrote that yields the expected results. Am I missing any optimization opportunities? Like magical unicorn matrix operations by ninjas?

with catids as (select distinct cast([value] as int) [value] from dbo.split(@categories, ' ')),
  invoiceids as (select distinct cast([value] as int) [value] from dbo.split(@invoices, ' '))

  select pc.ProductID from ProductsCategories pc (nolock)
    inner join catids c on c.value = pc.CategoryID 
    group by pc.ProductID 
    having COUNT(*) = (select COUNT(*) from catids)  
  intersect
  select ip.ProductID from InvoiceProducts ip (nolock)
    inner join invoiceids i on i.value = ip.InvoiceID 
    group by ip.ProductID 
    having COUNT(*) = (select COUNT(*) from invoiceids)   
A: 

Pass them as XML parameter, store them to a temp table and join.

anivas
A: 

I'd start with something like this, utilizing your tabled ID values from the parameters. Temp tables can help with subquery speed.

select p.*
from
(
    select pc.*
    from catids c
    inner join ProductsCategories pc
        on pc.CategoryID = c.value
) catMatch
inner join
(
    select pin.*
    from invoiceids i
    inner join ProductsInvoices pin
        on pin.InvoiceID = i.value
) invMatch
    on invMatch.ProductID = catMatch.ProductID
inner join Products p
    on p.ID = invMatch.ProductID
ulty4life
A: 

How about a recursive CTE?

First add row numbers to the criteria tables, then some pseudo SQL if you will:

;WITH cte AS(
Base case: Select productid, criteria from products left join criteria where row_number = 1 if it matches criteria from both row 1s or one is null.
UNION ALL
Recursive case: Select n+1 criteria row from products left join criteria where row_number = cte.row_number + 1 AND matches criteria from both row_number + 1 or one or the other (but not both) is null
)
SELECT *
WHERE criteria = maximum id from criteria table.

This will give you a way of performing AND on multiple criteria, and should perform well.

Does this make any sense at all? I've done some pretty cool fast stuff with CTEs lately, and can elaborate if necessary.

Removed cte code because it was wrong, and not really worth fixing having a much better solution out there.

mootinator
I've only recently discovered recursive CTE's. I was having trouble getting this to work though. It complained about the recursive section containing left joins.
Andy Edinborough
Right, I've even run into that error before. A way around that might just be to use a cte for each criteria instead of trying to cram them together uncomfortably like I've done here.
mootinator
A: 

ProductCategories should have a clustered index on (CategoryId, ProductId) and InvoiceProducts should have one on (InvoiceId, ProductId) optimally. This will allow finding product ids given the CategoryId and InvoiceId by using the data in the clustered indexes only.

You could use a function to return a table of ints given a string. Google "CsvToInt" and click on the first link from SqlTeam to see the code.

Then you could:

SELECT *
FROM Products
WHERE ID IN (SELECT DISTINCT ProductId 
        FROM ProductCategories
        WHERE CategoryId in dbo.CsvToInt(@categories)
    ) AND ID IN (SELECT DISTINCT ProductId 
        FROM InvoiceProducts
        WHERE InvoiceId in dbo.CsvToInt(@invoices)
    )
Jason Goemaat
My situation is more complicated than that unfortunately. If I pass in 2 categories, this will give products that are in one of the two categories. I need it to tell me products that are in both. I had written a modified version of this query that grouped by the product ID and returned ID's that had a count matching the number of categories I passed in, but it seems like it would be poor in performance.
Andy Edinborough
+1  A: 

Provided that you have unique indices on both (ProductID, CategoryID) and (ProductID, InvoiceID):

SELECT  ProductID
FROM    (
        SELECT  ProductID
        FROM    ProductInvoice
        WHERE   InvoiceID IN (1, 2)
        UNION ALL
        SELECT  ProductID
        FROM    ProductCategory pc
        WHERE   CategoryID IN (3, 4)
        ) q
GROUP BY
        ProductID
HAVING  COUNT(*) = 4

or, if your values are passed in CSV strings:

WITH    catids(value) AS
        (
        SELECT  DISTINCT CAST([value] AS INT)
        FROM    dbo.split(@categories, ' '))
        ), 
        (
        SELECT  DISTINCT CAST([value] AS INT)
        FROM    dbo.split(@invoices, ' '))
        )
SELECT  ProductID
FROM    (
        SELECT  ProductID
        FROM    ProductInvoice
        WHERE   InvoiceID IN
                (
                SELECT  value
                FROM    invoiceids
                )
        UNION ALL
        SELECT  ProductID
        FROM    ProductCategory pc
        WHERE   CategoryID IN
                (
                SELECT  value
                FROM    catids
                )
        ) q
GROUP BY
        ProductID
HAVING  COUNT(*) = 
        (
        SELECT  COUNT(*)
        FROM    catids
        ) + 
        (
        SELECT  COUNT(*)
        FROM    invoiceids
        )

Note that in SQL Server 2008 you can pass table-valued parameters into the stored procedures.

Quassnoi
+1 for seeing that a join to products isn't necessary and for grouping over the union of matches. I think this is my answer, but I was hoping there were operators I hadn't considered. Do you know of a way to introduce my criteria of category and invoice ids without joins? Should the subquery be moved into a cte?
Andy Edinborough
@Andy: which criteria? My query does not contain any joins at all
Quassnoi
This query will be part of a stored procedure, so the only I know to pass in the list of category and invoice ids is as `varchar` and then splitting them into cte's, which would have to be joined on the other tables.
Andy Edinborough
@Andy: Which version of `SQL Server` are you using? In `SQL Server 2008`, you can pass table variables right from the client.
Quassnoi
Oh, I didn't know that. I'm on 2008 R2. I'll try that out. Do I need to use a join for them? Or is there a way to use the `in` operator (not as a subquery: `where id in (select id from table)`)?
Andy Edinborough
@Andy: you just get rid of the `CTEs` in the query and replace their references with `@catids` and `@invoiceids`, where `@catids` and `@invoiceids` are names of the table variables you pass from the client. You use table variables exactly as if you were using tables, so that would be correct: `WHERE in IN (SELECT id FROM @catids)`
Quassnoi
Yeah, I was just curious if there was an alternate syntax that I wasn't aware of.
Andy Edinborough